
Scannable File Formats

Discovery can scan the following file formats:

  • Structured data with taggable content and metadata:

    • .csv
    • .tsv
    • .json
    • .parquet
    • .orc
    • .avro
    • .avro (nested)
    • .parquet (nested)
    • .json (nested)
    • .sas
    • .xml
    • .html
  • Compressed/archive data with taggable content and metadata:

    • .snappy.parquet
    • .snappy.orc
    • .snappy.avro
    • .zlib.orc
    • .zlib.parquet
    • .zlib.avro
    • .gzip (single or multiple files)
    • .zip (single or multiple files)
    • .jar (single or multiple files)
    • .tar.gz (single or multiple files)
    • .gz (single or multiple files)
    • .lzo/.lzop
  • Unstructured data with taggable content and metadata:

    • .pdf
    • .txt
    • .dat
    • .xls
    • .xlsx
    • .doc
    • .docx
  • Media data with taggable metadata. For the following file formats, Discovery supports only metadata extraction:

    • .jpeg
    • .mp4
    • .mpeg
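If you want to check a batch of file names against this list before a scan, a small lookup table is enough. The sketch below is a hypothetical helper, not part of Discovery. It only mirrors the extensions listed above, and it is not Discovery's own format-detection logic. The names `SUPPORTED_FORMATS` and `classify` are illustrative. The helper matches on the longest suffix, so `sales.snappy.parquet` is treated as compressed rather than plain Parquet. Nested `.avro`, `.parquet`, and `.json` files cannot be told apart by extension, so they fall under the plain structured entries.

```python
from typing import Optional

# Hypothetical lookup table that mirrors the documented list above.
# It is NOT how Discovery itself detects formats.
SUPPORTED_FORMATS = {
    "structured": {".csv", ".tsv", ".json", ".parquet", ".orc", ".avro",
                   ".sas", ".xml", ".html"},
    "compressed": {".snappy.parquet", ".snappy.orc", ".snappy.avro",
                   ".zlib.orc", ".zlib.parquet", ".zlib.avro",
                   ".gzip", ".zip", ".jar", ".tar.gz", ".gz",
                   ".lzo", ".lzop"},
    "unstructured": {".pdf", ".txt", ".dat", ".xls", ".xlsx", ".doc", ".docx"},
    # Only metadata is extracted for these formats.
    "media_metadata_only": {".jpeg", ".mp4", ".mpeg"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category of the longest matching suffix, or None if unlisted."""
    name = filename.lower()  # extension matching is case-insensitive here
    best_category: Optional[str] = None
    best_len = 0
    for category, suffixes in SUPPORTED_FORMATS.items():
        for suffix in suffixes:
            # Prefer the longest suffix, e.g. ".snappy.parquet" over ".parquet".
            if name.endswith(suffix) and len(suffix) > best_len:
                best_category, best_len = category, len(suffix)
    return best_category
```

For example, `classify("backup.tar.gz")` returns `"compressed"`, and `classify("notes.md")` returns `None` because `.md` does not appear in the list. Variants that the list does not name, such as `.jpg`, also return `None`. Extend the table only if you have confirmed that Discovery supports them.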

Last update: September 28, 2021