Supported File Types

Overview


This technical note describes which file types the Lexbe eDiscovery Platform can process and host.


Background


Upon upload, the Lexbe eDiscovery Platform (LEP) creates a document record and hosts every native file in the ORIGINAL tab of the Doc Viewer. Where applicable, the native file can be viewed directly within the Doc Viewer; otherwise it can be downloaded and opened in its native application on a local computer.


Through automatic processing, LEP converts all supported files to PDF. If a file cannot be converted to PDF (for example, a media file or an incompatible format), it receives a placeholder.


LEP can both extract text from a native file and OCR the converted PDF and/or other image files (e.g., JPG, PNG, TIFF, BMP, image-based PDFs). Combining the two provides the most robust search index and acts as a safeguard to capture all available text. For native files that cannot be converted to PDF, the available text is extracted and added to the search index. For image-based files, text is recognized through OCR and added to the search index. For some files, text is both extracted and OCR’d. For more information, see Uber Index.
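The rules in the preceding paragraphs can be read as a small decision procedure. The following is a minimal sketch of that reading, not LEP's actual implementation: the function name, the returned keys, and the two extension sets are hypothetical and exist only for illustration (the attached spreadsheet remains the authoritative list).

```python
# Hypothetical extension groups, for illustration only.
IMAGE_EXTENSIONS = {"jpg", "png", "tiff", "bmp"}
CONVERTIBLE_EXTENSIONS = {"docx", "xlsx", "pptx"}


def plan_indexing(extension, has_native_text):
    """Return the assumed placeholder status and text sources for one file."""
    ext = extension.lower().lstrip(".")
    if ext in IMAGE_EXTENSIONS:
        # Image-based files: text is recognized through OCR only.
        return {"placeholder": False, "text_sources": ["ocr"]}

    convertible = ext in CONVERTIBLE_EXTENSIONS
    sources = []
    if has_native_text:
        # Available native text is extracted whether or not the file converts.
        sources.append("native_extraction")
    if convertible:
        # The converted PDF is also OCR'd as a safeguard.
        sources.append("ocr")
    # Files that cannot be converted to PDF receive a placeholder.
    return {"placeholder": not convertible, "text_sources": sources}
```

Under these assumptions, a convertible file with native text is both extracted and OCR'd, while a media file with no extractable text gets a placeholder and contributes nothing to the index.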


LEP also offers a "DeNIST" process, on request, to further filter data sets. "NIST" in DeNIST refers to the National Institute of Standards and Technology, which maintains a list of known software files as part of its National Software Reference Library project and updates it several times per year. Through the DeNIST process, LEP compares all ESI in the collection against the National Software Reference Library list and removes files that match entries on the list. These are known system files and therefore unlikely to contain relevant information. For more information, see DeNIST.
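To illustrate the idea, here is a minimal sketch of a DeNIST-style filter. It assumes files are matched by SHA-1 hash, which is one of the hash types published in the National Software Reference Library data; this note does not state how LEP performs the comparison, so the matching method and the function names below are assumptions.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the uppercase hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def denist(paths, known_hashes):
    """Split paths into (kept, removed) by membership in known_hashes.

    known_hashes is a set of uppercase hex SHA-1 strings, assumed to have
    been loaded from a reference list of known system files.
    """
    kept, removed = [], []
    for path in paths:
        if sha1_of_file(path) in known_hashes:
            removed.append(path)  # known system file, filtered out
        else:
            kept.append(path)     # not on the list, stays in the collection
    return kept, removed
```

Matching on content hashes rather than file names means a renamed system file is still filtered, while a user document that happens to share a system file's name is kept.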


Details


The spreadsheet attached at the bottom of the page outlines all file types recognized by LEP and details the functionality LEP can provide with regard to each type.


The first tab, “Supported for Processing”, details specific file types and their supported processing elements; its columns are described below. The second tab, “Recognized File Types”, lists over 1,000 extensions that LEP can recognize and host.



Extension: File extension.

Application: Corresponding native program.

File Type: General file type description.

Container Expansion Supported: Documents within a container file can be expanded and processed as individual documents; SourceFilePath is maintained to show the container file relationship.

Native Text Extraction: When text is available, it is extracted from these file types, added to the search index and maintained on the HTML tab of the Doc Viewer.

Metadata Extraction: When metadata is available, it is extracted and maintained within LEP’s built-in metadata fields.

Automatic PDF/TIFF Conversion: These file types are automatically converted to PDF.

Manual PDF/TIFF Conversion: These file types can be converted to PDF manually, on request (Professional Services charges apply).

Supported OCR: When text is available, it is recognized through OCR, added to the search index and maintained on the TEXT tab of the Doc Viewer.

Expected Placeholder: These file types are generally not compatible with PDF conversion (container file, media file, etc.).

Audio Transcription: These media files can be transcribed on request (transcription charges apply); see Auto Transcription.

Searchable in Uber Index: All files on the “Supported for Processing” tab are searchable in Uber Index.
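As an illustration of the Container Expansion Supported column, the sketch below expands a ZIP container into individual document records and keeps a SourceFilePath on each one so the container relationship is preserved. Only the SourceFilePath field name comes from this note; the record layout, the other field names, and the path format are hypothetical.

```python
import io
import zipfile


def expand_container(container_path: str, data: bytes) -> list:
    """Expand a ZIP container (given as bytes) into per-document records."""
    records = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories are not documents
            records.append({
                # Hypothetical field: the document's own file name.
                "FileName": info.filename.rsplit("/", 1)[-1],
                # Assumed format: container path followed by the inner path.
                "SourceFilePath": f"{container_path}/{info.filename}",
                # Hypothetical field: uncompressed size in bytes.
                "Size": info.file_size,
            })
    return records
```

Each resulting record can then be processed as an individual document, while the SourceFilePath still shows which container it came from.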



Additional Information


For questions or additional information, please contact professionalservices@lexbe.com.




Support eDiscoveryPlatform.com,
Jul 28, 2020, 9:01 AM