Supported File Types for Automated PDF and TIFF Creation

This Technical Note describes the file types supported in the Lexbe eDiscovery Platform (LEP).

General

The Lexbe eDiscovery Platform (LEP) supports a number of file formats for native review. For files that contain text, that text is indexed into the application search engine for full-featured search and retrieval.

Prior to converting files, LEP applies container file expansion, DeNIST, and extension repair procedures.  See Automated ESI Processing for more information.  
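To illustrate in general terms what DeNIST and extension repair involve, here is a minimal Python sketch. It is not LEP's implementation: the function names, the `known_sha1` set, and the short signature table are assumptions made for illustration. DeNISTing is commonly done by hashing each file and dropping files whose hash appears in the NIST National Software Reference Library. Extension repair can be approximated by comparing a file's leading bytes with well-known format signatures.

```python
import hashlib

# Well-known leading-byte signatures for a few formats named in this note.
# This table is illustrative and far from complete.
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF8", "gif"),
    (b"\xff\xd8\xff", "jpg"),
    (b"II*\x00", "tif"),        # little-endian TIFF
    (b"MM\x00*", "tif"),        # big-endian TIFF
    (b"Rar!\x1a\x07", "rar"),
    (b"PK\x03\x04", "zip"),     # also the outer container of docx/xlsx/pptx
]


def sniff_extension(data: bytes):
    """Guess an extension from the first bytes of a file, or return None."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    return None


def is_known_system_file(data: bytes, known_sha1: set) -> bool:
    """True if the SHA-1 of the content is in a (hypothetical) set of
    lowercase hex digests of known system or application files."""
    return hashlib.sha1(data).hexdigest() in known_sha1
```

A file named `report.txt` whose content starts with `%PDF` would be flagged by `sniff_extension` as a likely PDF, which is the kind of mismatch an extension repair step is meant to catch.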

Files that do not convert as part of the automated processing services are marked with a placeholder file, either in PDF or TIFF, depending on the service ordered.

Supported File Types for Automated PDF and TIFF Creation 

LEP identifies and attempts to convert (to TIFF or PDF, depending on the service) the following file types:

 Ext  Application/Description  Type
  bmp    Image BMP    Image
  class   Java programming file  Text
  config    Application configuration File  Text
  css    Cascading style sheet, web page support  Text
  csv   Comma-separated values  Text
  doc    Microsoft Word   Text
  docx    Microsoft Word   Text
  eml    Microsoft Outlook Express email   Email
  gif   Image GIF  Image
  htm   HTML web page  Web Page
  html    HTML web page   Web Page
  ics    iCalendar file   Text
  inf    Setup Information File  Text
  ini    Text configuration file  Text
  jpeg   JPEG   Image
  jpg    JPG  Image
  js   JavaScript programming file  Text
  json   JavaScript Object Notation file  Text
  lnk    Windows File Shortcut  Text
  log   Application log file  Text
  manifest    Java programming file  Text
  mht    HTML web page  Web Page
  mht   MHT archives saved by Internet Explorer  Web page
  msg    Microsoft Outlook email   Email
  pdf   Adobe Acrobat, converted from text  Text
  pdf    Adobe Acrobat, image only  Image
  pdf   Adobe Acrobat, text under image  Image
  php    PHP programming file  Text
  png    PNG image   Image
  pps   Microsoft PowerPoint    Presentation
  ppsx    Microsoft PowerPoint   Presentation
  ppt    Microsoft PowerPoint   Presentation
  pptx   Microsoft PowerPoint   Presentation
  pst    Microsoft Outlook data files   Container
  py    Python Programming Script    Text
  rar    RAR  Container
  rtf    Microsoft Rich Text Format   Text
  tif    TIF  Image
  tiff    TIFF   Image
  txt    ASCII   Text
  url   Uniform Resource Locator file  Text
  vcf    Vcard contact information file   Text
  xls    Microsoft Excel   Spreadsheet
  xlsm   Microsoft Excel   Spreadsheet
  xlsx   Microsoft Excel   Spreadsheet
  xml     XML text  Text
  zip   Archive   Container                
   
Autocad Supported File Types

 Ext  Application/Description  Type
 dwg  Autocad Native Format  Design data
 dxf   Autocad Drawing Exchange Format  Design data

Mac Supported File Types

 Ext  Application/Description  Type
 .pages  iWork Pages for the Mac   Text
 .numbers  iWork Numbers for the Mac   Spreadsheet
 .key  iWork Keynote for the Mac   Presentation

**MBX email files do not auto-convert; see below.**

Failure to Convert Standard File Types

If a standard file type fails to convert, a placeholder file is created and the database record notes that the file Failed to Convert. Standard file types may fail to convert for a variety of reasons, including file corruption, file type misidentification, print or data extraction issues, and password protection. Password-protected files are inherently not searchable (even with dual index) and require extra due diligence.

Some non-converted standard file types can be converted manually as a professional service (billable hourly or per GB, depending on file type and issues involved).

Other Files Not Converted

LEP does not auto-convert files other than the standard file types listed above.  Instead, a placeholder file is created and it is noted in the database record that the file was Not Converted (i.e., is not supported). 

Some non-converted, non-standard file types can be converted manually as a professional service (billable hourly or per GB, depending on file type and issues involved).

Files that do not convert include media files, some container files, some email files, database files, and others, described in more detail below.

Failure to convert a file does not mean it does not contain probative evidence, only that it did not convert with automated procedures.  These files should be reviewed and further steps taken to convert, when appropriate.
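As a rough pre-upload check, file extensions can be compared against the lists in this note to anticipate which files are likely to receive a placeholder. The Python sketch below is only an illustration: the function name and group labels are hypothetical, and the extension sets are copied from the tables in this note. An extension alone does not guarantee the outcome, since a listed type can still fail to convert, and a few extensions (for example doc and vcf) appear in more than one table.

```python
from pathlib import PurePath

# Extensions LEP attempts to convert, per the tables in this note
# (standard, Autocad, and Mac lists).
AUTO_CONVERT = set(
    "bmp class config css csv doc docx eml gif htm html ics inf ini jpeg jpg "
    "js json lnk log manifest mht msg pdf php png pps ppsx ppt pptx pst py "
    "rar rtf tif tiff txt url vcf xls xlsm xlsx xml zip dwg dxf "
    "pages numbers key".split()
)

# Extensions this note lists as not automatically converted.
# The group labels are illustrative, not LEP terminology.
NOT_CONVERTED = {
    "media": set("avi asf m4a m4p m4v mov mp3 swf wav wma wmf wmv".split()),
    "unusual container": set("7z g7 iza jar sit".split()),
    "email store": set("dbx mbs mbx nsf".split()),
    "database": set(
        "dbf frm myd myi mdb mdbx iif ldf qba qbb qbm qbw qbx qby".split()
    ),
}


def expected_handling(filename: str) -> str:
    """Predict, from the extension only, how a file is likely to be handled."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext in AUTO_CONVERT:
        return "auto-convert attempted"
    for group, extensions in NOT_CONVERTED.items():
        if ext in extensions:
            return f"placeholder expected ({group})"
    return "placeholder expected (other)"


for name in ["REPORT.DOCX", "clip.mov", "books.qbw", "notes"]:
    print(name, "->", expected_handling(name))
```

Files predicted to receive a placeholder are the ones to set aside for manual review or for conversion as a professional service.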

Media Files

Media files (video and audio) cannot be converted to TIFF or PDF.  They can be uploaded to LEP and coded.  They can sometimes be viewed or played depending on file type, connection speed, local browser, computer settings, and installed applications.  The following is a list of common media file types:

 Ext  Application/Description  Type
 avi  Windows video  Video
 asf   ASF  Video
 m4a  QuickTime  Audio
 m4p  Apple  Audio
 m4v    QuickTime  Video
 mov   QuickTime  Video
 mp3  MP3   Audio
 swf   Flash  Video
 wav   Wav file  Audio
 wma   WMA  Audio or Video
 wmf   Windows Metafile Format  Image
 wmv  Windows Video  Video

Unusual Container Files

As part of automated processing, LEP extracts ZIP and RAR files. LEP does not automatically extract unusual container files. Examples include 7z, G7, Iza, Jar, and Sit. Many container files can be extracted manually as a technical service (billed hourly).
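Because unusual containers can be nested inside a ZIP that is itself extracted automatically, it can help to scan a ZIP's entry list before upload. The following Python sketch uses the standard `zipfile` module. The function name is hypothetical, and the extension set simply mirrors the examples given above. It is a convenience check, not a description of how LEP processes archives.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Container extensions this note gives as examples of "unusual" containers.
UNUSUAL_CONTAINERS = {"7z", "g7", "iza", "jar", "sit"}


def find_unusual_containers(zip_source) -> list:
    """Return the names of entries in a ZIP whose extension is on the
    unusual-container list. `zip_source` is a path or a file-like object."""
    with zipfile.ZipFile(zip_source) as archive:
        return [
            name
            for name in archive.namelist()
            if PurePosixPath(name).suffix.lower().lstrip(".") in UNUSUAL_CONTAINERS
        ]


# Small in-memory demonstration with made-up entry names.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("docs/memo.docx", b"placeholder content")
    archive.writestr("backup/old.7z", b"placeholder content")
    archive.writestr("Mac/Stuff.SIT", b"placeholder content")
demo.seek(0)
print(find_unusual_containers(demo))
```

Entries reported by this check would need manual extraction before their contents can be processed.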

Email Files

The automated conversion process converts Outlook PST and MSG files.
The automated conversion process does not reliably convert productivity files used for the Mac (e.g., Microsoft Office for the Mac, Apple Numbers, etc.). These files occasionally convert, depending on version and other factors. More often, however, they will not convert, will mis-convert, or will generate internal Mac resource fork files. Other email files or stores that can be manually converted as a professional service prior to automated processing are listed below:

 Ext  Application/Description
 dbx    Microsoft Outlook Express 5 and 6 
 mbs  Opera Email for Windows
 mbx   Eudora message files
 mbx   MBOX archives, including Google Mail, Apple Mail, and  Thunderbird
 nsf  Lotus Notes

Database Files

The automated conversion process does not convert database files.  Database types (depending on type and version) that can be processed manually as a professional service are listed below: 

 Ext  Application/Description
 dbf  Oracle or other database
 frm  MySQL
 myd  MySQL
 myi  MySQL
 mdb   Microsoft Access Database 
 mdbx   Microsoft Access Database 
 iif  Intuit interchange file, QuickBooks
 ldf  SQL Server
 qba  QuickBooks
 qbb  QuickBooks
 qbm  QuickBooks
 qbw  QuickBooks
 qbx  QuickBooks
 qby  QuickBooks

Mac Files

Mac email (Mac native or Outlook for the Mac) is not supported, but these files can be converted by Professional Services. Best practice is to convert Mac productivity files to MS Office for Windows version 2007 or 2010 prior to upload. See also Pre-Processing MBox Files (Gmail and Apple Mail).

Other Files Not Automatically Converted

Many of the file types listed below can be converted to PDF or TIFF manually as a professional service (billed hourly). 

 Ext  Application/Description  Type
 123  Lotus 1-2-3 (*.123, *.wk?)  Spreadsheet
 art  Bitmap image file compressed by AOL  Image
 doc  Microsoft Word for the Mac   Text
 docs  Microsoft Word for the Mac   Text
 epsf  EPSF    Image
 hjt   Treepad HJT files  Other
 mpp  Microsoft Project   Other
 mppx   Microsoft Project   Other
 obd  Office binder document  Container
 qpw   Quattro Pro    Spreadsheet
 sam   Ami Pro   Text
 tmp  Application Temporary File  Other
 vdx   Visio XML files   Image
 vcf  MS Outlook, contact info  Text
 xlk  Backup file created by MS Excel  Other
 wb1    Quattro Pro   Spreadsheet
 wb2  Quattro Pro   Spreadsheet
 wb3   Quattro Pro   Spreadsheet
 wks   Microsoft Works  Text
 wpg    WPG version 1.0 only   Image