Native Load File Spec

This technical note details the Lexbe eDiscovery Platform (LEP) specifications for importing native load files into LEP.  LEP can also accept unprocessed native files, Outlook PSTs, and files that have been processed to single-page TIFF images with load files (see TIFF Load File Spec).  Load file field names should follow our Standard Metadata Processing & Load File Fields document.

A Native load file has the advantage of allowing processed native files to be loaded to LEP and associated with metadata included in the Native load file.

Native load files can be produced by a number of eDiscovery processing and Early Case Assessment tools, including LexisNexis Law PreDiscovery and Early Data Analyzer and iPro Allegro, and by review tools such as kCura Relativity and iPro Eclipse.

The load file format that LEP accepts is also known in the industry as a Concordance Native load file.

Load Files

A standard Concordance Native load file is a delimited text file with the file extension DAT.  The load file references one document per line and includes that document's metadata.

Document Files

The load file references the following document files:
Native files.  Native versions of files, expanded from container files (ZIP, RAR, etc.) and separated to the individual file level, but with minimal or no ESI processing applied.
Text files.  Single page text files containing ASCII text of processed ESI.  Text files are named by Bates number and end with the extension TXT.  These files are optional but recommended.

Folder Structure

The Concordance load file grouping is located within the following folder structure, and must be present in this form as service input:

Level 1 | Level 2 | Level 3 | Description
LOADFILES | VOL1.DAT | | Concordance load file
LOADFILES | VOL1.TXT | | Concordance load file with tab delimiter substitutions
ORIGINALS | /001, /002, etc | XYZ 00177.DOCX | Original native file (entire multi-page document)
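
As an illustration, the layout above could be checked before upload with a short script.  This is a minimal sketch, not an LEP tool: the function name check_layout is hypothetical, and it assumes that only VOL1.DAT is mandatory and that every entry under ORIGINALS should be a three-digit sub-folder.

```python
import re
from pathlib import Path


def check_layout(root: Path) -> list[str]:
    """Return a list of problems found in the production folder layout."""
    problems = []

    # The Concordance load file is expected at LOADFILES/VOL1.DAT.
    if not (root / "LOADFILES" / "VOL1.DAT").is_file():
        problems.append("missing LOADFILES/VOL1.DAT")

    originals = root / "ORIGINALS"
    if not originals.is_dir():
        problems.append("missing ORIGINALS folder")
        return problems

    # Assumption: ORIGINALS holds only three-digit sub-folders (001, 002, ...).
    for entry in sorted(originals.iterdir()):
        if not (entry.is_dir() and re.fullmatch(r"\d{3}", entry.name)):
            problems.append(f"unexpected entry in ORIGINALS: {entry.name}")

    return problems
```

An empty list means the folder tree matches the structure in the table; any other result names what to fix before submitting.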


File Naming

Files are named by the Bates Title of the first page, including an optional Confidential suffix, and are located inside the ORIGINALS folder in sub-folders containing up to 5,000 files each.  The sub-folders are named with three digits, starting with '001'.  For example:
ORIGINALS/001/XYZ 000177.xlsx
ORIGINALS/001/XYZ 000180 Confidential.docx
ORIGINALS/001/XYZ 000181.jpg
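
The naming rule can be sketched in code.  The helper below is hypothetical and rests on two assumptions not stated in this note: sub-folder numbering starts at 001 (as in the examples), and folders are filled in order, 5,000 files at a time.

```python
def native_path(bates_title: str, extension: str, position: int,
                confidential: bool = False, per_folder: int = 5000) -> str:
    """Build the relative path of a native file inside ORIGINALS.

    position is the zero-based position of the file in the production.
    Assumption: folders are filled sequentially, per_folder files each.
    """
    if position < 0:
        raise ValueError("position must be non-negative")

    folder = position // per_folder + 1
    if folder > 999:
        raise ValueError("a three-digit sub-folder name cannot exceed 999")

    # Optional Confidential suffix follows the Bates Title.
    suffix = " Confidential" if confidential else ""
    return f"ORIGINALS/{folder:03d}/{bates_title}{suffix}.{extension}"
```

For instance, native_path("XYZ 000180", "docx", 1, confidential=True) gives ORIGINALS/001/XYZ 000180 Confidential.docx, matching the second example above.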

Multi Page Extracted Text / OCR files
Text files are not required for incoming Native load files.

Concordance Load File Format

The Concordance Load File is named VOL1.DAT and should be located in the LOADFILES folder: LOADFILES/VOL1.DAT

The applicable fields in a Concordance Native load file format should be named as detailed in the LEP Standard Metadata Processing & Load File Fields document.

The first line of the load file contains the column headers, using the field names listed in the Standard Metadata Processing & Load File Fields document.  The load file should be delimited using the following character substitutions:


Text Character | ASCII Substitution
Comma | 020
Quote | 254
New line | 174
Multi-Value | 059
Nested Values | 092
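
To show how these substitutions work together, here is a minimal parsing sketch.  It is not LEP's own parser.  It assumes the DAT file has already been decoded to text (the encoding is not specified in this note), that the field separator never appears inside a quoted value, and that the field names used in the example (DOCID, TITLE) are placeholders rather than names from the Standard Metadata Processing & Load File Fields document.

```python
FIELD_SEP = chr(20)     # replaces the comma between fields
QUOTE = chr(254)        # replaces the quote around each value
NEWLINE_SUB = chr(174)  # replaces a new line inside a value
MULTI_VALUE = chr(59)   # semicolon, separates multiple values
NESTED = chr(92)        # backslash, separates nested values


def parse_line(line: str) -> list[str]:
    """Split one load file line into unquoted field values."""
    fields = []
    for raw in line.rstrip("\r\n").split(FIELD_SEP):
        # Strip the surrounding quote characters when both are present.
        if len(raw) >= 2 and raw.startswith(QUOTE) and raw.endswith(QUOTE):
            raw = raw[1:-1]
        # Restore new lines that were substituted inside the value.
        fields.append(raw.replace(NEWLINE_SUB, "\n"))
    return fields


def parse_dat(text: str) -> list[dict[str, str]]:
    """Parse decoded DAT text: header on the first line, one document per line."""
    lines = [ln for ln in text.split("\n") if ln.strip("\r")]
    header = parse_line(lines[0])
    return [dict(zip(header, parse_line(ln))) for ln in lines[1:]]


def split_multi(value: str) -> list[str]:
    """Split a multi-value field into its individual values."""
    return value.split(MULTI_VALUE) if value else []
```

A line such as þDOCIDþ, followed by the ASCII 020 separator and þTITLEþ, would parse into the header list ["DOCID", "TITLE"], and each later line becomes one dictionary keyed by those headers.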


Each production to be loaded to LEP (native files, text files and load file) should be 50 GB or less before compression.  Larger productions should be split before submission.  If a production is larger than 50 GB, we treat it as a non-standard load file and will split it prior to loading.
 
The number of Native files per directory should be limited to 5,000.
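
Both limits can be checked locally before a production is submitted.  The sketch below uses hypothetical helper names and assumes that 50 GB means 50 * 1024^3 bytes; this note does not say whether decimal or binary gigabytes are intended.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024**3   # assumption: binary gigabytes
MAX_FILES_PER_DIR = 5000   # limit on native files per directory


def production_size(root: Path) -> int:
    """Total uncompressed size in bytes of all files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def within_size_limit(root: Path, limit: int = MAX_BYTES) -> bool:
    """True if the whole production fits within the size limit."""
    return production_size(root) <= limit


def crowded_dirs(originals: Path, limit: int = MAX_FILES_PER_DIR) -> list[str]:
    """Names of sub-folders of ORIGINALS holding more than limit files."""
    crowded = []
    for sub in sorted(originals.iterdir()):
        if sub.is_dir():
            count = sum(1 for f in sub.iterdir() if f.is_file())
            if count > limit:
                crowded.append(sub.name)
    return crowded
```

A production that fails within_size_limit is a candidate for splitting, and any folder returned by crowded_dirs should be divided into further numbered sub-folders.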