QC of Self-Uploaded TIFF Productions
This technical note provides suggestions for performing quality control on a TIFF upload in the Lexbe eDiscovery Platform (LEP).
Production uploads are an automated process. If an upload fails because of document corruption, non-standard load files, or other issues, LEP does not, and often cannot, capture the error.
Standard Metadata Processing and Load File Fields
During upload processing, LEP extracts and uses the information from standard load file output for DAT and OPT files (Concordance, Relativity, Allegro, iPro, iConnect) and DII data files (Summation). See Standard Load File Fields for more information.
TIFF load files can be non-standard or corrupt for various reasons, including corrupt or missing TIFF files, corrupt or missing text files, corrupt or missing data, field inconsistencies, file count mismatches, misaligned or missing metadata, or other non-defined matters. When a non-standard or corrupt load file is uploaded, the available text files will index and be searchable, but data loss or inconsistency will affect metadata.
TIFFs with a standard Concordance load file can also be uploaded. See TIFF Image Dat Load File for more information.
The Load File
When uploading a TIFF production, the user must also generate a load file to combine the files properly into the same record, create a proper case index, and avoid duplicates.
QC Steps Before Uploading TIFF Productions
The Images folder count should match the Produced Bates Range in pages.
Open selected TIFF images to confirm that the Bates number is properly stamped, quality is good, and the Bates file name matches the stamped Bates.
Compare selected TIFF images with text in the corresponding text file.
Create a proper folder structure including four sub-folders with their file titles all in CAPS, as follows:
IMAGES: Includes all the TIFF and other image files (image files are page based). When uploading TIFF productions, LEP will combine all the TIFF pages of a document into one record.
LOADFILES: Includes the file mapping OPT load file.
ORIGINALS: Includes all the native files (Word, Excel, JPG, PNG, etc.).
TEXT: Includes all the Text files that are document based.
Upload the production to LEP from the Case->Add Case Documents page as a compressed ZIP file. The ZIP file name must end with the extension ".lexbeupload.zip" (e.g. Prod001.lexbeupload.zip), and the upload must include a file mapping Excel file.
See Self Upload a TIFF Production for more information.
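Several of the pre-upload checks above can be scripted. The following is a minimal Python sketch, not a Lexbe tool: the function names are hypothetical, the folder and ZIP name checks assume exact, case-sensitive matching, and the Bates calculation assumes a fixed prefix followed by a sequential page number.

```python
import re
from pathlib import Path

# Folder names and ZIP suffix as listed in the steps above.
REQUIRED_FOLDERS = ("IMAGES", "LOADFILES", "ORIGINALS", "TEXT")
UPLOAD_SUFFIX = ".lexbeupload.zip"

# Assumption: a Bates number is a prefix followed by trailing digits.
_BATES = re.compile(r"^(.*?)(\d+)$")


def check_production_layout(root):
    """Return a list of required sub-folders missing from `root`."""
    present = {p.name for p in Path(root).iterdir() if p.is_dir()}
    # Exact comparison, so "Text" or "images" is reported as missing.
    return [f"missing folder: {name}"
            for name in REQUIRED_FOLDERS if name not in present]


def is_valid_upload_name(zip_name):
    """True if the name ends with the required suffix and has a stem."""
    return zip_name.endswith(UPLOAD_SUFFIX) and len(zip_name) > len(UPLOAD_SUFFIX)


def bates_range_pages(first, last):
    """Number of pages in an inclusive Bates range with a shared prefix."""
    m1, m2 = _BATES.match(first), _BATES.match(last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("Bates numbers must share a prefix and end in digits")
    return int(m2.group(2)) - int(m1.group(2)) + 1
```

For example, bates_range_pages("ABC000001", "ABC000250") gives the page count to compare against the number of files in the IMAGES folder.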
QC Steps After Uploading TIFF Productions
Check Original File Count. From the Case->Add Case Documents page select an upload batch job by the title to go to Browse and see the documents uploaded for the batch.
This redirects to the Browse page, which displays the total number of files uploaded in that batch. Use this total to confirm that all documents from the original TIFF production were processed during the upload. TIFF uploads expand minimally, since the Normalized PDF, which includes OCR text, is only slightly larger than the TIFF.
The user may also check the number of rows in the .dat load file. Open the folder containing the original version of the production and locate the .dat file.
Open this file in Excel. The number of rows in the file (minus the title row, if present) is the number of documents in the production. This number should match the document count in LEP.
Verify that the page count is correct for the upload batch as follows:
Display the page count column by clicking "Show Fields" on the left side menu, check the "Pages" box in the pop up, and apply the change by clicking OK.
Pages is now a visible column in your Browse screen.
Click Select All to select all the documents in the batch, then export a log to Excel.
Open the Excel log and use the =SUM Excel formula to add the numbers in the Pages column.
Compare this number to the number of pages in the source for the upload batch.
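The document count and page total described above can also be computed with a short script instead of Excel. This is a rough sketch under stated assumptions: the .dat file holds one record per line, the exported log has been saved from Excel as a CSV file with a header named "Pages", and both function names are hypothetical.

```python
import csv


def count_dat_records(dat_path, has_header=True, encoding="utf-8"):
    """Count non-blank lines in a .dat load file, minus the title row.

    Assumption: one record per line. A field that contains a literal
    line break would inflate this count.
    """
    with open(dat_path, encoding=encoding, errors="replace") as f:
        rows = sum(1 for line in f if line.strip())
    return rows - 1 if has_header and rows else rows


def sum_pages(csv_path, column="Pages"):
    """Add up the Pages column of an exported log saved as CSV."""
    # utf-8-sig tolerates the byte order mark Excel may write.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return sum(int(row[column])
                   for row in csv.DictReader(f)
                   if (row.get(column) or "").strip())
```

The first result should match the document count shown in Browse, and the second should match the page count of the source for the upload batch.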
How do I know how many pages were in the upload batch?
The number of .TIF files in the source upload folder can be counted using a search within the folder. Since TIFF files are page based, this will be the page count for the upload. Open the source directory in Windows Explorer. Be sure to open the top level folder containing all .TIF files uploaded and all sub-folders containing .TIF files. This folder will most likely be titled "IMAGES". For purposes of the example below, double click to enter the folder titled IMAGES and search within that folder.
Within the folder search for: *.TIF
The search results will return a count of all .TIF files in the directory (at the bottom of the window). This number should match the page count in the Platform (the page total summed in Excel).
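As an alternative to the Windows Explorer search, the same count can be taken with a few lines of Python. This is a sketch that assumes single-page TIFF files, as described above, and treats both .tif and .tiff extensions (in any letter case) as TIFF images; the function name is hypothetical.

```python
from pathlib import Path

TIFF_EXTENSIONS = {".tif", ".tiff"}


def count_tiff_pages(images_root):
    """Count TIFF files under `images_root`, including all sub-folders."""
    return sum(1 for p in Path(images_root).rglob("*")
               if p.is_file() and p.suffix.lower() in TIFF_EXTENSIONS)
```

Calling count_tiff_pages on the IMAGES folder returns the figure to compare with the page total from the exported log.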
Checking Documents for Accuracy of Information. View TIFF files via an online browser (required browser settings) or, alternatively, open the files from a link in Microsoft Office or other native applications. See Native View for more information.
Search Index. From the Document Viewer->Text or HTML tabs, verify the search index results for supported files. For TIFFs that include text in the file, the text is indexed into the application search engine for full-featured search and retrieval. See OCRed images for more information.
This helps detect page numbers in the index that do not match up with those in the text. This would indicate the document has data-related issues which should be addressed. For example, from Browse, a TIFF document is shown to have six pages, but the same document in the Document Viewer->HTML page displays only one page.
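When a page count looks wrong for a document, the expected count can be looked up in the OPT load file. The sketch below assumes the commonly used Opticon layout of seven comma-separated fields per page (image key, volume, path, document break, folder break, box break, page count), where "Y" in the fourth field marks the first page of a document. Verify that layout against the actual file before relying on the result; the function name is hypothetical.

```python
import csv


def pages_per_document(opt_path, encoding="utf-8"):
    """Map each document's first image key to its number of page lines."""
    counts = {}
    current = None
    with open(opt_path, newline="", encoding=encoding) as f:
        for row in csv.reader(f):
            # Skip blank or malformed lines.
            if len(row) < 4 or not row[0].strip():
                continue
            # "Y" in the fourth field starts a new document.
            if row[3].strip().upper() == "Y" or current is None:
                current = row[0].strip()
                counts[current] = 0
            counts[current] += 1
    return counts
```

A document whose count here is six but which shows one page in the Document Viewer is a candidate for the data-related issues described above.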
Spot Check Placeholder Files. Supported file types are converted to PDF or TIFF as part of an automated process. A placeholder is generated when a file is not converted. See Placeholder File for more information.
How often should quality control be applied?
Repeat all of the steps outlined above for each self-uploaded batch of TIFF productions.
Professional Support Services for Manual Conversions
We offer Project Management and Professional Services (billable hourly) should you need further assistance.