QC of Self-Uploaded Natives

This technical note provides quality control steps for a native upload to the Lexbe eDiscovery Platform (LEP).  Quality control as a service is available through Professional Services (billed hourly).  

Uploads are processed automatically.  If processing fails because of document corruption or other issues, LEP does not, and often cannot, capture the error.  Expert human review for systematic problems is therefore advisable.

Supported File Formats

Before uploading native files to a case, check the list of Supported File Types.

QC Steps for Standard Native Uploads 

To test for quality, select a random sample of the native documents uploaded and follow the steps below.

Check Original File Count.  From the Case->Add Case Documents page, select an upload batch by title.

The upload batch opens in Browse and displays the total number of files uploaded.  Use this number to confirm that all documents from the original batch were processed during the upload.

Check Specific File Count (Container File Expansion). As part of the self-upload processing service, LEP expands container files such as PST, MSG, RAR and others.  Automated processing converts Outlook PST and MSG files.  The original file count grows because, for each MSG, the email body and attachments are extracted and then associated with the MSG to allow integrated viewing.  Both the email body and the email attachments are converted to PDF.

For upload batches that contain numerous PSTs, MSGs or other container files, apply filters from the Browse or Search pages as shown below.  

The first screenshot shows the original file count before upload and file expansion: 14 files compressed into one ZIP file = 15 files. When compressing the original batch into ZIP or RAR files for uploading or when uploading PSTs, the main compressed file (container file) must be included in the final file count. 

The second screenshot shows the total file count for the batch after upload and file expansion, including extracted email attachments and original PST messages (20 documents).

The third screenshot matches the original number of documents uploaded because a filter was applied to exclude the attachments (14 natives compressed to one ZIP file = 15 files).  Compare this number to the number of documents in the source for this upload batch.  If the numbers match, all documents were uploaded.
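The counting rule described above (files inside the container plus the container itself) can be checked locally before uploading.  The following is a minimal sketch that assumes the batch is a single ZIP file on the local disk; the function name expected_upload_count is hypothetical and is not part of LEP.  It does not account for nested containers or for email attachments extracted during expansion.

```python
import zipfile


def expected_upload_count(zip_path: str) -> int:
    """Count the files inside a ZIP and add one for the ZIP container itself."""
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries are not documents, so leave them out of the count.
        members = [info for info in archive.infolist() if not info.is_dir()]
    # The container file is included in the final file count.
    return len(members) + 1
```

For a ZIP holding 14 natives, the function returns 15, which is the figure to compare against the filtered count shown in Browse.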

The number of messages in a PST should always be compared with the number of MSGs extracted after upload and file expansion.  The original PST file usually provides the total number of emails.  If the original PST cannot provide that information, there are programs that count the messages within a PST (e.g., ScanPST).  See Repairing PSTs for more information.

Check Search Index. From the Document Viewer->Text or HTML tabs, verify the search index results for supported native files.  For natives that include text in the file, the text is indexed in the application search engine for full-featured search and retrieval. 

Use this check to detect page counts in the index that do not match those of the document.  A mismatch indicates data-related issues that should be addressed.  For example, the Browse page shows that a native Word document has six pages, but when the document is opened in the Doc Viewer->HTML tab, there is only one page.

Compare Native and PDF  

How to Select the Sample.  It is important to select different types of files as part of QC.  If processing consists of ten emails, five Word documents, and two PDFs, then QCing only the two PDFs will not QC the emails or Word conversions.  It is important to QC a large enough sample from the upload to ensure all file types will be reviewed.  
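One way to make sure every file type is represented is to draw the random sample per file extension rather than from the batch as a whole.  The sketch below assumes a local list of the uploaded file names is available; the function name sample_by_type and the per_type parameter are hypothetical, and the right sample size per type is a judgment call for the reviewer.

```python
import random
from collections import defaultdict
from pathlib import PurePath


def sample_by_type(filenames, per_type=2, seed=None):
    """Pick up to `per_type` random files for each file extension."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    groups = defaultdict(list)
    for name in filenames:
        # Group case-insensitively so ".PDF" and ".pdf" count as one type.
        groups[PurePath(name).suffix.lower()].append(name)
    sample = {}
    for ext, names in groups.items():
        # Never ask for more files than the group contains.
        sample[ext] = rng.sample(names, min(per_type, len(names)))
    return sample
```

With ten emails, five Word documents, and two PDFs, the result contains entries for all three extensions, so no conversion type is left out of QC.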

How to Compare Native to PDF.  It is important to verify that the Normalized PDF version of the document (displayed on the PDF tab) matches the original file.

Click the Original tab to display the original document in the Document Viewer, or to download the document and open it locally.  Whether the document displays directly in the Document Viewer or requires a download depends on the document type and browser settings.  The corresponding application must be installed on the local computer in order to open and view the downloaded document.

Compare the Original and the PDF version of the document to confirm that the processing was done correctly.  See Native View for more information.  

What to Look for when Comparing Native to PDF.  For emails, check that the subject, to, from, cc, bcc and body are identical in both versions.  Verify that any attachments shown in the native are listed under the email family in the Document Viewer.  Verify that the time stamp on the PDF and the time stamp on the native opened in Outlook differ only by the expected offset (Outlook converts all times to local time).  For example, if the batch was uploaded with a CST time offset and the native is opened in the EST zone, the time shown in Outlook will be one hour ahead of the time shown in the Document Viewer.
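The one-hour difference in the CST/EST example can be worked through with a short calculation.  This sketch uses fixed offsets (CST as UTC-6, EST as UTC-5) and a made-up sent time; it ignores daylight saving time and is only meant to illustrate the arithmetic, not how LEP or Outlook store time stamps.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets; daylight saving time is deliberately ignored.
CST = timezone(timedelta(hours=-6), "CST")
EST = timezone(timedelta(hours=-5), "EST")


def viewer_vs_outlook(sent_utc):
    """Return the same instant as shown with a CST offset and in an EST local zone."""
    in_viewer = sent_utc.astimezone(CST)   # batch uploaded with a CST offset
    in_outlook = sent_utc.astimezone(EST)  # native opened on an EST machine
    return in_viewer, in_outlook


# Hypothetical sent time, used only for illustration.
sent = datetime(2024, 1, 15, 18, 30, tzinfo=timezone.utc)
viewer, outlook = viewer_vs_outlook(sent)
print(viewer.strftime("%H:%M %Z"))   # 12:30 CST
print(outlook.strftime("%H:%M %Z"))  # 13:30 EST
```

Both values describe the same moment, so a one-hour gap between the two displays is expected here; any other gap suggests the offset was applied incorrectly.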

Another way to check the document index is to open a document in the batch and copy text that is unique to that document.

Navigate to the Search page.  Paste the copied text into the search bar and click search (choose The exact phrase option).  If the document appears in the search results, it was indexed correctly.

Spot-checking Placeholder Files.  LEP converts files to PDF or TIFF files as part of ESI automated processing for supported file types.  A placeholder page is generated for a file that is not converted.  The placeholder cannot be Bates-stamped. See Placeholder for more information.  

Ways to Identify and Review Placeholder Files

From the Case->Add Case Documents page, Report hyperlink: After a batch has finished uploading, select Report to generate and view reports based on the following options:  Doc Count By Master Date, Doc Count By Extension, Doc Count Unsupported by Extension and Doc Count Failed To Convert by Extension.

As an example, click the Doc Count Unsupported by Extension hyperlink to go to the Case Assessment Report showing the number of unsupported documents uploaded for the batch, based on the original file extension.

Browse and Search pages->Filter: To view files flagged as Not Converted or Failed to Convert, apply filters on the fields Placeholder = Unsupported OR Failed To Convert. 
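To have something to compare the by-extension reports against, the source folder can be tallied by file extension before uploading.  The sketch below assumes the original natives sit in a local folder; the function name count_by_extension is hypothetical, and the exact layout of the LEP reports is not assumed here.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder):
    """Tally the files under `folder` (including subfolders) by extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Extensions that appear in the local tally but are not on the list of supported file types are likely candidates for placeholder pages.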

Working with Placeholder Files

Create a Native Production.  Non-converted files can be viewed in their native format.  A production will include the Bates-stamped PDFs in a sub-folder named PDF, and include the natives in a sub-folder named ORIGINALS.  See Production for more information.   

Manually convert the natives to PDF and re-upload.  Convert the files outside of LEP, rename them so that they are easy to track, associate the metadata, and then re-upload the PDFs to the case.  See the following example:

>Monthly Invoice Report.bak (original)

>Monthly Invoice Report File Converted.pdf (PDF version renamed)
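When many files are converted manually, applying the same renaming pattern to each one keeps the originals and the converted PDFs easy to pair up.  The sketch below follows the pattern in the example above; the function name converted_pdf_name and the marker parameter are hypothetical, and any consistent naming scheme would serve the same purpose.

```python
from pathlib import PurePath


def converted_pdf_name(original_name, marker="File Converted"):
    """Build the PDF name for a manually converted file from its original name."""
    stem = PurePath(original_name).stem  # file name without its last extension
    return f"{stem} {marker}.pdf"


# Keep a mapping from each original to its converted name for tracking.
originals = ["Monthly Invoice Report.bak"]
tracking = {name: converted_pdf_name(name) for name in originals}
print(tracking["Monthly Invoice Report.bak"])  # Monthly Invoice Report File Converted.pdf
```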

Create a custom tag.  Use the Management->Manage Custom Doc Fields page to associate the manually converted files with the natives already uploaded: add a coding section called Manually Converted, and then a coding field (check-box) called Checked.

Admin Users have the ability to transfer metadata from the originals to the PDF versions (e.g. Date Sent, From, To, Cc, Bcc, Subject, etc.).  See Upload Metadata for more information.  

If a small set of documents is converted, use the Multi-Doc Edit feature from the Browse or Search pages to edit most fields other than metadata.  See Shared Functions for more information.  

Remove Responsive and Privilege tags from the original files, because those tags are now handled on the re-uploaded PDF versions.  If creating a production, remove the original files from the production since the manually converted versions will be included.

How often should quality control be applied?

Complete all quality control steps discussed above on each self-uploaded batch of native files.

Professional Services for Manual Conversions

We offer Project Management and Professional Services (billable hourly).