Corrupt PDF Files
This technical note discusses how to detect and work with corrupt PDF files. Corruption is a data-related issue and occurs prior to upload to the Lexbe eDiscovery Platform (LEP).
Why do PDF Files Become Corrupt?
Only a few PDF products reliably create valid PDFs. Others, such as freeware or homemade PDF creators, have flaws. These flaws often go undetected because PDF viewer applications detect and repair errors on the fly, so the creator of the PDF may never know that it is corrupt. The fact that a software program can open or split a PDF does not mean the file is free of corruption. Acrobat Pro, for example, can usually split files successfully.
PDF is a binary format, and most of its content is compressed. Editing a PDF file with a text editor, or transmitting it in text mode instead of binary mode (e.g. FTP in ASCII mode), corrupts the file. A partial transmission cuts off part of the document, and the missing information cannot be recovered.
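As a rough illustration of how truncation can be spotted, the sketch below checks for the two markers a PDF file normally carries: the `%PDF-` header at the start and the `%%EOF` marker near the end. The function names and the 1024-byte search window are assumptions made for this example, not part of LEP. Passing the check does not prove a file is valid; it only catches a missing header or a cut-off ending.

```python
def check_pdf_bytes(data: bytes) -> list[str]:
    """Return a list of obvious structural problems found in raw PDF bytes."""
    problems = []
    # A PDF file normally starts with a "%PDF-" header line.
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    # The end-of-file marker should appear near the end of the file.
    # A partially transmitted file usually lacks it.
    # The 1024-byte window is an assumption for this sketch.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF marker")
    return problems


def check_pdf_file(path: str) -> list[str]:
    """Read a file in binary mode ("rb") and run the checks above."""
    with open(path, "rb") as handle:
        return check_pdf_bytes(handle.read())
```

Note that the file is opened in binary mode. Opening or saving a PDF in text mode is one of the ways the corruption described above is introduced.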
What does LEP do with corrupt PDF files?
See Placeholder File for more information.
How to Detect Corruption?
From Browse or Search, select Fields->Show Fields->Placeholder column to display Placeholders.
Two kinds of PDFs often have corruption issues: single-page PDFs with more than 300 words, and PDFs with a very large number of pages (thousands).
Unless the corruption directly affects viewing the document, it is often ignored. If documents are being archived or must be of good quality for other reasons, they can be analyzed using a PDF analysis tool (such as the 3-Heights™ PDF Analysis & Repair API). A simple way to test whether a document is valid is to open it in Adobe Acrobat Professional and close it again. If Acrobat prompts you to save the document, this can indicate that the document was corrupt and that a repaired version is being displayed. This test does not provide information about what was repaired, nor does a prompt always indicate corruption.
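The page and word counts mentioned above can be turned into a simple screening rule. The following is a minimal sketch, assuming the page and word counts have already been obtained from some other tool. The function name is hypothetical, the 300-word limit comes from this note, and the 1000-page cutoff is an assumed reading of "thousands".

```python
def looks_suspicious(page_count: int, word_count: int,
                     single_page_word_limit: int = 300,
                     many_pages_cutoff: int = 1000) -> bool:
    """Flag PDFs that match the two patterns often tied to corruption."""
    # Pattern 1: a single page carrying more than 300 words.
    if page_count == 1 and word_count > single_page_word_limit:
        return True
    # Pattern 2: a very large page count (cutoff is an assumption).
    if page_count >= many_pages_cutoff:
        return True
    return False
```

A flagged file is only a candidate for closer inspection, not a confirmed corrupt file.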
Can or Should the PDF be Repaired?
Whether a PDF requires repair depends primarily on the relevance of the document.
How can a PDF be Repaired?
Download the corrupt PDFs to your desktop and try the following options:
Print the file to a new PDF with a virtual printer driver.
Repair the file with third-party software.
Print the existing PDF to paper and rescan it to PDF.
Upon request, Professional Services (billable hourly) can attempt to convert the files manually.
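As one possible instance of the first two options above, a third-party tool such as Ghostscript can rewrite a PDF through its `pdfwrite` device, which has much the same effect as printing to a virtual PDF driver. The sketch below is an illustration under assumptions: Ghostscript is not named in this note, the function names are hypothetical, and the executable is assumed to be `gs` (on Windows it is typically `gswin64c`). The rewrite may fail or lose content on badly damaged files.

```python
import shutil
import subprocess


def build_rewrite_command(source_pdf: str, repaired_pdf: str,
                          gs_executable: str = "gs") -> list[str]:
    """Build a Ghostscript command that re-writes a PDF via pdfwrite."""
    # -o sets the output file and runs in batch mode without pausing.
    return [gs_executable, "-o", repaired_pdf, "-sDEVICE=pdfwrite", source_pdf]


def rewrite_pdf(source_pdf: str, repaired_pdf: str,
                gs_executable: str = "gs") -> None:
    """Run the rewrite; raises if Ghostscript is missing or reports failure."""
    if shutil.which(gs_executable) is None:
        raise FileNotFoundError(f"{gs_executable} was not found on PATH")
    subprocess.run(
        build_rewrite_command(source_pdf, repaired_pdf, gs_executable),
        check=True,
    )
```

Write the result to a new file rather than over the original, so that the original remains available if the rewritten copy turns out to be incomplete.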