Repairing Corrupted MSG Files
(Self-Upload)
This technical note describes how to repair MSGs before upload to a case in the Lexbe eDiscovery Platform (LEP). This includes how to handle possible data corruption issues in these file types.
What are MSG Files
MSG is the file extension for Microsoft Outlook and Exchange mail items. MSG files are often stored inside PST or OST files, which contain multiple MSGs and their metadata, principally sender, recipients, date and time sent, and subject. Date and time are recorded in Universal Time (UT, formerly GMT), and the local time offset of the viewing computer is used to display the email message in local time. An MSG file may be encoded in either binary or ASCII. The message body may be encoded in formats including plain text, RTF, Word and HTML (with graphic links). MSGs may alternatively represent calendar items, contacts, and reminders.
LEP will extract the files and attempt to convert them to PDFs as part of automated file processing. We do not automatically repair corrupted MSGs because repair involves manual intervention that might stall the processing queue.
For each MSG, selected metadata fields are extracted and associated with the MSG. A UT (GMT) offset that the user inputs at the time of upload is used to set the time of the email to a designated local time, rather than UT (GMT). The email body and attachments are extracted and then associated with the MSG to allow integrated viewing. The email body is converted to PDF, as are the email attachments. Email bodies are named with date, time, and subject, as follows:
YYYY-MM-DD HH:MM:SSPM - Email Subject Line
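The offset and naming convention above can be sketched in Python. This is a minimal illustration, not LEP's actual implementation: the function name email_body_title and its parameters are hypothetical, and the zero-padded 12-hour clock is an assumption about how the HH:MM:SSPM pattern is filled in.

```python
from datetime import datetime, timedelta, timezone


def email_body_title(sent_utc: datetime, utc_offset_hours: float, subject: str) -> str:
    """Return a title shaped like 'YYYY-MM-DD HH:MM:SSPM - Email Subject Line'.

    Hypothetical helper: the sent time is shifted from UT (GMT) by the
    user-supplied offset before it is formatted.
    """
    # Treat a naive timestamp as UT (GMT); this is an assumption of the sketch.
    if sent_utc.tzinfo is None:
        sent_utc = sent_utc.replace(tzinfo=timezone.utc)
    local = sent_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    # Build AM/PM by hand so the result does not depend on the system locale.
    meridiem = "AM" if local.hour < 12 else "PM"
    return f"{local:%Y-%m-%d %I:%M:%S}{meridiem} - {subject}"


sent = datetime(2010, 1, 1, 8, 0, 0, tzinfo=timezone.utc)
print(email_body_title(sent, -6, "Re Upcoming Meeting"))
# prints: 2010-01-01 02:00:00AM - Re Upcoming Meeting
```

Note that a negative offset can move the message to the previous calendar day, which changes the date portion of the title as well as the time.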
If a file cannot be converted to PDF and the conversion failure is detected, a placeholder is generated. A native version of the body and attachments is accessible in the Original tab in the document viewer, and in the Original tab of an LEP Briefcase Download or Production. For the email body, a PDF version is generated in addition to the Original native version because many emails are now encoded in HTML with numerous links to web-served graphic files. Those links may be broken, so a pure native HTML version might display poorly unless it is converted to PDF.
How Are Email Body and Attachment Relationships Represented in LEP
After a self-upload is complete, several versions of an MSG email body and attachments are available for view in the Document Viewer.
For the email body, a PDF version is available in the PDF view, and text from the PDF is available in the Text view. A download link to the MSG is available in the Original view, and extracted text from the MSG is available in the HTML view. When viewing an email body in the Document Viewer, each attachment is listed in the related document window and can be opened in the same or a new Document Viewer (for a new Document Viewer, right-click and select New Window). When viewing an attachment, the other related attachments are available in the same fashion.
From the Browse or Search pages, MSGs may be viewed in isolation by filtering on File Extension = MSG. Also from the Browse page, you can see if documents are processed email attachments by showing the Is Attachment column.
If the user wishes to see MSG email bodies and attachments in order from Browse or Search, then either Bates numbers or Control numbers must be applied. This is because email MSGs are named with metadata information (see above) and attachments are named with the attachment file name, so a normal title sort puts the documents out of order. Control numbers can be applied to a case at any time. Bates numbers are applied to a case at the time of a production. In either instance, the Control or Bates numbers place the attachments in order after the MSG message body, so a Control number or Bates number sort from the Browse window allows MSG email bodies and their attachments to be reviewed in order.
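The effect of the two sort orders can be illustrated with a small Python sketch. The records, titles, and integer control numbers below are invented for illustration and do not reflect LEP's actual numbering format.

```python
# Hypothetical records: two emails, each followed by its attachments.
docs = [
    {"control": 1, "title": "2010-01-01 02:00:00AM - Re Upcoming Meeting"},
    {"control": 2, "title": "Meeting Agenda.docx"},
    {"control": 3, "title": "Attendance List.xlsx"},
    {"control": 4, "title": "2010-01-02 09:15:00AM - Budget"},
    {"control": 5, "title": "Budget Draft.xlsx"},
]

# A title sort groups the date-named email bodies first and scatters attachments.
by_title = [d["control"] for d in sorted(docs, key=lambda d: d["title"])]

# A control number sort keeps each body directly ahead of its attachments.
by_control = [d["control"] for d in sorted(docs, key=lambda d: d["control"])]

print(by_title)    # prints: [1, 4, 3, 5, 2]
print(by_control)  # prints: [1, 2, 3, 4, 5]
```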
Why Might MSGs Fail to Convert Properly
As with all native files in processing (e.g., Word, Excel, PowerPoint), a certain number of MSGs may fail to convert properly. The reasons include MSG corruption and malformation. MSGs can be particularly prone to corruption, as many programs can convert files to MSGs and may do so incorrectly at times. Many versions of the MSG format have come into existence as Outlook has developed, and some versions may not be supported by all programs that read or convert them. Also, the email body may be encoded in a number of different formats (e.g., HTML, text in various formats, Microsoft Word in various formats, RTF), and this can lead to encoding and conversion problems and corruption. Finally, emails can become corrupted in transit over the Internet or during file copying.
Examples of email failing to convert properly include:
>Email attachments failing to extract.
>Email metadata failing to extract and field correctly.
>Email bodies failing to extract.
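Before upload, a quick local check can flag files that are not well-formed MSG containers. The Python sketch below relies on the fact that Outlook .msg files are stored in the Compound File Binary format, which begins with a fixed 8-byte signature. The function name is hypothetical, and this is not a check that LEP itself is known to perform. A passing result does not prove the message is intact; a failing result suggests the file is truncated, mislabeled, or was written incorrectly by a converter.

```python
# Fixed signature at the start of every Compound File Binary (OLE2) file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_msg_container(path: str) -> bool:
    """Return True if the file starts with the Compound File Binary signature.

    Hypothetical pre-upload check: it inspects only the first 8 bytes and
    says nothing about the message body, metadata, or attachments.
    """
    with open(path, "rb") as handle:
        return handle.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE
```

A file that fails this check is a reasonable candidate for local conversion to PDF rather than direct upload.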
Email Header Looks OK, but not the Body
The rendition of an email in Outlook or another email viewing program, with To, From, Date, etc. at the top of the page and the email body underneath, may look like a standard Word document. In actuality, however, the header fields are pulled from metadata and the view is constructed by the email viewer. If the body is corrupt and cannot be extracted, then the header may show but not the body. Often, text can still be extracted from a corrupt or partially corrupt email; that text is searchable and can be viewed in the HTML tab of the LEP Document Viewer.
It is possible that an email might display in a version of Outlook and not be extractable in LEP, and vice-versa. Or an email might be viewable in one version of Outlook and not another.
What Can Be Done with Outlook Emails that Fail to Convert
If an email fails to convert, it may be possible to download the original MSG, open it, and convert it to PDF locally with Outlook or another program or utility. In this case, the user can download the MSG from the Original tab of the Document Viewer or as a Briefcase download from Browse.
Additionally, for an email whose body failed to extract and convert, the search index has often been able to extract a text version. This will be unformatted and may be incomplete, but it often contains all the needed data. This version is available in the HTML view and can be downloaded as text or saved to PDF with a local PDF print driver.
Convert Native MSGs to PDF Locally and Re-Upload
Alternatively, the user may download email MSGs from the Original view, convert the email body or attachments to PDF locally as needed using third-party software, and then re-upload the PDF version of the email to LEP as detailed below:
1-From the Browse or Search pages, select the email and download the native version of the file from the Original tab. This will download the MSG, which includes the email body and attachments.
2-Convert the body or attachments as needed to PDF, using PDF saving or printing software. This can be done with Acrobat Pro or any of numerous PDF print driver utilities. The email body and attachments should be saved/printed to PDF separately.
3-Name the email body and any attachments consistently so that they group together in a title sort. E.g.:
>2010-01-01 2:00:00AM Email Re Upcoming Meeting.pdf (this is the email body)
>2010-01-01 2:00:00AM Email Re Upcoming Meeting, Meeting Agenda.pdf (this is attachment 1)
>2010-01-01 2:00:00AM Email Re Upcoming Meeting, Attendance List.pdf (this is attachment 2)
4-Re-upload the converted PDFs to the case and transfer any metadata needed from the original email body to the replacement (e.g., Date Sent, From, To, Cc, Bcc, Subject, etc.).
5-We recommend converting and re-uploading emails one at a time to keep emails and attachments consistent.
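The naming convention in step 3 can be sketched as a small Python helper. The function name and parameters are hypothetical; the sketch only shows one way to give every replacement PDF the same title prefix so that a title sort keeps the group together.

```python
from pathlib import PurePath
from typing import List


def replacement_names(body_title: str, attachment_names: List[str]) -> List[str]:
    """Return PDF names for an email body and its attachments.

    Hypothetical helper: every name starts with the body title, and each
    attachment name appends the original attachment's name without its
    extension.
    """
    names = [f"{body_title}.pdf"]
    for attachment in attachment_names:
        stem = PurePath(attachment).stem  # drop the original extension
        names.append(f"{body_title}, {stem}.pdf")
    return names


for name in replacement_names(
    "2010-01-01 2:00:00AM Email Re Upcoming Meeting",
    ["Meeting Agenda.docx", "Attendance List.xlsx"],
):
    print(name)
```

Run on the example above, this produces the same three file names shown in step 3.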
Lexbe's Professional Services team provides assistance with manual conversion (billable hourly). Contact your Sales Consultant for a quote.