Outlook Email Conversion
This technical note describes how Outlook PST and MSG files are converted when processed in the Lexbe eDiscovery Platform (LEP), and the data corruption issues encountered with these file types.
What are PST Files?
PST (Personal Storage Table) is a Microsoft proprietary file format used to store copies of email messages (MSGs), calendar events, and other items for use in Microsoft Outlook and Exchange. A related Microsoft file format is OST (Offline Storage Table). The PST file format is widely used as a delivery format (sometimes after conversion) in eDiscovery, even if an email was originally stored in another format.
What are MSG Files?
MSG is the file extension for Microsoft Outlook and Exchange mail documents. MSG files are often included inside PST or OST files. An MSG file may be encoded in either binary or ASCII. The message body may be encoded in formats including plain text, RTF, Word, and HTML (with graphic links). MSGs may also represent calendar items, contacts, and reminders.
How Does LEP Process PSTs and MSGs?
LEP runs an automated PST repair program against all PSTs uploaded for conversion. PSTs are often partially corrupted, and this automatic repair generally improves the quality of MSG extraction.
MSGs can also be imported individually or from another supported container (e.g., ZIP).
For each MSG, selected metadata fields are extracted and associated with the MSG. Upon upload, the user may set the time of the email to a designated local time rather than UTC (GMT). The default is UTC, with no offset applied. The email body and attachments are extracted and associated with the MSG to allow integrated viewing. The email body is converted to PDF, as are the email attachments.
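As an illustration of the time setting described above, the following minimal Python sketch shifts a UTC sent time by a fixed offset. The function name `to_local` and the `offset_hours` parameter are hypothetical and are not part of LEP. It assumes a fixed offset with no daylight-saving adjustment, which may differ from how LEP applies the setting.

```python
from datetime import datetime, timedelta, timezone


def to_local(sent_utc: datetime, offset_hours: float = 0.0) -> datetime:
    """Shift a UTC timestamp by a fixed offset (hypothetical helper).

    An offset of 0 leaves the time in UTC, mirroring the default described above.
    """
    # Treat naive timestamps as UTC, since that is the stated default.
    if sent_utc.tzinfo is None:
        sent_utc = sent_utc.replace(tzinfo=timezone.utc)
    # Convert to a fixed-offset zone; daylight saving is not considered.
    return sent_utc.astimezone(timezone(timedelta(hours=offset_hours)))


sent = datetime(2010, 1, 1, 7, 0, 0, tzinfo=timezone.utc)
print(to_local(sent, -5).isoformat())
print(to_local(sent).isoformat())
```

The first call shows the same instant expressed five hours behind UTC; the second leaves the timestamp in UTC.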
Email bodies are named with Date - Time - Subject, as follows:
YYYY-MM-DD HH:MM:SSPM - Email Subject Line
If files cannot be converted to PDF, a placeholder is generated. A native version of the body and attachments is accessible under the Original tab in the Document Viewer, and in the Original tab of a Briefcase Download or Production. A PDF version is generated for the email body.
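The Date - Time - Subject naming pattern above can be sketched in Python as follows. The helper `email_body_name` is hypothetical, and the zero-padded 12-hour clock is an assumption based on the HH placeholder; LEP's exact formatting may differ.

```python
from datetime import datetime


def email_body_name(sent: datetime, subject: str) -> str:
    """Build a 'Date - Time - Subject' style name (hypothetical helper)."""
    # Convert the 24-hour value to a 12-hour clock (0 and 12 both map to 12).
    hour12 = sent.hour % 12 or 12
    # Computed by hand so the AM/PM suffix does not depend on the locale.
    meridiem = "AM" if sent.hour < 12 else "PM"
    stamp = f"{sent:%Y-%m-%d} {hour12:02d}:{sent:%M:%S}{meridiem}"
    return f"{stamp} - {subject.strip()}"


print(email_body_name(datetime(2010, 1, 1, 14, 5, 9), "Upcoming Meeting"))
```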
How are Email Bodies and the Attachment Relationship Represented in LEP?
After processing, several versions of an MSG email body and its attachments are available for viewing in the Document Viewer.
For the email body, a PDF version is available in the PDF tab and text from the PDF is available in the Text tab. A download link to the MSG is available in the Original tab and extracted text from the MSG is available in the HTML tab.
When viewing an email body in the Document Viewer, each attachment is available in the Email Family display and may be opened from that display.
From the Browse or Search pages, MSGs may be viewed in isolation by filtering on File Extension = MSG. Also from the Browse page, email attachments can be identified by showing the Is Attachment column.
A Control Number or Bates number sort from the Browse window allows MSG email bodies and their attachments to be reviewed in order. Control numbers can be applied to a case at any time. Bates numbers are applied to a case at the time of a production.
Why Might MSGs Fail to Convert Properly?
During processing, as with all native files (e.g., Word, Excel, PowerPoint), a certain number of MSGs may fail to convert properly. The reasons include MSG corruption and malformation. Emails can become corrupted in transit over the Internet or during file copying.
Examples of emails failing to convert properly include:
Email attachments failing to extract.
Email metadata failing to extract or to populate fields correctly.
Email bodies failing to extract.
Email header fields are pulled from metadata and the view is constructed in the email viewer. If the body is corrupt and cannot be extracted, the header may show but not the body. Often the text from a corrupt email can be extracted, viewed, and searched from the HTML tab of the Document Viewer.
It is possible for an email to display in a version of Outlook and not be extractable in LEP. The reverse is also true. An email can sometimes be viewed in one version of Outlook and not another.
Convert Native MSGs to PDF Locally and Re-Upload
The user may download MSGs from the Original tab, convert the email body or attachments to PDF locally as needed using third-party software, and then re-upload the PDF version of the email to LEP as follows:
From the Browse or Search pages, select the email and download the native version of the file from the Original tab. This downloads the MSG, which includes the email body and attachments.
Convert the body or attachments as needed to PDF, using PDF saving or printing software. This can be done using Acrobat Pro or another PDF print driver utility. The email body and attachments should be saved/printed to PDF separately.
Name the email body and any attachments consistently so that they group together when sorted by title. E.g.:
2010-01-01 2:00:00AM Email Re Upcoming Meeting.pdf (email body)
2010-01-01 2:00:00AM Email Re Upcoming Meeting, Meeting Agenda.pdf (Attachment 1)
2010-01-01 2:00:00AM Email Re Upcoming Meeting, Attendance List.pdf (Attachment 2)
Re-upload the converted PDFs to the case, then transfer any needed metadata from the original email body to the replacement (e.g., Date Sent, From, To, Cc, Bcc, Subject).
Best practice is to do one email at a time.
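The consistent naming step in the procedure above can be sketched in Python. The helper `family_names` is hypothetical; it only builds title strings that share a common prefix, following the example names given, and does not interact with LEP.

```python
def family_names(body_title: str, attachment_titles: list[str]) -> list[str]:
    """Return PDF titles for an email body and its attachments (hypothetical helper)."""
    # The body uses the shared title; each attachment appends its own title after a comma.
    names = [f"{body_title}.pdf"]
    names += [f"{body_title}, {title}.pdf" for title in attachment_titles]
    return names


names = family_names(
    "2010-01-01 2:00:00AM Email Re Upcoming Meeting",
    ["Meeting Agenda", "Attendance List"],
)
# Unrelated titles, made up here to show that the family stays together when sorted.
others = [
    "2009-12-31 5:00:00PM Email Re Budget.pdf",
    "2010-01-02 9:00:00AM Email Re Follow Up.pdf",
]
for title in sorted(names + others):
    print(title)
```

Because every title in the family starts with the same prefix, a plain title sort keeps the body and its attachments adjacent. The order within the group depends on the sort rules of the tool doing the sorting.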