Consistency Check (Near-Duplicates)

Consistency Check for Responsiveness, Privilege, Work-Product and Confidentiality

This technical note discusses identifying documents that potentially should be marked responsive, privileged, work-product or confidential, based on computer identification of near-duplicates.


Near-duplicates are documents whose text is substantially similar but not identical. Typical examples are versions of the same document that differ by a few words, sentences, or paragraphs. The Lexbe eDiscovery Platform's Near Duplication technology identifies and groups documents that are at least 50% similar in text content. See Near Duplication for more information.
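To make the idea of a similarity threshold concrete, here is a minimal Python sketch. It is not the platform's algorithm, which this note does not describe. It assumes a simple word-level comparison using the standard library's difflib, and the function names and sample texts are made up for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two texts, compared word by word."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def is_near_duplicate(a: str, b: str, threshold: float = 0.5) -> bool:
    """True when the texts differ but are similar at or above the threshold."""
    # The 0.5 default mirrors the 50% figure in the text; the similarity
    # measure itself is an assumption, not the platform's method.
    return a != b and text_similarity(a, b) >= threshold


original = "Please review the attached draft agreement before Friday"
revised = "Please review the attached final agreement before Monday"
unrelated = "Lunch is at noon in the main conference room"

print(is_near_duplicate(original, revised))    # True: 6 of 8 words line up
print(is_near_duplicate(original, unrelated))  # False: only one word in common
print(is_near_duplicate(original, original))   # False: exact match, not a near-duplicate
```

Under this assumed measure, two versions of an email that differ by two words would fall in the same group, while an unrelated email would not.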

Checking for Document Consistency after Applying Near Duplication

Near Duplication is a service provided by Professional Services. Contact your Sales Consultant to request a quote. Near Duplication only marks near-duplicates; it does not delete them. All existing files with similar content are grouped across the entire case (e.g. Group 1, Group 2, etc.).

The steps below show you how to check for Responsiveness, Privilege, Work-Product, and Confidentiality:

Open the Browse page, go to the Fields->Show Fields section and select the following headers: Extension, Subject, Senders, Receivers, Date Time Sent, Near Dup Group, Responsive, Work Product, and Privilege.

In the Filter->Select Filter section, filter on Extension = MSG and on Near Dup Group->Show Near Dup Groups. This displays all emails that belong to a near-dup group. Also filter on Date Sent before (a date outside of range) and Date After.

Save and share the applied filters using the Filter Quick Links feature (e.g. Email Threads). A saved and shared filter lets you return to the same filter set for further review, and other users in the case can view it.

To narrow the results to a single Near Dup Group, filter by Near Dup Group No. (for example, 3472).

Sort by Master Date and Near Dup Group No. to view emails within multiple groups.

Analyze and resolve, if appropriate, any inconsistencies in coding within a group.
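For large cases, the final step above (spotting inconsistent coding within a group) can also be sketched in code. The following Python example is an illustration only. It assumes the document list has been exported to rows keyed by the column headers named in the steps above. The row format, the "Title" column, and the function name are hypothetical and are not part of the platform.

```python
from collections import defaultdict

# Coding fields to compare, named after the headers selected in the steps above.
CODING_FIELDS = ("Responsive", "Privilege", "Work Product")


def find_inconsistent_groups(rows, fields=CODING_FIELDS):
    """Map each near-dup group to the coding fields whose values differ within it."""
    values = defaultdict(lambda: defaultdict(set))
    for row in rows:
        group = row.get("Near Dup Group")
        if group in (None, ""):
            continue  # document is not in any near-dup group
        for field in fields:
            values[group][field].add(row.get(field))

    report = {}
    for group, by_field in values.items():
        mixed = sorted(f for f, seen in by_field.items() if len(seen) > 1)
        if mixed:
            report[group] = mixed
    return report


# Hypothetical exported rows; real exports may use different column names.
rows = [
    {"Title": "a.msg", "Near Dup Group": "3472", "Responsive": True,
     "Privilege": False, "Work Product": False},
    {"Title": "b.msg", "Near Dup Group": "3472", "Responsive": True,
     "Privilege": True, "Work Product": False},
    {"Title": "c.msg", "Near Dup Group": "3473", "Responsive": False,
     "Privilege": False, "Work Product": False},
    {"Title": "d.msg", "Near Dup Group": "", "Responsive": True,
     "Privilege": False, "Work Product": False},
]

print(find_inconsistent_groups(rows))  # {'3472': ['Privilege']}
```

A group appearing in the report is only a prompt for review. Near-duplicates can legitimately be coded differently, so each flagged group still needs a reviewer's judgment.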

Email Threading

See Email Threading for more information.  

How to Identify Large NearDup Document Groupings

See NearDup Grouping for more information.  

Mass Tagging Near-Duplicates

See Tagging Near Duplicates for more information.