Assisted Review+

Introduction

The procedure below guides you through using Lexbe's Technology Assisted Review ("TAR") feature, Assisted Review+. A list of key terms appears at the bottom of this page, and a brief overview with basic insights and FAQs is available for download.

Procedure

Setting up an Assisted Review Job in the eDiscovery Platform

1. Within the eDiscovery Platform, select ‘Assisted Review’ under the ‘Discovery’ tab.

2. To start a new assisted review job, select ‘Create New Job.’ Add a job title and notes as desired, then select 'Create.'

3. Change the number of documents in your seed set and the number of documents in each control set by selecting ‘Edit,’ then ‘Update’ once complete.

Adding Documents to the Assisted Review Job

1. Navigate to the ‘Browse’ tab to find the documents you will be adding to this Assisted Review Job.

2. Apply any filters needed until you are viewing only the documents you wish to add to the Assisted Review job. This will typically be all of the documents in a case.

3. Click ‘Select All’ at the top of the ‘Browse’ list to select all documents.

4. On the left side of the screen, under the section ‘Assisted Review Jobs’, select the specific Assisted Review job from the drop down that you wish to add documents to.

5. Confirm the number of documents is correct and click ‘Add Selected Docs’ in the pop-up confirmation screen.

Reviewing/Training the Seed Set

After setting up your Assisted Review job, determining the size of your seed set, and adding documents to the job, it is time to review the seed set documents.

1. Navigate back to ‘Assisted Review’ under the ‘Discovery’ tab, and select your Assisted Review job from the drop down menu on the left.

2. Select ‘Next’ in the upper right corner of the screen to proceed to ‘2. Seed Set.’

3. Selecting ‘View’ takes users to the Browse screen which is populated by the documents contained in the seed set.

4. With the seed set documents populated in the Browse screen, the reviewer may begin the review by clicking the title of the first document, which opens it in the document viewer.

5. Navigate to the ‘DISC’ tab within the document viewer and ensure that the ‘Propagate Coding’ checkbox is NOT selected. The ‘Auto Advance’ checkbox may be selected if desired.


6. Select the applicable type of responsiveness under the ‘Coding’ tab. Please note, all documents must be coded as Responsive or Non-Responsive before moving to the Control Set step in the Assisted Review process.

7. Clicking 'Save' will save your changes, and, if ‘Auto Advance’ is selected, you will be advanced to the next document.

8. When you are finished coding the entire seed set, return to your Assisted Review job by choosing ‘Assisted Review’ under the ‘Discovery’ tab and selecting your Assisted Review job from the drop down menu.

Review Control Sets

1. Once all documents within the seed set have been coded, it is time to apply the algorithm to a control set.

2. Selecting ‘Next’ on the seed set display will automatically generate a control set.

3. When the assisted review algorithm has completed its automated coding of the control set, you will be advanced to the control set display. Here you will find your first control set, the number of documents it contains, and several columns containing Assisted Review metrics.

4. Selecting ‘Review’ will direct you to the Browse screen populated with documents from the first control set.

5. Click on the title to open the first document in the control set. Ensure 'Propagate Coding' is deselected, and review the Responsive or Non-Responsive coding that has been automatically applied by the Assisted Review+ algorithm.

6. If the coding applied by the algorithm is correct, check the 'Document Reviewed by Me' box and click Save. To overturn the coding applied by the algorithm, select the appropriate coding (i.e. Responsive or Non-Responsive only) and click Save to advance to the next document.

7. When you are finished reviewing a control set, select ‘Add Control Set’ to generate another control set to review. If the next control set is not immediately generated, simply refresh the page. **Do not click 'Next' as this will apply the algorithm to all documents prematurely.**

8. Continue reviewing control sets until the F-score has stabilized. Stabilization indicates that the metrics used to evaluate the Assisted Review+ algorithm are unlikely to be affected by the continued review of control sets. Once stabilization is reached, continuing to review control sets will serve only to reduce the margin of error associated with the F-score.
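The stabilization check in step 8 can be sketched in a few lines of Python. This is an illustration only: the function name, the three-set window, and the 0.02 tolerance are assumptions chosen for the example, not values used or recommended by the platform.

```python
def is_stabilized(f_scores, window=3, tolerance=0.02):
    """Return True if the last `window` F-scores differ by at most `tolerance`.

    `window` and `tolerance` are hypothetical thresholds for this sketch.
    """
    if len(f_scores) < window:
        return False  # not enough control sets reviewed yet
    recent = f_scores[-window:]
    return max(recent) - min(recent) <= tolerance


# Hypothetical F-scores recorded after each reviewed control set.
history = [0.71, 0.84, 0.90, 0.92, 0.93, 0.93]

print(is_stabilized(history[:3]))  # False: early scores are still moving
print(is_stabilized(history))      # True: last three scores are within 0.02
```

A wider window or a tighter tolerance makes the check more conservative, at the cost of reviewing more control sets.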

Apply Assisted Review+ to the Remaining Documents

1. Once the F-score has stabilized, select ‘Next’ to apply the Assisted Review+ algorithm to the remaining documents in the collection.

2. After the remaining documents have been reviewed and coded by the algorithm, a report detailing the outcome of the Assisted Review+ job is automatically generated.

Viewing the Assisted Review Report

1. Select ‘Download’ on the report display.

2. Open AssistedReviewReport.xlsx in Excel.

Understanding Your Results

Following the application of Assisted Review+, an Assisted Review Report will be generated. This report documents the procedures used to generate the computer assisted review results.

Assisted Review Report

The following is a breakdown of the key elements of the Assisted Review Report.

1. Assisted Review Case Information: This area of the report identifies the case name, assisted review job title, the date and time the assisted review job was completed, the email address associated with the user who ran assisted review, and any comments added to the report.

2. Assisted Review Graph: This chart is a visual representation of key assisted review metrics and results. The x-axis identifies the number of control sets that have been reviewed, and the y-axis is a proportion on a scale from 0 to 1 (0 = 0% and 1 = 100%). Three lines appear on the graph: a blue line representing the F-score, and two red lines representing the upper and lower bounds of the margin of error. This graph allows you to visualize how the margin of error converged on the stabilizing F-score as control sets were reviewed.

3. Predictive Coding Results: This section of the report quantifies the proportion of responsive and non-responsive coding through the stages of assisted review. In the example report, the number and proportion of documents coded responsive and non-responsive in the seed set are 28 (56%) and 22 (44%), respectively. The number and proportion of documents coded responsive and non-responsive in the control set are 40 (57%) and 30 (43%), respectively.

4. Predictive Coding Statistics: The last section of the report identifies the final F-score (93%), precision measure (87%), recall measure (100%), and margin of error (±18%). These are the final statistical measures available to evaluate the outcome of the predictive coding process in your case.
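The figures quoted in the list above can be cross-checked by hand. The sketch below assumes the F-score is the harmonic mean of precision and recall, as described in the FAQ; the helper names are made up for this example and are not part of the platform.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on a 0-1 scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def proportion(count, total):
    """Share of `count` within `total`, on a 0-1 scale."""
    return count / total


# Precision 87% and recall 100% give an F-score of about 93%.
print(round(f1_score(0.87, 1.00), 2))     # 0.93

# Seed set: 28 responsive out of 28 + 22 documents.
print(round(proportion(28, 28 + 22), 2))  # 0.56

# Control set: 40 responsive out of 40 + 30 documents.
print(round(proportion(40, 40 + 30), 2))  # 0.57
```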

Frequently Asked Questions

What content is reviewed by the Assisted Review algorithm?

Only the OCR content from the PDF version of the document is reviewed by the algorithm.

What is a stabilized F-score, and how will you know it has been reached?

The F-score is the harmonic mean of Precision and Recall. An F-score (or F1 measure) of 1.0, or 100%, represents perfect precision and recall. A stabilized F-score is reached when you receive a similar F-score each time you review a control set. You will want to receive a similar F-score across several control sets in succession to consider it stabilized. A sample of what a stabilized F-score looks like in LEP is as follows:

Is there a certain number of Control Sets that should be reviewed to achieve a stabilized score?

Stabilization is highly dependent on the data set. As such, there is no specific or predetermined number of control sets that will provide you with a stabilized F-score. It is possible for an F-score to stabilize at an undesirable value, which would indicate that the data set is likely not appropriate for TAR.

Key Terms

Seed Set. The Seed Set is created by compiling a random sampling of documents from the entire set. The seed set is then reviewed to train the predictive coding algorithm which will be responsible for reviewing and coding the remainder of the case documents. The predictive coding outcomes are heavily determined by the accuracy of the seed set review. The size of the seed set is determined by the number of documents in the entire data set. The seed set should be between approximately 2,000 and 2,400 documents.
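As a rough illustration of what "a random sampling of documents from the entire set" means, the sketch below draws a seed set from a list of document IDs. The function name, the ID list, and the fixed random seed are assumptions for the example; the platform performs this selection itself.

```python
import random


def draw_seed_set(doc_ids, seed_size, rng_seed=None):
    """Draw `seed_size` distinct documents uniformly at random.

    `rng_seed` is optional and only makes the illustration repeatable.
    """
    pool = list(doc_ids)
    rng = random.Random(rng_seed)
    return rng.sample(pool, min(seed_size, len(pool)))


# Hypothetical case with 100,000 documents and a 2,000-document seed set.
all_docs = range(1, 100_001)
seed_set = draw_seed_set(all_docs, 2_000, rng_seed=42)

print(len(seed_set))       # 2000
print(len(set(seed_set)))  # 2000, i.e. no document is drawn twice
```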

Control Set. Control Sets serve a quality control function. Documents are released in control sets of the size established at the time the Assisted Review job was created. Reviewers either confirm or overturn how each document has been coded (i.e. changing responsive to non-responsive or vice versa). The process of confirming or overturning document coding is what generates the Precision and Recall metrics that are used to calculate the F-score.

Classifications. The below classifications will assist in the understanding of Precision and Recall.
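Each reviewed control set document falls into one of four classifications, depending on how the algorithm coded it and whether the reviewer confirmed or overturned that coding. The following sketch tallies them. The data and function name are hypothetical; the True Positive and True Negative labels are the standard counterparts of the False Positive and False Negative cases described under Precision and Recall.

```python
from collections import Counter


def classify(predicted_responsive, reviewer_responsive):
    """Label one document by comparing algorithm coding with reviewer coding."""
    if predicted_responsive and reviewer_responsive:
        return "TP"  # True Positive: responsive coding confirmed
    if predicted_responsive and not reviewer_responsive:
        return "FP"  # False Positive: responsive coding overturned
    if not predicted_responsive and reviewer_responsive:
        return "FN"  # False Negative: non-responsive coding overturned
    return "TN"      # True Negative: non-responsive coding confirmed


# Hypothetical (algorithm coding, reviewer coding) pairs; True = Responsive.
control_set = [(True, True), (True, False), (False, False),
               (False, True), (True, True)]

counts = Counter(classify(p, r) for p, r in control_set)
print(dict(counts))  # TP=2, FP=1, TN=1, FN=1
```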

Precision. The Precision metric measures how accurate the algorithm's predictions are. More specifically, it is the percentage of documents predicted to be responsive that actually are responsive. Mathematically, this metric is generated by the following formula:

$$\text{Precision} = \frac{\text{Total True Positives}}{\text{Total Predicted Positives}}$$

(Total Predicted Positives = True Positives + False Positives)

As such, frequently overturning responsive coding (i.e. identifying False Positives) will negatively affect the precision metric by lowering its score; however, while a high precision score does indicate accuracy, it does not necessarily mean all responsive documents have been identified. That determination is made by the Recall metric.

Recall. The Recall metric measures how complete the algorithm's predictions are. More specifically, it is the percentage of actually responsive documents that the algorithm predicted to be responsive. Mathematically, this metric is generated by the following formula:


$$\text{Recall} = \frac{\text{Total True Positives}}{\text{Total Actual Positives}}$$

(Total Actual Positives = True Positives + False Negatives)

As such, frequently overturning non-responsive coding (i.e. identifying False Negatives) will negatively affect the Recall metric by lowering its score. Generally, the number of False Negatives is an indicator of how effectively the front end manual review has trained the predictive coding algorithm.
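The Precision and Recall formulas can be applied directly to control set tallies. The counts below are invented for illustration, and the function names are not part of the platform.

```python
def precision(true_positives, false_positives):
    """True Positives divided by Total Predicted Positives."""
    predicted_positives = true_positives + false_positives
    return true_positives / predicted_positives if predicted_positives else 0.0


def recall(true_positives, false_negatives):
    """True Positives divided by Total Actual Positives."""
    actual_positives = true_positives + false_negatives
    return true_positives / actual_positives if actual_positives else 0.0


# Hypothetical tallies: 45 confirmed responsive, 5 overturned responsive,
# 15 overturned non-responsive.
tp, fp, fn = 45, 5, 15

print(round(precision(tp, fp), 2))  # 0.9  (45 of 50 predicted positives)
print(round(recall(tp, fn), 2))     # 0.75 (45 of 60 actual positives)
```

In this example the algorithm's responsive calls are usually right (high precision), but it misses a quarter of the responsive documents (lower recall).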

F1 Score. F-scores are determined by considering the precision and recall of the predictive coding algorithm. As previously mentioned, a low precision score indicates an abundance of false positive identifications, or over-delivery, whereas a low recall score is an indication of under-delivery. The Lexbe eDiscovery Platform uses an F1 score, which weights precision and recall equally.
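For reference, the general family of F-measures can be written as follows, where P is precision and R is recall. The weighting parameter β is standard notation and does not appear in the platform. Setting β = 1 gives the harmonic mean, which weights precision and recall equally; a value of β = 2 would instead weight recall more heavily than precision.

$$F_\beta = (1 + \beta^2)\,\frac{P \cdot R}{\beta^2 P + R}, \qquad F_1 = \frac{2\,P\,R}{P + R}$$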

Margin of Error. The margin of error is a statistical measure of uncertainty based on the possibility that the data sampled was not an accurate representation of the entire data set, assuming a normal distribution of documents. As the amount of data sampled increases, the margin of error is reduced. In assisted review, the margin of error decreases as more control sets are reviewed to verify that the algorithm correctly coded the documents. The margin of error should be interpreted along with the final F-score. For example, if there is a final F-score of 0.75 and a margin of error of ±5%, then there is 95% certainty that the harmonic mean of the recall and precision in this instance of assisted review is between 0.70 and 0.80.
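The relationship between sample size and margin of error can be illustrated with the textbook normal-approximation formula for a sampled proportion. This is only a sketch: the platform's actual calculation is not documented here, an F-score is not a simple proportion, and the function names and the z-value of 1.96 (the usual choice for 95% confidence) are assumptions.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion p seen in n samples."""
    return z * math.sqrt(p * (1 - p) / n)


def interval(score, margin):
    """Range implied by a score and its margin, clipped to the 0-1 scale."""
    return (max(0.0, score - margin), min(1.0, score + margin))


# The margin shrinks as more documents are sampled (hypothetical sizes).
for n in (50, 200, 800):
    print(n, round(margin_of_error(0.75, n), 3))

# The example from the text: 0.75 with a margin of 5 percentage points.
low, high = interval(0.75, 0.05)
print(round(low, 2), round(high, 2))  # 0.7 0.8
```

Under this approximation, quadrupling the sample size roughly halves the margin, which is why the margin keeps narrowing even after the F-score itself has stopped moving.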

More Information

For more information about technology assisted review, please consult Lexbe Assisted Review: Background & Key Concepts.