Wednesday, April 28, 2010

Inveno - Near Dupe Detection

Inveno identifies and groups near-duplicates. It was developed with one goal in mind: to simplify document review.

De-duplication is common practice in electronic discovery today because it reduces the amount of data that needs to be reviewed. The process looks at the metadata as well as the content of each document, but it leaves behind many documents that are not exact matches. The result is an overabundance of “near-duplicate” documents that are virtually the same, with only minor changes in content, formatting, or metadata.

Inveno examines the content of each document, comparing its size, number of lines, and keywords, to identify near-duplicates within the match thresholds set up by the user.
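
To make the idea of match thresholds concrete, here is a minimal Python sketch. It is not Inveno's actual algorithm, which this post does not describe. The function names, the keyword rule (words longer than three characters), and the default threshold values are all assumptions chosen for illustration.

```python
import re

def profile(text):
    """Summarize a document by size, line count, and keyword set."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        "size": len(text),
        "lines": text.count("\n") + 1,
        # Assumption: treat any word longer than 3 characters as a keyword.
        "keywords": {w for w in words if len(w) > 3},
    }

def ratio(a, b):
    """Smaller value divided by larger; 1.0 when both are zero."""
    return 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)

def is_near_duplicate(text_a, text_b,
                      size_min=0.9, lines_min=0.9, keyword_min=0.8):
    """Return True when all three measures meet their (hypothetical) thresholds."""
    pa, pb = profile(text_a), profile(text_b)
    union = pa["keywords"] | pb["keywords"]
    # Keyword overlap as shared keywords divided by all distinct keywords.
    overlap = len(pa["keywords"] & pb["keywords"]) / len(union) if union else 1.0
    return (ratio(pa["size"], pb["size"]) >= size_min
            and ratio(pa["lines"], pb["lines"]) >= lines_min
            and overlap >= keyword_min)
```

Raising the thresholds toward 1.0 makes matching stricter. Lowering them groups more loosely related documents together, which may suit noisy OCR text better than clean extracted text.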

The output is a CSV report that lists the document sets, or sort groups (documents that are near-duplicates of each other), along with other information such as the master document within each group, against which all the other documents in that group are compared. This report can be imported into any litigation support tool, such as Concordance, Summation, or Law, for review.
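
As a rough illustration of what such a report could look like, the Python sketch below writes one row per document with its group and master. The column names and the rule that the first document in each group is the master are assumptions for this example, not Inveno's actual report layout.

```python
import csv
import io

def write_report(groups, out):
    """Write a near-duplicate report to the file-like object `out`.

    groups: list of lists of document IDs. As an assumption for this
    sketch, the first ID in each list is treated as the group's master.
    """
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical column names; a real report may use different fields.
    writer.writerow(["group_id", "doc_id", "master_doc_id", "is_master"])
    for group_id, doc_ids in enumerate(groups, start=1):
        master = doc_ids[0]
        for doc_id in doc_ids:
            writer.writerow(
                [group_id, doc_id, master, "Y" if doc_id == master else "N"]
            )

# Usage: build the report in memory, then save or import it elsewhere.
buffer = io.StringIO()
write_report([["DOC001", "DOC004"], ["DOC002"]], buffer)
```

A flat layout like this, with one row per document, tends to be straightforward to map onto fields in a review database during import.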

Inveno can be run against OCR text from scanned documents, or on extracted text, and the thresholds for determining near dupes can be adjusted automatically based on the source files.

  • Multiple client machines can be used to load documents
  • Text files can be at Page or Document level. Page level text files require a load file to identify document breaks
  • Projects can be broken down into volumes
  • Near-dupe processing can run across volumes, with control over which specific volumes to run near-dupe detection against
  • Customized report formats available via Kensium
  • Manual review and tagging of documents to identify exact dupes if needed.
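
The need for a load file with page-level text can be shown with a short Python sketch. The input format here is an assumption: each load-file row is reduced to a page ID and a flag marking the first page of a document. Actual load file formats vary by tool.

```python
def group_pages(rows):
    """Group page IDs into documents using document-break flags.

    rows: iterable of (page_id, starts_document) pairs in load-file order.
    This two-field row shape is hypothetical, for illustration only.
    """
    documents = []
    for page_id, starts_document in rows:
        # Start a new document at each break, and for the very first page.
        if starts_document or not documents:
            documents.append([])
        documents[-1].append(page_id)
    return documents

# Usage: three pages, with a document break before the third.
docs = group_pages([("P0001", True), ("P0002", False), ("P0003", True)])
```

Without the break flags there would be no way to tell whether three page-level text files make up one document or several, which is why document-level comparison depends on the load file.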

Inveno can be deployed at your Litigation Support organization, or offered as a service via Kensium.
