Find similar documents with ease and accuracy

A simple library that searches efficiently for near-duplicates in massive document sets, libraries, or databases containing almost any kind of text.


Quality score: 99%
Maintenance score: 100%
Popularity score: 0.03
Package downloads: 1052

All the cool stuff you can do with modern TypeScript & React

Our Goal

To Help JS Developers Build Amazing And Profitable Stuff

As independent creators, we love making websites, web apps, tools, templates, articles, and more. We've combined our passions for coding and writing with one goal: helping other people prosper through coding, just as we have.

So what are we doing about it?

We are constantly looking for things that can be created once and then given away repeatedly;
We create and support apps that can replace unreasonably expensive alternatives;
We make profits by offering affordable upgrades, support, affiliate products, and other goodies;
We build our apps and our business in public for maximum transparency;
We believe that the ongoing digital transformation must be accessible to all.

near-duplicate-docs

Exactly how easy is it to use the library?

You can find duplicate or near-duplicate documents in three steps, as long as the texts don't need any other processing before they are compared.


const {makeDuplicatesFinder} = require('near-duplicate-docs');

// Step 1: Create an object instance
const finder = makeDuplicatesFinder({
  minSimilarity: 0.75,
  shinglesSize: 5,
  shinglesType: "word",
  signatureLength: 100,
  rowsPerBand: 5,
});

// Step 2: Pass the documents' ids and texts
// (document1 ... documentN stand for your own objects with `id` and `text` fields)
finder.add(document1.id, document1.text);
finder.add(document2.id, document2.text);
finder.add(document3.id, document3.text);
finder.add(documentN.id, documentN.text);

// Step 3: Initiate a search
const duplicates = finder.search();

console.log(duplicates);

// Result:
// {
//   document1: [[0.95, "document3"]],
//   documentN: [[0.76, "document2"], [0.80, "document3"]]
// }
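The `shinglesType` and `shinglesSize` options suggest that each text is split into overlapping shingles (runs of words or characters) before documents are compared. The sketch below is not the library's code; it is a standalone illustration, assuming shingles are compared with Jaccard similarity, of how word shingles and character shingles are built and how a score such as the 0.95 above could be read. `toShingles` and `jaccard` are hypothetical helper names, and the lowercase/whitespace normalization is an assumption.

```javascript
// Hypothetical helpers for illustration only; they are not part of near-duplicate-docs.

// Split `text` into a Set of overlapping shingles of `size` words ("word") or characters ("char").
function toShingles(text, size, type = "word") {
  // Assumed normalization: lowercase and collapse runs of whitespace.
  const clean = text.toLowerCase().replace(/\s+/g, " ").trim();
  const units = type === "word" ? clean.split(" ") : Array.from(clean);
  const sep = type === "word" ? " " : "";
  const shingles = new Set();
  // A text with no more than `size` units becomes a single shingle.
  if (units.length <= size) {
    shingles.add(units.join(sep));
    return shingles;
  }
  for (let i = 0; i + size <= units.length; i++) {
    shingles.add(units.slice(i, i + size).join(sep));
  }
  return shingles;
}

// Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let shared = 0;
  for (const item of a) {
    if (b.has(item)) shared++;
  }
  const union = a.size + b.size - shared;
  // Two empty sets are treated as identical (a convention).
  return union === 0 ? 1 : shared / union;
}

const a = toShingles("The quick brown fox jumps over the lazy dog", 3, "word");
const b = toShingles("The quick brown fox leaps over the lazy dog", 3, "word");
console.log(jaccard(a, b)); // 0.4 (4 shared shingles out of 10 distinct)
```

With 3-word shingles, changing a single word in a nine-word sentence leaves only 4 of 10 distinct shingles shared, which shows how strongly large word shingles react to small edits in short texts.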


The asynchronous finder follows the same three steps as the synchronous one, but it waits for every `add` call to finish before searching:

const {makeAsyncDuplicatesFinder} = require('near-duplicate-docs');

// Step 1: Create an object instance
const finder = makeAsyncDuplicatesFinder({
  minSimilarity: 0.75,
  shinglesSize: 5,
  shinglesType: "char",
  signatureLength: 100,
  rowsPerBand: 5,
});

const promises = [];

// Step 2: Pass the documents' ids and texts, keeping each returned promise
promises.push(finder.add(document1.id, document1.text));
promises.push(finder.add(document2.id, document2.text));
promises.push(finder.add(document3.id, document3.text));
promises.push(finder.add(documentN.id, documentN.text));

// Step 3: Initiate a search once all texts have been added
Promise.all(promises)
  .then(() => finder.search())
  .then(duplicates => console.log(duplicates))
  .catch(error => console.error(error));

// Result:
// {
//   document1: [[0.95, "document3"]],
//   documentN: [[0.76, "document2"], [0.80, "document3"]]
// }
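The `signatureLength` and `rowsPerBand` options read like the parameters of MinHash with locality-sensitive hashing (LSH) banding. Assuming the library follows the standard banding scheme (an assumption; this page does not say so), a signature of length n split into bands of r rows gives b = n / r bands, and two documents with similarity s become a candidate pair with probability 1 - (1 - s^r)^b. The hypothetical helpers below evaluate that curve for the settings used in both examples (100 and 5, so 20 bands).

```javascript
// Hypothetical helpers; they assume standard MinHash LSH banding, not confirmed library internals.

// Probability that two documents with Jaccard similarity `s` share at least one identical band.
function candidateProbability(s, signatureLength, rowsPerBand) {
  const bands = Math.floor(signatureLength / rowsPerBand);
  return 1 - Math.pow(1 - Math.pow(s, rowsPerBand), bands);
}

// Rule-of-thumb similarity where the curve rises most steeply: (1 / b) ** (1 / r).
function approximateThreshold(signatureLength, rowsPerBand) {
  const bands = Math.floor(signatureLength / rowsPerBand);
  return Math.pow(1 / bands, 1 / rowsPerBand);
}

// Print the candidate probability for a few similarity levels.
for (const s of [0.3, 0.5, 0.75, 0.9]) {
  console.log(s, candidateProbability(s, 100, 5).toFixed(3));
}
console.log("threshold ≈", approximateThreshold(100, 5).toFixed(3));
```

Under that assumption, the curve is steepest around a similarity of 0.55, so pairs at or above `minSimilarity: 0.75` would be compared more than 99% of the time, while pairs near 0.3 would be compared less than 5% of the time.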

