A simple library that can efficiently search for near-duplicates in massive sets, libraries, or databases of almost any kind of text.
Try it now! You can start comparing documents and get results in 3 steps.
It's typed and well-designed with SOLID principles in mind
A handful of tests are run on every new commit or release
Suitable for big sets of text documents
As independent creators, we love creating cool things like websites, web apps, tools, templates, articles, and more. We've merged our passions for coding and writing with the goal of helping other people find prosperity through coding just like us.
So what are we doing to achieve that?
We are constantly searching for things that can be created once and then given away repeatedly;
We create and support apps that can replace unreasonably expensive alternatives;
We make profits by offering affordable upgrades, support, affiliate products, and other goodies;
We build our apps and our business in public for maximum transparency;
We believe that the ongoing digital transformation must be accessible to all.
You can find duplicate or near-duplicate documents in 3 steps if you don't need to process the texts in other ways before comparing them.
const { makeDuplicatesFinder } = require('near-duplicate-docs');

// Step 1: Create an object instance
const finder = makeDuplicatesFinder({
  minSimilarity: 0.75,
  shinglesSize: 5,
  shinglesType: "word",
  signatureLength: 100,
  rowsPerBand: 5,
});

// Step 2: Pass the documents' ids and texts
// (document1 to documentN are placeholders for your own objects with id and text properties)
finder.add(document1.id, document1.text);
finder.add(document2.id, document2.text);
finder.add(document3.id, document3.text);
finder.add(documentN.id, documentN.text);

// Step 3: Initiate a search
const duplicates = finder.search();
console.log(duplicates);

// Result
{
  document1: [[0.95, "document3"]],
  documentN: [[0.76, "document2"], [0.80, "document3"]]
}
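The shinglesSize, shinglesType, and minSimilarity options suggest the usual shingling approach: each text is cut into overlapping runs of shinglesSize words (or characters), and two documents are compared by the Jaccard similarity of their shingle sets. The sketch below illustrates that idea under this assumption; it is not the library's own code, and the helper names makeShingles and jaccard are hypothetical.

```javascript
// Hypothetical helpers illustrating word/char shingling and Jaccard similarity.
// Not part of near-duplicate-docs; names and tokenization rules are assumptions.

function makeShingles(text, size, type = "word") {
  // Split into words (on whitespace) or into single characters (code points).
  const tokens = type === "word" ? text.split(/\s+/).filter(Boolean) : Array.from(text);
  const separator = type === "word" ? " " : "";
  const shingles = new Set();
  // Slide a window of `size` tokens over the text; duplicates collapse in the Set.
  for (let i = 0; i + size <= tokens.length; i++) {
    shingles.add(tokens.slice(i, i + size).join(separator));
  }
  return shingles;
}

function jaccard(a, b) {
  // |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
  let intersection = 0;
  for (const s of a) if (b.has(s)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

const a = makeShingles("the quick brown fox jumps over the lazy dog", 3);
const b = makeShingles("the quick brown fox leaps over the lazy dog", 3);
// 4 shared 3-word shingles out of 10 distinct ones
console.log(jaccard(a, b)); // 0.4
```

Changing a single word breaks every shingle that contains it, which is why larger shinglesSize values make the similarity stricter. A score such as 0.95 in the result above is presumably a similarity of this kind, with pairs below minSimilarity left out.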
The asynchronous finder works the same way, except that each add() call returns a promise, so the search should start only after all of them have resolved:
const { makeAsyncDuplicatesFinder } = require('near-duplicate-docs');

// Step 1: Create an object instance
const finder = makeAsyncDuplicatesFinder({
  minSimilarity: 0.75,
  shinglesSize: 5,
  shinglesType: "char",
  signatureLength: 100,
  rowsPerBand: 5,
});

// Step 2: Pass the documents' ids and texts, collecting the returned promises
const promises = [];
promises.push(finder.add(document1.id, document1.text));
promises.push(finder.add(document2.id, document2.text));
promises.push(finder.add(document3.id, document3.text));
promises.push(finder.add(documentN.id, documentN.text));

// Step 3: Initiate a search once all texts are added
Promise.all(promises)
  .then(() => finder.search())
  .then(duplicates => console.log(duplicates))
  .catch(error => console.error(error));

// Result
{
  document1: [[0.95, "document3"]],
  documentN: [[0.76, "document2"], [0.80, "document3"]]
}
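The signatureLength and rowsPerBand options read like the parameters of MinHash with locality-sensitive hashing (LSH): each shingle set is compressed into a signature of signatureLength minimum hash values, the signature is cut into bands of rowsPerBand rows, and only documents that agree on every row of at least one band become candidates for comparison. The following is a self-contained sketch of that standard technique, assuming this is what the options control; the hash functions, seeds, and helper names are illustrative and not taken from near-duplicate-docs.

```javascript
// Hypothetical MinHash + LSH banding sketch. Not the near-duplicate-docs source;
// hash functions, seeds, and helper names are assumptions for illustration.

// 32-bit FNV-1a hash of a string.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Derive the i-th hash function by XOR-ing a seed and applying the
// MurmurHash3 32-bit finalizer (a bijection on 32-bit integers).
function mix(h, seed) {
  h ^= seed;
  h = Math.imul(h ^ (h >>> 16), 0x85ebca6b);
  h = Math.imul(h ^ (h >>> 13), 0xc2b2ae35);
  return (h ^ (h >>> 16)) >>> 0;
}

// Overlapping runs of `size` lowercase words.
function wordShingles(text, size) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const out = new Set();
  for (let i = 0; i + size <= words.length; i++) out.add(words.slice(i, i + size).join(" "));
  return out;
}

// Signature = for each hash function, the minimum hash over all shingles.
function minHashSignature(shingles, seeds) {
  const sig = new Array(seeds.length).fill(Infinity);
  for (const s of shingles) {
    const base = fnv1a(s);
    for (let i = 0; i < seeds.length; i++) {
      const h = mix(base, seeds[i]);
      if (h < sig[i]) sig[i] = h;
    }
  }
  return sig;
}

// Documents whose signatures match on every row of some band become candidate pairs.
function lshCandidatePairs(signatures, rowsPerBand) {
  const buckets = new Map();
  const pairs = new Set();
  for (const [id, sig] of signatures) {
    for (let start = 0; start + rowsPerBand <= sig.length; start += rowsPerBand) {
      const key = start + ":" + sig.slice(start, start + rowsPerBand).join(",");
      const bucket = buckets.get(key) ?? [];
      for (const other of bucket) pairs.add([other, id].sort().join("|"));
      bucket.push(id);
      buckets.set(key, bucket);
    }
  }
  return pairs;
}

// Fraction of equal signature positions estimates the Jaccard similarity.
function estimateSimilarity(a, b) {
  let same = 0;
  for (let i = 0; i < a.length; i++) if (a[i] === b[i]) same++;
  return same / a.length;
}

// Probability that a pair with Jaccard similarity s shares at least one band.
function candidateProbability(s, bands, rows) {
  return 1 - Math.pow(1 - Math.pow(s, rows), bands);
}

const signatureLength = 100, rowsPerBand = 5;
const seeds = Array.from({ length: signatureLength }, (_, i) => Math.imul(i + 1, 0x9e3779b9));
const docs = {
  doc1: "the quick brown fox jumps over the lazy dog near the river bank",
  doc2: "the quick brown fox jumps over the lazy dog near the river bank",
  doc3: "the quick brown fox leaps over the lazy dog near the river bank",
  doc4: "lorem ipsum dolor sit amet consectetur adipiscing elit sed do",
};
const signatures = new Map(
  Object.entries(docs).map(([id, text]) => [id, minHashSignature(wordShingles(text, 3), seeds)])
);
// doc1|doc2 is always reported: identical texts produce identical signatures.
for (const pair of lshCandidatePairs(signatures, rowsPerBand)) {
  const [x, y] = pair.split("|");
  console.log(pair, estimateSimilarity(signatures.get(x), signatures.get(y)));
}
```

If the library follows this standard banding scheme, signatureLength: 100 with rowsPerBand: 5 gives 20 bands, so a pair with Jaccard similarity 0.75 becomes a candidate with probability 1 - (1 - 0.75^5)^20 ≈ 0.996, and the curve rises steeply around (1/20)^(1/5) ≈ 0.55. In the sketch, doc3 shares 8 of 14 distinct shingles with doc1 (similarity ≈ 0.57), so it is reported only with probability of roughly 0.72; candidates would then be checked against minSimilarity to discard false positives.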