Digg Gets Dupe Detection Updates
Similar stories floating around Digg have been a problem for quite some time, and they hurt the Digg ecosystem because Digg has been unable to tell them apart. Now it may have a solution. Digg has released major updates to its dupe detection technology to eliminate duplicate submissions. Here’s how it works:
To better understand the nature of the problem, we analyzed the types of duplicate stories being submitted. Most common are the same stories from the same site, but with different URLs. Our R&D team came up with a solution that identifies these types of duplicates by using a document similarity algorithm. Look for a separate tech blog post on how this works, but it has proven to be a reliable way of identifying identical content from the same source.
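Digg has not said which document similarity algorithm it uses, so the following is only a rough illustration of the general idea, not Digg's method. It assumes a common textbook approach: break each page's text into overlapping word sequences ("shingles") and compare the two sets with Jaccard similarity. The function names, the shingle size of 3 and the 0.8 threshold are all made up for this sketch.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        # Two empty documents carry no evidence of being the same story.
        return 0.0
    return len(a & b) / len(union)


def is_probable_duplicate(text_a, text_b, threshold=0.8, k=3):
    """Guess whether two texts are the same story (threshold is an assumption)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


if __name__ == "__main__":
    story = "Digg has released major updates to its dupe detection technology"
    same_story_other_url = "Digg has released major updates to its dupe detection technology!"
    other_story = "A completely unrelated story about weather patterns in the Pacific"

    print(is_probable_duplicate(story, same_story_other_url))
    print(is_probable_duplicate(story, other_story))
```

Because the comparison looks only at the page text, two submissions of the same article under different URLs would score as near-identical, which is the case the Digg quote above describes. A real system would likely need something faster than pairwise comparison for a site of Digg's size.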
There’s a lengthy post on the Digg blog explaining all the major updates to Digg. You might want to read it if you are interested in this update.