32 Comments
- Akaji, on 10/12/2007, -0/+20
Whenever you add stories, if there is any 'like' content already on Digg (including similar titles, content descriptor, or URL), it warns you. The real issue is that people ignore the warning and post their dupe anyway.
- inactive, on 10/12/2007, -1/+14
Cliffosaka ALWAYS ignores the warning!
In fact he seems to post a LOT of stuff that was posted by others mere hours before.
Sheer class!!
- manitoba98xp, on 10/12/2007, -2/+12
People should learn to spell _their_ comments correctly!
- pinetree, on 10/12/2007, -0/+8
Akaji,
Yes, when you try to submit it does warn you about the dupes, but at that point you have already (hopefully) invested time in trying to compose a good article summary. It would be nice if the search engine were good enough to let you find out whether there is a similar story before you waste your time. I've plugged in really obvious search terms for an article I found and gotten no matches, only to find after writing a summary that the exact same story (from AP) was already submitted with a different URL -- very annoying.
- stevesearer, on 10/12/2007, -1/+7
I agree that they contribute, but it is really up to the submitter to be sure they are not duping a story. However, with digg search on the fritz lately, searching has become a nightmare.
- merreborn, on 10/12/2007, -0/+5
If digg considers those to be separate URLs, that's an error on digg's part.
- zzz@tkz, on 10/12/2007, -0/+5
Another easy way to submit dupes is simply to add a GET value to the end of a URL. It works with anything; even a bare declaration of a GET variable works.
ex:
http://www.moonmac.com/Mormon_masturbation.html
http://www.moonmac.com/Mormon_masturbation.html?
http://www.moonmac.com/Mormon_masturbation.html?dupe=lol
http://www.moonmac.com/Mormon_masturbation.html?a=b&c=d
All will go to the same webpage, no problems.
Limitless possibilities, limitless dupes.
- Nick22, on 10/12/2007, -1/+6
First off, this guy doesn't actually provide any idea for a solution, he just says "clean up your urls". Secondly, there isn't really a way you can solve the problem. No matter what, someone can just post a dupe by adding # at the end.
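Some of the variations listed above (a bare trailing ?, a # fragment) can be collapsed by normalizing a URL before comparing it with earlier submissions. What follows is a minimal Python sketch under stated assumptions: `canonical_url` is a hypothetical helper name, and nothing here reflects how Digg actually compares URLs. It deliberately keeps every real query parameter, because a site may use a parameter to select the page, so a made-up parameter such as ?dupe=lol still gets through.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return a comparison key for a URL (hypothetical helper, illustration only)."""
    parts = urlsplit(url)
    # Decode the query into (name, value) pairs and sort them, so that
    # reordered or percent-encoded parameters compare equal.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    query = urlencode(pairs)
    # Pass an empty fragment: browsers never send it to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    base = "http://www.example.com/page.html"
    print(canonical_url(base + "?"))          # http://www.example.com/page.html
    print(canonical_url(base + "#top"))       # http://www.example.com/page.html
    print(canonical_url(base + "?dupe=lol"))  # still distinct from the base URL
```

A check like this only narrows the gap. It cannot tell which parameters a given site ignores, so it is a complement to the duplicate warning, not a replacement for it.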
- DownSyndrom3, on 10/12/2007, -0/+4
zompus,
it's not that. It's like Akaji said. They just ignore the duplication warnings.
- milomilomilo, on 10/12/2007, -0/+3
I think it has gotten to a point where people complain far too much about duping.
Yes, there may be some people who have seen the story before, but honestly the very argument against it is what makes it mean nothing.
Obviously enough people haven't seen the article that it makes it to the front page. To the reader, dupes can be a good thing.
Unless you spend every waking minute on digg, and memorize everything you read, dupes usually aren't noticed.
I have dugg a number of articles where, although there are people bitching and moaning that it's been posted before, the story may have 500 diggs. That's a minimum of 500 people who didn't see this story.
I have a job, and some social life, and when I come onto digg, more than likely I have missed a number of stories while I was gone. So I have personally never witnessed a dupe.
Chill out people. It's just a news site.
- dyanacek, on 10/12/2007, -1/+3
This article is completely ridiculous. How does "cleaning up your urls" change anything from digg's perspective? Like zzz said, you can add GET variables to the end of any URL. Digg can't ignore GET variables, because they could be important for the page (e.g. http://crapnews.com/displayDumbArticle.php?articleID=12345).
The point is that this isn't a content website's problem - it's the digg/discussion site's problem. If anything, content sites WANT their stories duped on digg so they get more hits. Other posts have suggested ways to solve the problem on digg's end, and there are plenty of additional heuristics that could be used.
- merreborn, on 10/12/2007, -1/+3
One potential solution would be for digg to come up with some sort of "thumbprint" hash of a site's content.
In the example given, retrieving all three URLs returns the exact same text. One would think that it should be fairly trivial to come up with a hashing algorithm that generates similar values for pages that have identical or nearly identical content.
- MrBobDobolina, on 10/12/2007, -3/+5
Hahahaha! It's a duplicate story!!!
... get it? ... duplicate story?... a story about duplicates?....
..okay, I'll shut up now...
- merreborn, on 10/12/2007, -0/+2
Digg should ignore anything after a # in a URL, if it doesn't already. The browser throws out the # and everything after it when making the HTTP request, and interprets it itself after receiving the page. Any mutation of index.html#anything retrieves the exact same HTML, on any server.
The potential for gaming is in URL query strings -- ?a=1&b=2 might become ?b=2&a=1, or ?a=%31&b=2, or ?a=1&b=2&c=3 (assuming that the site simply ignores the value of c) or any combination thereof.
- dreamlayers, on 10/12/2007, -0/+2
I am far more concerned with blogspam than with duplicate stories. Following these guidelines does nothing about blogspam, and anyone who wants to submit duplicates can still have as many URLs as they want pointing to a story. I think Digg should try to do something about this. Perhaps hashes of excerpts of text (not including formatting) can be stored in a database when a story is submitted. Later these hashes can be used to see if a story is a duplicate.
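The "hashes of excerpts of text" idea, and the earlier wish for a thumbprint that comes out similar for nearly identical pages, can be approximated by hashing overlapping runs of words (shingles) and measuring how many hashes two pages share. The Python sketch below is one possible reading of those comments, not a description of anything Digg does. The names `text_of`, `shingle_hashes` and `similarity` are hypothetical, and the regex tag stripper is a crude stand-in for a real HTML parser.

```python
import hashlib
import re


def text_of(html: str) -> str:
    """Drop tags and collapse whitespace (crude; assumes simple markup)."""
    no_tags = re.sub(r"<[^>]+>", " ", html)
    return " ".join(no_tags.lower().split())


def shingle_hashes(html: str, size: int = 4) -> set:
    """Hash every run of `size` consecutive words in the page text."""
    words = text_of(html).split()
    if len(words) < size:
        # Very short text: hash it as a single excerpt.
        return {hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()}
    return {
        hashlib.sha1(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


def similarity(html_a: str, html_b: str) -> float:
    """Jaccard overlap of the two hash sets: 1.0 means identical text."""
    a, b = shingle_hashes(html_a), shingle_hashes(html_b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    plain = "<p>The quick brown fox jumps over the lazy dog</p>"
    restyled = "<div><b>The</b> quick brown fox jumps over the lazy dog</div>"
    print(similarity(plain, restyled))  # 1.0, only the formatting differs
```

The stored set of hashes plays the role of the database entry. A submission whose overlap with an earlier story passes some threshold (the threshold would need tuning) could trigger the duplicate warning even when the URLs differ.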
- sbrickner, on 10/12/2007, -0/+2
Well, the first two, anyway. The third is on a different host name.
- dyanacek, on 10/12/2007, -0/+2
edit: dupe comment. oh the irony.
- inactive, on 10/12/2007, -0/+2
With dupe content and comment spamming, Web 2.0 is really amazing!
- {{sPaz}}, on 10/12/2007, -0/+1
As a content provider, my answer (if anyone asked me) would be two-fold:
1. This is not my site's problem - it's digg's. It's probably just not good enough at finding duplicate urls.
2. Why would I care about the creation of _fewer_ dupes? More dupes mean I get more of a chance to get to the front page, and people might re-submit my site over and over so that my content remains fresh and therefore at the top.
- jull1234, on 10/12/2007, -0/+1
But what we're really saying here is that dan9876's name is Ryan?
- 13thfloor, on 10/12/2007, -0/+1
I think there should be another 'dupe' button. The same story can be available from several websites. As long as they are not all cut and paste from AP articles, they are not technically dupes, but the same story from different sources.
But that's not really what this particular article is about or even the other 'dupe' comments, but since we're on the topic...
- spiderland, on 10/12/2007, -0/+1
Nice article, but it doesn't address the blog spam, which is probably more at fault for dupes these days.
- kingkong118, on 10/12/2007, -0/+1
BOOOOOOOOOOOOOOOOOOOOOOOORING
- bioskope, on 10/12/2007, -0/+1
When topics that were posted just hours before get duped, it does kind of irritate you. Spend more time on digg and you will realize this.
I for one think the dupe police is a necessary evil, because there are jackasses who try to post the sensational news from different urls in spite of the fact that the original had been posted mere moments before. Because digg is constantly growing, chances are not high that a majority of users have seen both stories. But the fact is it's a dupe. Whether done by accident or not, it deserves to be called.
Of course the other side of the coin would be the very irritable and commonly seen replies in the digg videos section, which include "This was funny 2 years ago when it first came out, not anymore" or "I saw this last year, so it shouldn't be on digg". Now those I find to be irritating too.
- Miso117, on 10/12/2007, -0/+1
Vote Bush in '08
Sign up here http://www.bush=palpatine.net
not.
- liquidizer, on 10/12/2007, -0/+1
Sites that tell others to re-architect their URLs should probably check the basics themselves, such as whether their comment form works or not.
- podgey22, on 10/12/2007, -7/+7
It has nothing to do with the website owners.
Believe me when I say we developers have enough problems when it comes to URL structures without whiny bitches complaining about some noob taking a version of the URL with a random querystring on the end and submitting that. It's not our fault.
If somebody wants to do something about it, start a movement to have people put unique identifiers in the HTTP headers to describe pages. Otherwise, shaddup and blame the nonces that ignore the duplicate story warning.
Buried under "I need my blankie"
- Browzer, on 10/12/2007, -0/+0
Some webmasters actually like the dupes. It gives them a second chance to submit a site. Let's say you submit www.abc.com to Digg. If that doesn't do well, wait a couple weeks and submit www.abc.com/index.php.
It's also possible to karma-wh0re off the system, if you know what you are doing. Whenever a story gets popular because it points to an image, you can re-submit the story a couple weeks later by linking directly to the image.
I'm not saying you should do these things. I'm just telling you how the game is played.
- dyanacek, on 10/12/2007, -1/+1
edit: argh, sorry. This was one of my first comments, and I'm clearly having trouble.
- huffman, on 10/12/2007, -0/+0
Checking only URLs won't help. To prevent duplicate submissions, the only thing that comes to my mind is checking the content of the page, maybe with a CRC? But there is one problem: when the page is updated, its checksum changes too.
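A CRC over the page text, as suggested above, is easy to compute, and it also makes the update problem concrete: any edit to the text produces a different value. This is a minimal Python sketch using the standard library's `zlib.crc32`. The name `page_checksum` is hypothetical, and the whitespace normalization is an added assumption so that reflowed text still matches.

```python
import zlib


def page_checksum(text: str) -> int:
    """CRC-32 of the page text with runs of whitespace collapsed."""
    normalized = " ".join(text.split())
    return zlib.crc32(normalized.encode("utf-8"))


if __name__ == "__main__":
    original = "Digg users keep submitting the same story."
    reflowed = "Digg users keep\nsubmitting   the same story."
    updated = "Digg users keep submitting the same story. (Updated)"

    print(page_checksum(original) == page_checksum(reflowed))  # True
    print(page_checksum(original) == page_checksum(updated))   # expected False
```

An exact checksum catches byte-for-byte copies reached through different URLs, but a single appended sentence defeats it, which is the limitation raised in the comment.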
- m242, on 10/12/2007, -4/+3
You know what's funny?
This guy's own site links to that very page as:
http://www.themulife.com/?p=651 (the "real" url)
and
http://www.themulife.com/?p=651#more-651 (the url from his Digg categories page)
and
http://themulife.com/?p=651#respond (the url for "Comments" from his Digg categories page)
Oops.
- chucali, on 10/12/2007, -6/+1
that was the most boring thing I've ever read

