81 Comments
- RocketGib, on 09/18/2008, -19/+92
DELETE * FROM DiggDB WHERE UserName="Mr. Babyman";
- t0mmmmmmm, on 09/18/2008, -3/+53
DELETE FROM submissions WHERE title REGEXP 'xkcd|apple|mc\s?cain|palin';
- Monkofdoom, on 09/18/2008, -0/+28
We have about 1.8x to 2.5x the theoretical minimum number of lolcats required to ruin Digg
- Farhankhan, on 09/18/2008, -0/+24
Is it Normalized? :p
- deletedtheory, on 09/18/2008, -1/+22
I would love to see some of the queries that these clusters execute.
- GumdoMike, on 09/18/2008, -0/+15
"the key, the whole key, and nothing but the key"
...So help me Codd
- MAdaXe42, on 09/18/2008, -0/+15
I'd hope not. Normalization is pretty much the arch-enemy of performance: it'll save you space, but more often than not, if you want fast data access, you have to store things in an... esoteric... fashion.
- Navicerts, on 09/18/2008, -0/+13
I was hoping for a database diagram!
I imagine there is a lot of data mining concerning the new "recommendation" feature? Do you have anything like "current trends", "trends by location", "trends by age or sex"? Hmm, how about trends by income bracket after linking it up to a geographical database with salary/location? Maybe some off-the-wall things as well, like "people born in May are subject to conspiracy theories", etc.?
Man, I would love to get my hands on the data; it would be a blast. It's all public info, right? Where can we begin downloading?!
- GhandicapXRS, on 09/18/2008, -1/+13
Please!!!! Do WANT!!!!
- pandaro, on 09/18/2008, -10/+21
DELETE FROM DiggDB WHERE UserName="Mr. Babyman";
- Raydr, on 09/18/2008, -0/+11
Because Oracle is slow, overly complex, and too expensive for the needs of most sites.
I work with MySQL, MSSQL and Oracle, and I get the best performance, by far, from MySQL with large recordsets (>10M records).
- sfrench, on 09/18/2008, -1/+12
Actually, many of the queries are quite simple. Joe mentioned it in his first tech blog post, but we actually have more denormalization than you would find in most databases. As a side effect of that, we do a lot of primary key lookups, and as few joins and complicated queries as possible.
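sfrench's description (heavy denormalization, lots of primary key lookups, few joins) can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL. The `stories`/`diggs` tables and the `digg_count` column are hypothetical names and are not Digg's schema. The write path does a little extra work to keep a copied count current, so the read path becomes one primary key lookup instead of an aggregate.

```python
import sqlite3

# Hypothetical schema, not Digg's: `diggs` holds one row per digg (the
# normalized source of truth) and `stories.digg_count` is a denormalized copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        digg_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE diggs (
        story_id INTEGER NOT NULL REFERENCES stories(id),
        user_id  INTEGER NOT NULL,
        PRIMARY KEY (story_id, user_id)
    );
""")

def digg(story_id, user_id):
    """Record a digg and bump the denormalized counter in the same transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("INSERT INTO diggs (story_id, user_id) VALUES (?, ?)",
                     (story_id, user_id))
        conn.execute("UPDATE stories SET digg_count = digg_count + 1 WHERE id = ?",
                     (story_id,))

def count_normalized(story_id):
    # Aggregate over every digg row for the story at read time.
    return conn.execute("SELECT COUNT(*) FROM diggs WHERE story_id = ?",
                        (story_id,)).fetchone()[0]

def count_denormalized(story_id):
    # A single primary key lookup; no join or aggregate at read time.
    return conn.execute("SELECT digg_count FROM stories WHERE id = ?",
                        (story_id,)).fetchone()[0]

with conn:
    conn.execute("INSERT INTO stories (id, title) VALUES (1, 'example story')")
for user_id in range(3):
    digg(1, user_id)
print(count_normalized(1), count_denormalized(1))  # 3 3
```

The trade-off is the one MAdaXe42 mentions: the data is stored twice, and every write has to keep both copies in step.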
- orlyfactor, on 09/18/2008, -0/+9
Makes perfect sense. If you have the space, why not keep it simple? I work for a very large German car company that still uses a mainframe running DB2, and the DBAs are all over 50 years old and have never heard of surrogate keys. It makes me cry every time I have to write composite keys for Hibernate. They just don't understand how much easier surrogate keys make development and lookups.
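The sketch below contrasts a natural composite key with a surrogate key to make orlyfactor's point concrete. It uses Python with sqlite3, and every table and column name in it is invented for illustration. With the surrogate key, each referencing table carries a single integer column. A UNIQUE constraint still protects the natural key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural composite key: every table that references a vehicle
    -- has to repeat all three columns.
    CREATE TABLE vehicles_natural (
        plant_code TEXT    NOT NULL,
        model_year INTEGER NOT NULL,
        serial_no  INTEGER NOT NULL,
        PRIMARY KEY (plant_code, model_year, serial_no)
    );
    -- Surrogate key: one generated integer identifies the row, and a UNIQUE
    -- constraint still enforces the business rule on the natural columns.
    CREATE TABLE vehicles (
        id         INTEGER PRIMARY KEY,  -- surrogate key (alias for SQLite's rowid)
        plant_code TEXT    NOT NULL,
        model_year INTEGER NOT NULL,
        serial_no  INTEGER NOT NULL,
        UNIQUE (plant_code, model_year, serial_no)
    );
    CREATE TABLE service_records (
        id         INTEGER PRIMARY KEY,
        vehicle_id INTEGER NOT NULL REFERENCES vehicles(id),  -- one column, not three
        note       TEXT
    );
""")

cur = conn.execute(
    "INSERT INTO vehicles (plant_code, model_year, serial_no) VALUES (?, ?, ?)",
    ("XY", 2008, 12345))
vehicle_id = cur.lastrowid  # the generated surrogate key
conn.execute("INSERT INTO service_records (vehicle_id, note) VALUES (?, ?)",
             (vehicle_id, "oil change"))

# The join needs only the single surrogate column.
row = conn.execute("""
    SELECT v.plant_code, s.note
    FROM service_records AS s
    JOIN vehicles AS v ON v.id = s.vehicle_id
""").fetchone()
print(row)  # ('XY', 'oil change')
```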
- leemac, on 09/18/2008, -2/+11
I love these articles! Gives you a glimpse of how a high-traffic site runs. Thanks!
- scaper, on 09/18/2008, -1/+9
I laughed
"One of the most common problems on Digg systems is a spike in load, often caused by large news events like Apple announcements or hurricanes or… well, anything newsworthy"
- aguynamedben, on 09/18/2008, -1/+8
sorry to burst your bubble but aside from your syntax error, our users table is not called 'DiggDB' =)
- SSUK, on 09/18/2008, -4/+10
DELETE FROM this thread WHERE is lame
Returned: All posts.
- LukeD, on 09/18/2008, -0/+6
I'd assume it's because early on it's free. Initially you deploy on a LAMP architecture because it's what you can afford and it's what you are most familiar with. It's not like you open your site anticipating it being a runaway success, so you knock it together the best and cheapest way you know how. Then, down the line, it would have made sense to use a different architecture because you are now doing something much more complicated than you were when you set out, but the total cost of moving the whole thing across is much higher than making it work with what you have.
- BXRWXR, on 09/18/2008, -1/+7
SELECT *
FROM C_C_C_COMBO_BREAKER
- LukeD, on 09/18/2008, -0/+4
where decent sized means "larger than anything 99% of us are going to play with anytime soon"
- philovivero, on 09/18/2008, -0/+4
Are you suggesting that a cat that can approach a camera without appearing to move isn't newsworthy?
ARE YOU?!
- MAdaXe42, on 09/18/2008, -0/+4
Also, of course, the simpler and more generic you keep your queries, the easier it is to bring in a caching layer, such as memcached. Have you guys experimented with anything along those lines at all?
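Here is a minimal cache-aside sketch of what MAdaXe42 describes. It assumes a plain Python dict as a stand-in for memcached, so a real client, expiry and eviction are not modeled, and it uses a hypothetical `stories` table. Because the query is a simple lookup by id, the cache key can be built directly from the id. That is the property that makes simple, generic queries easy to cache.

```python
import sqlite3

# A plain dict stands in for memcached; expiry, eviction and network access
# are not modeled in this sketch.
cache = {}
db_reads = 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO stories (id, title) VALUES (1, 'example story')")

def get_story(story_id):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    global db_reads
    key = f"story:{story_id}"  # a simple lookup by id maps to a simple cache key
    if key in cache:
        return cache[key]
    db_reads += 1
    row = conn.execute("SELECT id, title FROM stories WHERE id = ?",
                       (story_id,)).fetchone()
    cache[key] = row
    return row

def rename_story(story_id, title):
    """Write to the database, then invalidate the cached copy."""
    conn.execute("UPDATE stories SET title = ? WHERE id = ?", (title, story_id))
    cache.pop(f"story:{story_id}", None)

get_story(1)
get_story(1)            # served from the cache
print(db_reads)         # 1
rename_story(1, "renamed story")
story = get_story(1)    # cache miss after invalidation
print(story, db_reads)  # (1, 'renamed story') 2
```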
- scaper, on 09/18/2008, -0/+4
this setup isn't really anything out of the norm for someone running a decent sized site with a mysql backend...
- BXRWXR, on 09/18/2008, -1/+5
Because it's free.
- shadowspawn, on 09/18/2008, -0/+4
I'm the same way. I love doing that stuff for clients when they have a large database. I use one of the big berthas and just run an insane query on a copy and poof: People who live II state in X zip tend to purchase X more than people in Y zip on Fridays. Yet people in J zip tend to purchase more than people in X zip in the month of August.
Oh, by the way, here's the historical temperature table merged with the Dow and census data.
Anyway, he didn't answer the important question... how many servers? Kinda disappointed that once again the answer was skirted.
- infamousjeff, on 09/18/2008, -3/+6
lol 3NF. I remember an ole professor who instilled mnemonics like the phrase "the key, the key, and nothing but the key". A friendly man, but his love for the Ada language made me suspicious. It wasn't long after that I got a job at Best Buy and began working for the CIA, although inadvertently.
I got into one little fight and my mom got scared . . .
- StevenBullen, on 09/18/2008, -0/+3
I have a life, thanks! It's a shame that all the power diggers don't.
- flyer, on 09/18/2008, -0/+3
"We have about 1.8x to 2.5x the theoretical minimum number" so you know how many you have. Don't lie: how many machines do you have?
- barbaragordon, on 09/18/2008, -0/+3
It would be great to get some further insight into the algorithm. It would sure be nice to stop all the blog spam that's out there and focus more attention on the original sources of information.
- kgdoom, on 09/18/2008, -0/+3
Actually you have it backwards. The hardware is less relevant because the design is good. And as a DBA I can say this sounds like a decent design.
- papastout, on 09/18/2008, -2/+5
Being that I am learning SQL (I'm just a puppy with it), I totally ate this article up, and I would like to say to Tim Ellis: "I am so not worthy, but I will be someday soon." Thanks for the submission; it's truly fascinating.
And what's with all that MrBabyMan slog? Does Digg have a hall of shame for trolls? Hmmm, that could be a new feature that could serve to 'out' troll groups like political operatives, x-tian hate mongers, etc...
...I'm just sayin
- K4emic, on 09/18/2008, -2/+5
UPDATE news SET url = '2grls1cp.swf' WHERE title LIKE '%Apple%'
That's what you get for being a *****.
- Hortnon, on 09/18/2008, -0/+2
But there is a threshold beyond which a ground-up rebuild pays for itself by making future maintenance and performance costs x% lower.
It really comes down to whether you want to do the work or not.
- CSharpSauce, on 09/18/2008, -0/+2
I've noticed a significant performance enhancement with my database (7M records a week) since I switched from distributed partitions on SQL Server to Oracle Clusterware.
- macros, on 09/18/2008, -0/+2
The design allows them to throw more hardware at the problem, which is a good thing. They are able to scale their DB layer pretty much horizontally, which allows them to throw more lower-powered boxes at the problem rather than big honking machines.
This model makes it easy to get replacements in a hurry when things fail or the load spikes, and you can ride the sweet spot on the price/performance curve. Pricing does not scale linearly when it comes to RAM and I/O systems.
- lazyfanatic, on 09/18/2008, -2/+4
It's not free. It costs money to write that custom code and find people who are brilliant enough to do so.
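macros's point about scaling the DB layer horizontally is often implemented as a read/write split: one master takes the writes and reads are spread over cheaper replicas. The sketch below is hypothetical. The class and host names are invented, and it does not describe Digg's actual topology. It only shows the routing idea, with plain strings standing in for connections.

```python
import itertools

class ReplicatedRouter:
    """Hypothetical read/write router: writes go to one master,
    reads rotate round-robin across replicas.

    Plain strings stand in for database connections so the sketch runs on its own.
    """

    def __init__(self, master, replicas):
        if not replicas:
            raise ValueError("need at least one replica")
        self.master = master
        self._next_replica = itertools.cycle(replicas)

    def for_write(self):
        return self.master

    def for_read(self):
        return next(self._next_replica)

router = ReplicatedRouter("db-master", ["db-replica-1", "db-replica-2", "db-replica-3"])
print([router.for_read() for _ in range(4)])
# ['db-replica-1', 'db-replica-2', 'db-replica-3', 'db-replica-1']
print(router.for_write())  # db-master
```

Under this model, adding read capacity means building the router with one more replica in the list. One caveat is that replicas lag the master, so a read issued right after a write may see slightly stale data.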
- Hortnon, on 09/18/2008, -3/+5
What I'd like to know is the reasoning behind using MySQL over something a bit more robust, like Oracle. A lot of the things they talk about having to account for manually and write custom code for are done automatically, and probably a lot better, in other RDBMSs...
- rickcarson, on 09/19/2008, -0/+2
DELETE FROM employees WHERE designer = "the moron who decided that the page should use less than 50% of the horizontal space available";
- rickcarson, on 09/19/2008, -0/+2
DELETE FROM replies WHERE language = "leetspeak" OR language = "lolcats";
- markstory, on 09/18/2008, -1/+3
So which one gives you the best performance, Raydr? Didn't quite get it :)
- Tehrab, on 09/18/2008, -0/+2
While I know very little of Digg's methods, I am familiar with larger-scale web farms, and the DB queries tend to be just procedures that execute fairly simple statements.
The complexity is in the server architecture and replication methodology.
- elbekko, on 09/18/2008, -0/+2
That's a lot less than I was expecting, so good job on the optimisations ;)
- K4emic, on 09/18/2008, -1/+2
I, too, lol'd a little inside.
- Dubbsacc, on 09/18/2008, -4/+5
DELETE *, huh? You might want to refresh yourself on the DELETE clause.
Most of these queries would fail anyway; single quotes, people.
- ultrafez, on 09/18/2008, -0/+1
Parameterized queries ftw.
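ultrafez's suggestion can be illustrated with a small sketch that uses Python's sqlite3 module and a hypothetical `users` table. Concatenating a value that contains a single quote into the SQL text breaks the statement. A parameterized query passes the value separately and needs no manual quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("O'Brien",)])

name = "O'Brien"  # hypothetical user name containing a single quote

# String concatenation produces ... WHERE name = 'O'Brien', which is a syntax
# error (and the same pattern is what makes SQL injection possible).
naive_sql = "DELETE FROM users WHERE name = '" + name + "'"
try:
    conn.execute(naive_sql)
except sqlite3.OperationalError as exc:
    print("naive query failed:", exc)

# A parameterized query sends the value separately from the SQL text,
# so the driver handles quoting and the statement works as intended.
conn.execute("DELETE FROM users WHERE name = ?", (name,))
print(conn.execute("SELECT name FROM users").fetchall())  # [('alice',)]
```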
- Jernej, on 09/18/2008, -0/+1
drop * from * ?
- Xanium4332, on 09/18/2008, -0/+1
where 1
- RajAtWork, on 09/19/2008, -1/+2
It is a very sophisticated setup, but keep in mind this is a mostly read-only, non-transactional site. I wish there were more info on how to scale databases for transactional sites. I do that kind of thing, and it is even more painful than this.
- philovivero, on 09/18/2008, -0/+1
Yeah. Things are in a pretty chaotic denormalised form, mostly for performance reasons. What's interesting is that it pretty much guarantees data inconsistency, so that's another of the problems of Digg's databases.
Fortunately, people are forgiving if it says their story has 1,018 Diggs but 1,019 people show up as digging it. Probably more forgiving than if their bank claimed they had $1,018 when they should have $1,019.
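The sketch below shows the kind of drift philovivero describes in a denormalized counter. It uses Python with sqlite3 and hypothetical tables. The lost counter update is simulated by writing 1,018 directly. A query that compares the stored count with the real count finds the drift, and a reconciliation pass recomputes the copy from the `diggs` rows. This is one common approach and not necessarily what Digg does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, digg_count INTEGER NOT NULL);
    CREATE TABLE diggs (
        story_id INTEGER NOT NULL,
        user_id  INTEGER NOT NULL,
        PRIMARY KEY (story_id, user_id)
    );
    INSERT INTO stories (id, digg_count) VALUES (1, 0);
""")

# 1,019 diggs are recorded in the source-of-truth table...
conn.executemany("INSERT INTO diggs (story_id, user_id) VALUES (1, ?)",
                 [(user_id,) for user_id in range(1019)])
# ...but one counter increment is assumed lost, simulated here by writing
# 1,018 into the denormalized copy directly.
conn.execute("UPDATE stories SET digg_count = 1018 WHERE id = 1")

def drifted_stories():
    """Return (id, stored_count, actual_count) for every story whose copy is wrong."""
    return conn.execute("""
        SELECT s.id, s.digg_count, COUNT(d.user_id)
        FROM stories AS s
        LEFT JOIN diggs AS d ON d.story_id = s.id
        GROUP BY s.id
        HAVING s.digg_count <> COUNT(d.user_id)
    """).fetchall()

print(drifted_stories())  # [(1, 1018, 1019)]

# Reconciliation pass: recompute each copy from the normalized rows.
conn.execute("""
    UPDATE stories
    SET digg_count = (SELECT COUNT(*) FROM diggs WHERE diggs.story_id = stories.id)
""")
print(drifted_stories())  # []
```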