43 Comments
- hughtopia, on 02/13/2009, -4/+40
Classic, an article on scaling crashes the web server it's running on.
- richardhenry, on 02/13/2009, -3/+29
BREAKING NEWS: HIGHSCALABILITY.COM UNABLE TO SCALE
- grammarpolice, on 02/13/2009, -2/+21
Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call.
In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me, however, was that Joe thought MemcacheDB will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. We'll learn why Joe is so excited by MemcacheDB at the end of the post.
Impressive Stats
# 30 million users.
# 13,000 requests a second.
# Lots of servers.
Scaling Strategies
# Scaling is specialization. When off the shelf solutions no longer work at a certain scale you have to create systems that work for your particular needs.
# Web 2.0 sucks for scalability. Web 1.0 was flat with a lot of static files. Additional load is handled by adding more hardware. Web 2.0 is heavily interactive. Content can be created at a crushing rate.
# Languages don't scale. Bottlenecks aren't in the language when you are handling so many simultaneous requests. Making PHP 300% faster won't matter.
# Don’t share state. Decentralize. Partitioning is required to process a high number of requests in parallel.
# Scale out instead of up. Expect failures. Just add boxes to scale and avoid the fail.
# Database-driven sites need to be partitioned to scale both horizontally and vertically. Horizontal partitioning means storing a subset of rows on different machines. It is used when there's more data than will fit on one machine. Vertical partitioning means putting some columns in one table and some columns in another table. This allows you to add data to the system without downtime.
# Build a data access layer so partitioning is hidden behind an API.
# With partitioning comes the CAP Theorem: you can only pick two of the following three: Strong Consistency, High Availability, Partition Tolerance.
# Partitioned solutions require denormalization, which has become a big problem at Digg. Denormalization means data is copied into multiple objects and must be kept synchronized.
# Use an asynchronous queuing architecture. Issuing 5 synchronous database requests slows you down. Do them in parallel. See Flickr - Do the Essential Work Up-front and Queue the Rest and The Canonical Cloud Architecture for more information.
# Run the numbers before you try to fix a problem, to make sure the fix will actually work.
# Files such as icons and photos are handled by MogileFS, a distributed file system. DFSs support high request rates because files are distributed and replicated around a network.
# Cache forever and explicitly expire.
# Cache fairly static content in a file based cache.
# Cache changeable items in memcached.
# Cache rarely changed items in APC. APC is a local cache. It's not distributed, so other servers have no access to its values.
# For caching, use the Chain of Responsibility pattern. Cache in MySQL, memcached, APC, and PHP globals. First check PHP globals, the fastest cache. If the item isn't there, check APC, then memcached, and so on up the chain.
# Digg's recommendation engine is a custom graph database that is eventually consistent. Eventually consistent means that writes to one partition will eventually make it to all the other partitions. After a write, reads made one after another may not return the same value, since they could be handled by different partitions. This is a more relaxed constraint than strict consistency, under which changes must be visible at all partitions simultaneously, so reads made one after another always return the same value.
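The caching rules above can be sketched as a Chain of Responsibility. This is a minimal Python sketch, not Digg's PHP code. Plain dicts stand in for PHP globals, APC and memcached, and `load_from_db` is a hypothetical stand-in for a MySQL query. Entries carry no TTL ("cache forever"), and `expire` removes a key from every layer ("explicitly expire").

```python
from typing import Callable, Dict, List, Optional


class CacheLayer:
    """One link in the chain; a plain dict stands in for the real backend."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._store.get(key)

    def set(self, key: str, value: object) -> None:
        # No TTL: an entry lives until it is explicitly expired.
        self._store[key] = value

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


class CacheChain:
    """Check layers fastest-first; on a full miss, fall through to the loader."""

    def __init__(self, layers: List[CacheLayer], loader: Callable[[str], object]) -> None:
        self.layers = layers
        self.loader = loader

    def get(self, key: str) -> object:
        for i, layer in enumerate(self.layers):
            value = layer.get(key)
            if value is not None:  # a stored None is treated as a miss
                for faster in self.layers[:i]:
                    faster.set(key, value)  # back-fill the faster layers that missed
                return value
        value = self.loader(key)  # last link in the chain, e.g. a MySQL query
        for layer in self.layers:
            layer.set(key, value)
        return value

    def expire(self, key: str) -> None:
        # Explicit expiry has to clear every layer, or a stale copy survives.
        for layer in self.layers:
            layer.delete(key)


# Usage with a hypothetical loader that records each database read.
db_reads: List[str] = []


def load_from_db(key: str) -> object:
    db_reads.append(key)
    return f"row for {key}"


chain = CacheChain(
    [CacheLayer("php_globals"), CacheLayer("apc"), CacheLayer("memcached")],
    load_from_db,
)
chain.get("story:42")     # misses every layer: one database read
chain.get("story:42")     # served from php_globals
chain.expire("story:42")
chain.get("story:42")     # reloaded after explicit expiry
print(db_reads)           # ['story:42', 'story:42']
```

Back-filling the faster layers on a hit is one reasonable choice. It means a value found in memcached is served from the process-local layer on the next request.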
MemcacheDB: Evolutionary Step for Code, Revolutionary Step for Performance
Imagine Kevin Rose, the founder of Digg, who at the time of this presentation had 40,000 followers. If Kevin diggs just once a day, that's 40,000 writes. Since the most active diggers are also the most followed, this becomes a huge performance bottleneck. Two problems appear.
You can't update 40,000 follower accounts at once. Fortunately the queuing system we talked about earlier takes care of that.
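Here is a minimal sketch of that queuing idea. It uses Python's standard `queue` and `threading` modules as a stand-in for whatever queue Digg actually runs, which the source doesn't name. The follower list, the `feeds` dict and the function names are hypothetical. The point is that the request path only enqueues one job, and a background worker performs the 40,000 follower writes.

```python
import queue
import threading
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Hypothetical data: one heavily followed user and each follower's feed record.
followers: Dict[str, List[str]] = {"kevin": [f"user{i}" for i in range(40_000)]}
feeds: DefaultDict[str, List[int]] = defaultdict(list)

jobs: queue.Queue = queue.Queue()  # items are (user, story_id) tuples


def enqueue_digg(user: str, story_id: int) -> None:
    """Request path: record the job and return without touching followers."""
    jobs.put((user, story_id))


def fanout_worker() -> None:
    """Background worker: copy each digg into every follower's record."""
    while True:
        user, story_id = jobs.get()
        for follower in followers.get(user, []):
            feeds[follower].append(story_id)
        jobs.task_done()


threading.Thread(target=fanout_worker, daemon=True).start()

enqueue_digg("kevin", 1234)  # returns immediately
jobs.join()                  # the demo waits only so it can inspect the result
print(len(feeds))            # 40000
```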
The second problem is the huge number of writes. Digg has a write problem. If the average user has 100 followers, that's 300 million diggs a day. That's roughly 3,000 writes per second, 7GB of storage per day, and 5TB of data spread across 50 to 60 servers.
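The figures above can be checked with a few lines of arithmetic. The 300 million writes a day and the 100-follower average come from the text. Dividing one by the other gives the implied number of original diggs, and dividing by 86,400 seconds gives the average write rate. The per-server figure assumes decimal units (1 TB = 1,000 GB) and an even spread across servers, neither of which the source states.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400

fanned_out_writes_per_day = 300_000_000   # figure from the text
avg_followers = 100                       # figure from the text
total_gb = 5_000                          # 5 TB, assuming 1 TB = 1,000 GB

# Each digg is copied once per follower, so divide to get the original diggs.
diggs_per_day = fanned_out_writes_per_day // avg_followers
writes_per_second = fanned_out_writes_per_day / SECONDS_PER_DAY
gb_per_server = [total_gb / servers for servers in (60, 50)]

print(diggs_per_day)                        # 3000000
print(round(writes_per_second))             # 3472
print([round(gb) for gb in gb_per_server])  # [83, 100]
```

The daily average works out nearer 3,500 writes per second than 3,000. An average also says nothing about peaks, which will be higher.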
With such a heavy write load, MySQL wasn't going to work for Digg. That's where MemcacheDB comes in. In initial tests on a laptop, MemcacheDB was able to handle 15,000 writes a second. MemcacheDB's own benchmark shows it capable of 23,000 writes/second and 64,000 reads/second. At those write rates it's easy to see why Joe was so excited about MemcacheDB's ability to handle their digg deluge.
What is MemcacheDB? It's a distributed key-value storage system designed for persistence. It is NOT a cache solution, but a persistent storage engine for fast and reliable key-value based object storage and retrieval. It conforms to the memcache protocol (though not completely), so any memcached client can connect to it. MemcacheDB uses Berkeley DB as its storage backend, so features such as transactions and replication are supported.
Before you get too excited, keep in mind this is a key-value store. You read and write records by a single key. There aren't multiple indexes and there's no SQL. That's why it can be so fast.
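Speaking the memcache protocol is what lets an existing memcached client talk to MemcacheDB. The text protocol is simple enough to sketch by hand. Below is a minimal Python encoder for `set` and `get`, plus a parser for the `VALUE ... END` reply. It only builds and parses bytes and opens no socket. Because MemcacheDB's conformance is described as incomplete, don't assume every memcached command is supported.

```python
from typing import Dict


def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build 'set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n'."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    """Build 'get <key>\\r\\n'."""
    return f"get {key}\r\n".encode()


def parse_get_response(raw: bytes) -> Dict[str, bytes]:
    """Parse 'VALUE <key> <flags> <bytes>' blocks up to the closing 'END'."""
    items: Dict[str, bytes] = {}
    pos = 0
    while True:
        end = raw.index(b"\r\n", pos)
        line = raw[pos:end].decode()
        pos = end + 2
        if line == "END":
            return items
        parts = line.split()  # VALUE key flags bytes [cas]
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, nbytes = parts[1], int(parts[3])
        items[key] = raw[pos:pos + nbytes]
        pos += nbytes + 2  # skip the data block and its trailing \r\n


print(encode_set("user:kevin", b"hello"))
# b'set user:kevin 0 0 5\r\nhello\r\n'
print(parse_get_response(b"VALUE user:kevin 0 5\r\nhello\r\nEND\r\n"))
# {'user:kevin': b'hello'}
```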
Digg uses MemcacheDB to scale out the huge number of writes that happen when data is denormalized. Remember it's a key-value store. The value is usually a complete application level object merged together from a possibly large number of normalized tables. Denormalizing introduces redundancies because you are keeping copies of data in multiple records instead of just one copy in a nicely normalized table. So denormalization means a lot more writes as data must be copied to all the records that contain a copy. To keep up they needed a database capable of handling their write load. MemcacheDB has the performance, especially when you layer memcached's normal partitioning scheme on top.
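The denormalized writes and the partitioning layered on top can be sketched together. In this minimal Python sketch, each dict stands in for one MemcacheDB node. Keys are assigned to nodes by `crc32(key)` modulo the node count, which is a simplification: many memcached clients use consistent hashing instead. The `friends_diggs:<user>` key scheme is hypothetical. One record per follower makes every read a single-key lookup, but one digg costs as many writes as the digger has followers.

```python
import zlib
from typing import Dict, List, Optional


class PartitionedKV:
    """Client-side partitioning across several key-value nodes.

    Each dict stands in for one MemcacheDB node (hypothetical). A key always
    maps to the same node: crc32(key) modulo the node count.
    """

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, list]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> Dict[str, list]:
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def set(self, key: str, value: list) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[list]:
        return self._node_for(key).get(key)


def record_digg(kv: PartitionedKV, followers: List[str], story: dict) -> int:
    """Denormalized write: copy the story into each follower's record.

    Returns the number of writes, which equals the follower count.
    """
    for follower in followers:
        key = f"friends_diggs:{follower}"  # hypothetical key scheme
        record = kv.get(key) or []
        kv.set(key, record + [story])  # read-modify-write, not atomic
    return len(followers)


kv = PartitionedKV(num_nodes=4)
story = {"story_id": 1234, "digger": "kevin"}
writes = record_digg(kv, [f"user{i}" for i in range(100)], story)
print(writes)                         # 100
print(kv.get("friends_diggs:user7"))  # [{'story_id': 1234, 'digger': 'kevin'}]
```

The read-modify-write in `record_digg` is not atomic. With concurrent writers, a real system would need to guard against lost updates.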
I asked Joe why he didn't turn to one of the in-memory data grid solutions. Some of his reasons were:
# This data is generated from many different databases and takes a long time to generate, so they want it in a persistent store.
# MemcacheDB uses the memcache protocol. Digg already uses memcache, so it's a no-brainer to start using MemcacheDB. It's easy to use and easy to set up.
# Operations is happy to deploy it into the datacenter, as it's not a new setup.
# They already have memcached high availability and failover code, so that stuff already works.
# Using a new system would require more ramp-up time.
# If there are any problems with the code, you can take a look. It's all open source.
# They're not sure those other products are stable enough.
So it's an evolutionary step for code and a revolutionary step for performance. Digg is looking at using MemcacheDB across the board.
- bdickason, on 02/13/2009, -1/+11
Great great great article. Not enough content is being posted about relevant scaling solutions. Big players like Facebook have started to open the door and explain how they went from 1k-10k-100k simultaneous users, and it's nice to see digg following suit :)
- mcprogrammer, on 02/13/2009, -0/+8
It is evil in the same way that goto is evil -- don't do it unless you have a good reason that outweighs the downsides. In this case scalability. Almost everything in programming/CS involves a trade off.
Basically, rules are made to be broken, but only if you understand the consequences. Programming needs good judgment.
- DigitalisAkujin, on 02/13/2009, -0/+8
Memcache is freakin' awesome! We use it on www.artician.com for pretty much everything. When you can cut 99.9% of all the SELECT queries down to a single SELECT query, nothing else is worth even talking about.
As for who uses memcache?
Wikipedia
Facebook
LiveJournal (they made it)
- chuckDontSurf, on 02/13/2009, -0/+6
WTF does Gandalf have to do with this?
- inactive, on 02/13/2009, -4/+9
Shaking the Blues: HIGHFAILABILITY.COM UNABLE TO FAIL
- calebrown, on 02/13/2009, -2/+7
Joe Stump is the man.
- bjoernz, on 02/13/2009, -1/+5
unable to connect to database... how ironic ;)
- mmastrac, on 02/13/2009, -0/+4
On a blog dedicated to high scalability... FTW
- inactive, on 02/14/2009, -0/+3
Read the link. Serving up a billion web pages a month on 1 web server + 1 db server and no full-time admins (while sites with a similar number of users run 100+ servers and 10+ sysadmins) is pretty good scaling. Since the article was written (2006) the site has grown to #13 in the US using the same platform, still with 0 full-time admins: http://plentyoffish.wordpress.com/2008/12/20/2008- ...
- amoro99, on 02/13/2009, -0/+3
Unable to connect to database server
If you still have to install Drupal, proceed to the installation page.
If you have already finished installing Drupal, this either means that the username and password information in your settings.php file is incorrect or that we can't connect to the MySQL database server. This could mean your hosting provider's database server is down.
The MySQL error was: User highscal_admin already has more than 'max_user_connections' active connections.
Currently, the username is highscal_admin and the database server is localhost.
* Are you sure you have the correct username and password?
* Are you sure that you have typed the correct hostname?
* Are you sure that the database server is running?
For more help, see the Installation and upgrading handbook. If you are unsure what these terms mean you should probably contact your hosting provider.
PWNT
- Stradenko, on 02/13/2009, -0/+3
memcache is not memcachedb
- sten0257, on 02/13/2009, -2/+5
Dugg for being way over my head.
- markstory, on 02/14/2009, -0/+2
Worst drupal theme ever. What the heck is going on with the backpackers, and just plain wrong blue border at the top. This is a design fail. Good article though.
- brainnovate, on 02/13/2009, -1/+3
That was before the web... I am guessing. Or your teachers have never had to scale a web app.
- TDDebug, on 02/13/2009, -0/+2
If the website hadn't failed there'd actually be intelligent comments in this article... but instead you're stuck reading crap like this comment.
- wolfing, on 02/13/2009, -1/+3
Things change. I remember my teachers saying that denormalization was evil lol
- cloudberries, on 02/13/2009, -0/+2
Whu...?
- jellygraph, on 02/14/2009, -0/+2
Well, ultimately you get down to the basics. Cache, cache, cache everywhere and anywhere you can. Buy lots of servers, fill them with as much memory as you can, and partition your servers using virtual machines. As the article states, use distributed Memcache (and use as much spare memory as you can squeeze free from any of your servers) and MogileFS where you can. Use a proxy, like Varnish, to serve as much as it can without touching your application logic or database. Use something like Sphinx for fast full-text searching. Watch out for stupid mistakes, like using regex in MySQL (MySQL's query cache doesn't seem to cache them well), and lay out your database well (although caching queries in Memcache goes a long way). And, most importantly, install Nagios or Munin or something similar, so when you run into problems you can better diagnose what's going on (especially if you partition your systems using virtual machines, this can be a great help). There are many more things, and it's really about not being afraid of getting your hands dirty when a problem occurs. This is my experience.
- covertbadger, on 02/13/2009, -1/+3
They could save some of those requests per second by sorting out their comments. At the time of writing this page has 24 comments and needs 78 GETs to display. 78!
And when I load up a busier page (500 comments) it makes 358 requests. To load a frickin comments page!
This is because I have my preferences set to expand all comments, but the system behaves as if I was manually expanding those comments and so makes literally hundreds of ajax requests. This is nuts. There should be a complete cached copy of the page, expired every couple of minutes, so that when people with 'view all comments' option set visit the page it doesn't result in hundreds of requests and god knows how many DB hits. There can be some client-side post-processing to mark up the comments made by the user and their friends, and what's been dugg etc.
No wonder they have 13K hits per second if every page reload is 350 hits. Jeez.
- DigitalisAkujin, on 02/13/2009, -0/+2
We use both
- wolfing, on 02/13/2009, -0/+2
that was about 1989, yes I'm an old digger. Back then when you said "Check in the web" you were probably looking for fish :)
- Bdog2g2, on 02/13/2009, -2/+4
Awe, not making it to the front page as much as you'd like?
- xprojects, on 02/13/2009, -1/+3
so wrong....
- thomashallock, on 02/13/2009, -0/+2
This is hands-down the funniest experience I've had on Digg in ages.
- xprojects, on 02/13/2009, -1/+2
Memcache - absolutely. Everyone I know running a major web application keeps telling me how important memcache servers are. It makes perfect sense, caching is the key to all scalability.
- xprojects, on 02/13/2009, -1/+2
Many smaller queries are faster than one big query, in my experience. It's a lot easier to cache a single comment than to cache an entire page full of them: the one comment rarely changes, the full page is ever-changing.
The whole idea of memcache is to reduce DB hits because it's cached.
- Koadan, on 02/14/2009, -1/+2
Digg is bloated and doesn't have enough APIs.
- stevenhendee, on 02/14/2009, -0/+1
Did a screen capture of the ironic image from highcapacity.com
http://digg.com/hardware/Digg_Crashes_Highscalabit ...
- chiax, on 08/30/2009, -0/+1
Web 2.0 was centered on user experience; more can be found at
http://digg.com/d311mDH
Web 3.0 is the next generation of WEB. Please read this article on Web 3.0
http://digg.com/d312Npv
Regards,
http://pagerankandalexa.com/blog
- vackraord, on 02/16/2009, -0/+1
Very interesting read. There are a couple of alternatives. We are currently evaluating different DB back ends for a project at work where we are having big issues with Oracle. I think we are going to use GigaSpaces because we need an enterprise-friendly solution and I don't think MemcacheDB is mature enough. Give MemcacheDB a year or two and I think it will be enterprise-ready.
- anaesthetica, on 02/17/2009, -0/+1
I was wondering if the Digg folks talked to Wikipedia or not. Wikipedia's a Top 10 site, with hideous content-creation and database issues. That it has scaled as well as it has should be a case study for every other firm dealing with scalability. Especially since it's so much faster than Digg to boot.
- persistence, on 08/12/2009, -0/+1
If Digg is using it, I think it is a great upcoming tool.
- PimpWilly, on 02/13/2009, -2/+2
Seeing as it's already down, perhaps they should scale their webserver before offering tips on how to scale apps...
- MtheoryX, on 02/13/2009, -2/+1
BAWWWWW!!!
I can't get my lame blog reposts to the front page, boo ***** whoo!
STFU already.
- MrViklund, on 02/13/2009, -6/+4
DiGG me UP!
- thezanman, on 02/13/2009, -11/+6
IN SOVIET RUSSIA APPLICATION SCALES YOU
(I had to put something here after the server came back up and my rant about the server being down no longer applied; please feel free to mod down)
- inactive, on 02/13/2009, -10/+3
The key to scaling up web applications? Move them to the Microsoft technology platform:
http://plentyoffish.wordpress.com/2006/06/10/micro ...
- bri719, on 02/13/2009, -14/+3
should figure out how to fix Digg algorithm first
- zemadeiran, on 02/13/2009, -12/+1
That is highly amusing......
Should have used JOOMLA!!!!!
- copperfossil, on 02/13/2009, -13/+2
Or you could just use Amazon S3 or Google App Engine and not worry about most of that stuff.


