44 Comments
- aroedl, on 10/12/2007, -1/+19 Yeah, but isn't it extremely funny that the article is about a loadbalanced HA Apache cluster?
- avaksi, on 10/12/2007, -0/+8 That's why god made http://www.duggmirror.com
- grizwald, on 10/12/2007, -4/+11 digg is starting to become kinda annoying .. 75% of all of the front page stories get hammered so hard, you can't even read them. i think this is a real issue for digg. Could prevent this from becoming mainstream. The revision3 team needs to figure out a way to combat this .. i know they don't have control over the servers that these stories are hosted on ... but it is a real problem.
maybe some kinda cache system or something. one thing is for sure, digg will never grow much more if the only thing the stories are linking to are dead webservers.
- sumrandommember, on 10/12/2007, -0/+5 Or how 95% of digg stories today are ripped from Google's sidebar?
- expunged, on 10/12/2007, -0/+3 I think you are talking about failover as opposed to high availability. One way to keep the servers in sync is to use drbd and heartbeat. drbd is the package relevant to your question - think of it as a network RAID that's used to keep two filesystems on separate machines in sync.
There is an open project that deals with this sort of thing: http://www.linux-ha.org/
It's pretty much a collection of freely available tools and instructions on how to get them to play together.
I hope this helps!
- theprodigy, on 10/12/2007, -0/+3 Why isn't duggmirror integrated into Digg? Seems like a good solution.
- tweakt, on 10/12/2007, -0/+3 Cool, how much does that job pay? ;-)
- ElJefeGrande, on 10/12/2007, -0/+2 This article appears not to have been duggmirror(ed).
- b7j0c, on 10/12/2007, -0/+2 i strongly recommend a hardware load balancer for anyone who feels they need this functionality. i have been using these devices for years now and they make development very easy. rotating servers in and out of the live farm is trivial even for novice developers. these devices truly deliver.
- m242, on 10/12/2007, -0/+2 I agree about the hardware load balancer. A BigIP from F5 labs is as bulletproof as you'll get, and makes all of this specialized software configuration moot.
- Pile, on 10/12/2007, -0/+2 Who needs load balancing? A simple second entry in a DNS round robin will split traffic across multiple servers and it takes about 11 seconds to implement. This seems like a case of technology overkill.
- aroedl, on 10/12/2007, -0/+1 What is a hardware load balancer? Don't you think that, under the hood, those LB appliances are software based?
Even Google - and they know what they are doing - is using its own software based load balancing system: "Google uses three software systems built in-house to route queries, balance server loads and make programming easier."
http://www.internetnews.com/xSP/article.php/3487041
- tadorna, on 10/12/2007, -0/+1 Maybe you should check your internet connection, the page loads fast for me.
- ElJefeGrande, on 10/12/2007, -1/+2 Question about this article: they mention that sessions will be handled (by using a smart load balancer instead of round-robin dns). But what about in the event that one of the apache nodes dies?
- ElJefeGrande, on 10/12/2007, -1/+2 Try Coral Cache if all else fails.
- ElJefeGrande, on 10/12/2007, -0/+1 I realize that as I did read the article pretty carefully. My question is regarding sessions. If a session is started on one server (say, the user has logged in to some web-based application) then there exist session variables which can only be used between that user and his respective server. But if one of the apache nodes goes down, what happens to the sessions? I realize that if there are no sessions then the user will not notice any difference in usability.
- ElJefeGrande, on 10/12/2007, -0/+1 What hardware load balancers do you recommend?
- sixspeed, on 10/12/2007, -1/+2 It's loading ok for me.
- b7j0c, on 10/12/2007, -0/+1 aroedl - i believe the discussion wrt google is about query balancing for search. the websites i have been using hardware load balancing on have been media/content sites under a domain that gets the most traffic on the web (gee, guess), so i know they work under extremely high loads. as to the software inside these boxes....it's simple, you never deal with it.
- Egoist, on 10/12/2007, -0/+1 All depends. Are the sessions stored in files or in the database? Is the database a separate box that both servers access?
Let's say that there's a single database server sitting behind the two web servers (bad design, but this is just an example) and all sessions are stored in the database. If web node 1 goes down while a user is using it, they will transfer seamlessly to node 2, assuming there are no SSL certificate issues to deal with. If sessions are stored as files, they'll have to restart the session when they transfer.
- inactive, on 10/12/2007, -0/+1 Hardware load balancers are much easier to set up and maintain. Unless you are a pro at networking, setting up a software one is way too complex and time consuming, and that outweighs the benefits. Trust me, there is a lot of stuff such as SSL certs, Layer 7 balancing, ease of rotation, dynamic failover, routing algorithms, graphing and statistics that hardware load balancers provide in a very easy to use interface.
I have tried Coyote Point's Equalizer, which is a really great package for medium sized businesses.
- aroedl, on 10/12/2007, -1/+2 @ElJefeGrande (WHY THE HELL are the comments not nested?)
The session is basically lost. You'd have to transfer the session information somehow. BUT:
Sometimes it is possible to retrieve the session data from the backend database. The client holds a cookie, so he can be identified and matched to a db entry. There are already solutions for Ruby on Rails and I think Java. The great disadvantage: you have to write back session data to the database more often.
- aroedl, on 10/12/2007, -0/+1 You mix up load balancing and HA. Does your clever DNS round robin know when one of the web servers goes down? No. Are you awake when one goes down and able to change the DNS setting? No.
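The point about round-robin DNS not noticing a dead server can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any real DNS server or load balancer is implemented: the node names `web1` and `web2`, the `down` set, and both function names are made up for illustration, and the health check is a plain callback rather than a real probe.

```python
from itertools import cycle
from typing import Callable, Iterable, List


def blind_round_robin(servers: Iterable[str], requests: int) -> List[str]:
    """Rotate through every address, healthy or not, the way a plain
    round-robin DNS record set would."""
    rotation = cycle(list(servers))
    return [next(rotation) for _ in range(requests)]


def health_checked_round_robin(
    servers: Iterable[str], requests: int, is_up: Callable[[str], bool]
) -> List[str]:
    """Rotate through the servers, skipping any that fail the health check."""
    pool = list(servers)
    picks: List[str] = []
    index = 0
    for _ in range(requests):
        # Try each server at most once per request before giving up.
        for _ in range(len(pool)):
            candidate = pool[index % len(pool)]
            index += 1
            if is_up(candidate):
                picks.append(candidate)
                break
        else:
            raise RuntimeError("no healthy servers left")
    return picks


servers = ["web1", "web2"]  # hypothetical node names
down = {"web2"}             # pretend web2 has just died

# The blind rotation keeps handing out the dead node.
blind = blind_round_robin(servers, 4)

# The health-checked rotation sends everything to the surviving node.
checked = health_checked_round_robin(servers, 4, lambda s: s not in down)

print(blind)
print(checked)
```

In the blind case every second request still goes to the dead node until someone edits the DNS records by hand; the health-checked version routes around it without anyone being awake.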
- adamsitting, on 10/12/2007, -0/+1 So anyone have hardware loadbalancer suggestions? I only saw one.
- delton, on 10/12/2007, -0/+1 I don't need a server, but I was wondering if there are any other applications for this kind of cluster. Like, can you run any Linux program, or just Apache, or what??? Would it be possible to run multithreaded windows applications via virtualware, and then distribute the process threads across the cluster with this kind of scheme? Does that make any sense at all?? I've been interested in building that sort of supercomputer for quite some time, since I have a lot of p2s and p3s laying around that could be used.
- rasterbator, on 10/12/2007, -0/+1 If you're here, grizwald, digg is mainstream.
- Democritus2, on 10/12/2007, -0/+1 http://wiki.linuxquestions.org/wiki/LVS_with_HA_for_Win2k_Terminal_Servers
Shows you how to use this to balance win2k term servers. Really you can use it for about anything. Several years back I built a system for a University to balance their incredibly inefficient and buggy MSsql/Citrix/Term server application.
- vigil, on 10/12/2007, -0/+1 You can load balance all sorts of network protocols. SMTP, HTTP, etc.
I work on load balanced Postfix and Domino boxes in my environment. The balancer pitches SMTP traffic around, though in this case it's round robin style. Our web team obviously deals with more of the 80 and 8080/443 traffic.
But, in answer to your question, the info page on the load balancing software they are talking about in this article lists a few protocols it can balance for, not just apache.
- rasterbator, on 10/12/2007, -0/+1 I have no problem with high availability, but the hard part is balancing your load. ;-)
- inactive, on 10/12/2007, -0/+1 Perhaps the people that run Digg could be smart and rewrite all non-cache URLs from a submitted story by using a rotating scheme of Coral Cache and other such caching systems.
- grizwald, on 10/12/2007, -1/+2 "Yeah, but isn't it extremely funny that the article is about a loadbalanced HA Apache cluster?"
yeah .. lol
that is why i finally posted about it .. i thought it was funny that the server was down
- geronimo, on 10/12/2007, -0/+1 I have used Linux Virtual Server for years, I love it. It is to me the most underrated open source project around, or at least it used to be until not so long ago. Instead of ultramonkey I use keepalived (http://www.keepalived.org/), which provides a robust C solution vs a perl solution. Keepalived allows for failover/backup, handles syncing TCP states, does the ARP stealing for you. I'm sure ultramonkey does the same thing.
- inactive, on 10/12/2007, -0/+1 And to answer my own question, here is a Firefox extension for doing this very thing...
https://addons.mozilla.org/firefox/2570/
For Slashdot, there is the Slashdotter extension for Firefox, that automatically inserts the Coral Cache link after each story link.
- geronimo, on 10/12/2007, -0/+0 Then what happens if one of your machines goes down? Removing that IP from DNS rotation isn't so easy: it takes a while, sometimes weeks, for the change to fully propagate, and not everyone obeys the same DNS timeout rules. Whoops, better put another machine in place of it or make sure your other machine takes over the IP, doubling its load. Or what if you decide you no longer want to use that IP for load balancing? You now run into the problem of your DNS changes taking a long time to propagate.
Why do that when you can use a load balancer (with a backup) which you can instantaneously update to modify which servers get traffic, how much traffic, etc etc. With DNS load balancing you cannot have one machine reliably get more traffic than another; with a load balancer you can use one of many load balancing algorithms, some based on CPU for example, to distribute the load.
It may be overkill for most sites which aren't very concerned about uptime/traffic, for bigger ones it can be a lifesaver.
- expunged, on 10/12/2007, -0/+0 I think the solution in this article applies to web applications and services. Most high availability servers are built around providing services.
I think these links are more relevant to your question:
http://www-unix.mcs.anl.gov/mpi/
http://www.lam-mpi.org/about/overview/
- fragPacket, on 10/12/2007, -0/+0 I've used F5 Big-IPs and 3DNS load balancers for the last 6 years and they've continued to make a good product better -- The ones I've worked with ran a derivative of BSDi and as noted in other comments are really a software-based solution, even though they run on dedicated hardware. The problem with them is that they're EXPENSIVE and you're locked into a single vendor's release / upgrade schedule.
I would think that with a web-based interface (for those who like that sort of thing) and some hooks into a metrics-based monitoring solution (MRTG or similar) to ensure your load balancer is doing what it's paid to, you could have a nice little HA package that is really cost effective for a start-up or small business.
- aroedl, on 10/12/2007, -2/+2 Just read the article *carefully*:
"In addition to that, if one of the Apache nodes goes down, the load balancer realizes that and directs all incoming requests to the remaining node which would not be possible with round robin DNS."
- inactive, on 10/12/2007, -1/+1 We need more of these why well ( The digg effect!)
- XStatic, on 10/12/2007, -1/+1 A digg cache would be welcomed!
- dc2447, on 10/12/2007, -0/+0 The problem with network style loadbalancers is that in my experience you have increased functionality - gslb etc - but reduced diagnostics compared to Linux based appliances. When you have a problem on a linux based appliance you can use whatever tools your OS provides to debug the problem, whereas on switches you can be restricted to ping, traceroute and a few others.
It is in everyone's best interest for more development work on Linux based loadbalancing.
- mrhaines, on 10/12/2007, -1/+1 This story reminds me why I didn't go into Computer Programming. I'll play World of Warcraft, you handle the load balanced high availability apache clusters!
- idreamincode, on 10/12/2007, -0/+0 I agree with the DNS entry idea. It's not a bad idea, and for VERY simple 11 second load distributing, it's GREAT! If one of the servers goes down, then just get it back up! I have texts sent to me when a server goes down. Very simple.
- shawnh, on 10/12/2007, -0/+0 Several solutions exist for managing session state across an http server farm. Common topic on PHP forums.
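One such approach, the shared session store discussed earlier in the thread (sessions kept in a database that both web nodes can reach, rather than in per-node files), can be sketched roughly as follows. This is a toy model only: the `WebNode` class, the session id `abc123`, the user `alice`, and the plain dict standing in for the database are all assumptions made for illustration, not anything taken from the article.

```python
from typing import Dict, Optional


class WebNode:
    """Toy web node. Sessions live either in a private dict (like files on
    the node's own disk) or in a store shared with the other nodes."""

    def __init__(self, name: str, shared_store: Optional[Dict[str, dict]] = None):
        self.name = name
        # With no shared store given, the node keeps sessions to itself.
        self.sessions = shared_store if shared_store is not None else {}

    def login(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = {"user": user}

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.sessions.get(session_id)
        return session["user"] if session else None


# Case 1: per-node sessions. The user logs in on node1, node1 dies,
# and node2 has never heard of the session.
node1, node2 = WebNode("node1"), WebNode("node2")
node1.login("abc123", "alice")
lost = node2.whoami("abc123")

# Case 2: both nodes use one shared store (a dict standing in for the
# backend database). node2 can pick the session up after node1 dies.
db: Dict[str, dict] = {}
node1, node2 = WebNode("node1", db), WebNode("node2", db)
node1.login("abc123", "alice")
kept = node2.whoami("abc123")

print(lost, kept)
```

The trade-off raised in the thread still applies: every session write now goes to the shared store, so that store sees more traffic and becomes something that needs its own redundancy.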
- nOOBert, on 10/12/2007, -2/+0 Useful.. Now to get a site big enough to have to implement it. :)
Btw the site looks pretty useful too.
Good post because of the fact that good howtos are hard to find.

