43 Comments
- Otto, on 10/12/2007, -0/+16 >>>"The problem with wordpress is that the connection is opened at the beginning of every page. Multiple times. (There's a separate connection for the body of the page, and the sidebar, and the individual includes in the sidebar, etc etc)."
No, that's not true at all, nor what he was talking about, nor what his patch does.
First off, Wordpress only uses one connection for everything. Really. Unless you have a plugin or some other code making its own connection, all Wordpress queries are handled through the same connection.
The way it works is that, at the beginning of the execution, an instance of the wpdb class is created. Every query Wordpress does is through this one instance. It has one connection and it maintains it for the life of the execution cycle (until you see the generated page, basically).
What he's talking about is "lazy loading". See, when the instance of this class is created, it connects to the database right then and there. If the rest of the code then goes on and never uses that instance, you wasted your time connecting, yeah? His solution is to wait to actually connect until you need that connection. Basically, his patch eliminates the connection from the class constructor and creates a separate connect() function. Then, the query function is modified to check that a connection exists, and if not, call the connect() function to build one.
The benefit here is that if your page never hits the database for anything, then it never connects at all. This is smarter than the current Wordpress code.
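The lazy-connection idea described above can be sketched roughly as follows. This is a minimal illustration in Python with the standard `sqlite3` module, not the actual patch or the real wpdb class; the `LazyDB` name and the `connect_count` counter are made up for the example.

```python
import sqlite3


class LazyDB:
    """Hypothetical wpdb-like wrapper that connects on the first query."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None          # nothing is opened in the constructor
        self.connect_count = 0    # only here to make the behaviour observable

    def connect(self):
        self.conn = sqlite3.connect(self.path)
        self.connect_count += 1

    def query(self, sql, params=()):
        if self.conn is None:     # the first real query triggers the connection
            self.connect()
        return self.conn.execute(sql, params).fetchall()


db = LazyDB()
assert db.connect_count == 0      # constructed, but not connected yet
rows = db.query("SELECT 1 + 1")   # connects here, then reuses the connection
```

A page that never calls `query()` never opens a connection at all, which is the whole benefit being discussed.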
However, it's also unnecessary, really. With Wordpress in particular, it would be extremely difficult to imagine a scenario where it doesn't actually hit the database. Everything comes from the database. Posts, sidebar content, anything dynamic, it all hits it. So this really isn't saving you anything for your average blog. Yes, he is correct that making the connection lazy makes more sense. However, it's a poor example, because Wordpress virtually *always* uses that database connection. Several times.
He also goes on about caching, and yes, caching is good. He doesn't talk about caching with Wordpress, but there are caching hooks in there and plugins which can use them (WP-Cache, for example). This sort of thing implements caching in a very smart manner... smarter than what he's talking about in his code snippets there, certainly. The upshot is that if you use something like WP-Cache, you get everything he's talking about and then some, making Wordpress extremely quick indeed.
Takes some setup, but what doesn't?
- Otto, on 10/12/2007, -0/+5 Oh, his patch also breaks some Wordpress plugins that use various mysql_* functions. Some of these require a database connection to be previously established, and one of the assumptions a plugin makes is that the wpdb object works and the connection is there. So in addition to not buying anything useful (for the specific case of Wordpress), it breaks existing functionality.
Don't get me wrong, the man does have a point. He simply picked a poor example.
- GoodOlClint, on 10/12/2007, -1/+5 Works here.
and that makes sense too. I'm gonna have to do that with my PHP scripts from now on.
- inactive, on 10/12/2007, -0/+3 You're saying that because you don't understand when func_get_args() can be a ridiculously useful tool to use :)
- solodesignz, on 10/12/2007, -1/+4 >>>If you're serving semi-static content that is cached indefinitely and that cache is invalidated only when the content is updated (like one of the examples), why bother storing them in a database at all? I think it's a neat idea.
So how exactly would you build a content management system, or any basic "dynamic" website? You could have a site without a database and use this cache system... but then what is the point of the cache system?... you could just use static pages.
Only way around that is to store the pages in XML or some similar file storage mechanism. For a dynamic site you have to have some type of database to store the data.
Honestly, you're not making much sense there. It looks like you simply looked up the definition of what a database is and pasted it into your first paragraph =)
- sonofagunn, on 10/12/2007, -0/+3 I think y'all are missing tamarind's point completely. For some content, it would be much more efficient for the "update" step to just write out a file and any reads just read that file and skip the database completely.
Now, there are plenty of other reasons to use a database, but in the narrow context of this example (writing out blog html), it would make sense.
- SuperSloth, on 10/12/2007, -4/+6 You clearly have no idea what a database is actually used for.
- iamcam, on 10/12/2007, -0/+2 Actually... when you make a MySQL connection in PHP, any subsequent connections made with the same parameters ride on top of the first connection (shared). In other words, you can create a bunch of database objects as needed, and they'll all share the same connection.
http://us2.php.net/manual/en/function.mysql-connect.php
from the page:
"new_link
If a second call is made to mysql_connect() with the same arguments, no new link will be established, but instead, the link identifier of the already opened link will be returned. The new_link parameter modifies this behavior and makes mysql_connect() always open a new link, even if mysql_connect() was called before with the same parameters."
- karmakillernz, on 10/12/2007, -2/+4 This advice is not related to what database you use. Replace mysql_connect with pg_connect, and the advice still applies.
- psyon, on 10/12/2007, -0/+2 @tamarind Using the database lets you regenerate the pages with new layouts and styles. If scripts just dumped the content to static files in, say, HTML 4.0, then if you wanted to transition to XHTML in the future, it would be a long, tedious task to go and convert all the files. It also makes searching the data a lot simpler.
- aftk2, on 10/12/2007, -0/+1 I think everyone is missing the point, or at the very least glossing over the very pertinent "invalidate the cache" section of this comment.
I'm a web developer. Our primary product, a PHP-based CMS, is similar to Wordpress, in that everything comes from a database, and uses a complex, flexible permissions system that can be applied at the page level, the area of the page level, and even the "block-of-content" level (for lack of better terminology.) This type of flexibility in presentation requires a database. However, it _doesn't_ require a database _all the time_. That's why you have a cache.
Think about it this way. I'm requesting /about/company/. Before I even hit the database to find the pseudo-page that fits this path, I check to see if
a) the user who is requesting this page is logged in
b) this page exists in the cache - the cache is basically HTML that's been written to the filesystem.
If the user isn't logged in, then I try to grab the page from the cache. If the page exists in the cache, the user gets that HTML, end of connection. No database, no nothing. If the user _is_ logged in, then I have to check permissions, etc... So I proceed with the full page generation from the database, checking against the user's groups, etc... If the user isn't logged in, but there's no cache entry (I'll get back to this later), then we re-gen the page for the visitor group, -and- save a version of it in the cache, for everyone later.
How does content management come into play? Well, when someone uses the CMS to modify this page, the cache entry is invalidated. Basically, it gets removed from the filesystem. Then, the next visitor who comes to the page won't find a cached entry, and the page will be regenerated for them.
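The request flow and invalidation step described above can be sketched like this. It is a rough illustration in Python, not the commenter's CMS: the cache location, the file-naming scheme, and `render_from_db` (a placeholder for the expensive, permission-aware page build) are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp())   # hypothetical cache location


def cache_file(path):
    # "/about/company/" -> "about_company.html" (naming scheme is an assumption)
    name = path.strip("/").replace("/", "_") or "index"
    return CACHE_DIR / (name + ".html")


def render_from_db(path, user):
    # Placeholder for the full page generation with permission checks.
    return f"<html>{path} for {user or 'visitor'}</html>"


def serve(path, user=None):
    if user is not None:                 # logged in: always do the full build
        return render_from_db(path, user)
    cached = cache_file(path)
    if cached.exists():                  # cache hit: no database work at all
        return cached.read_text()
    html = render_from_db(path, None)    # miss: build for the visitor group
    cached.write_text(html)              # and save it for everyone later
    return html


def invalidate(path):
    # Called when someone edits the page in the CMS.
    cache_file(path).unlink(missing_ok=True)
```

After `invalidate("/about/company/")`, the next anonymous request misses the cache and regenerates the page, which matches the behaviour described in the comment.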
Of course, this isn't perfect - the cache can really only run for users who aren't logged in, because then we know their permissions and cached view states correspond to the view states for the "visitor" group (rather than users and groups who are logged in.) But that'll get past the Slashdot/digg effect in most cases, and make the site far faster, even for users who are logged in.
- DangerousDave86, on 10/12/2007, -0/+1 In an Apache multithreaded environment it has to be set up so that the max connections of the mysql server equals the max servers that apache can spawn (threads) + some. I don't know if it's a bad thing or not. But people often run out of database handles using this approach because they don't understand that each request a user sends to apache is not always processed by the same thread, so pconnect gets called over and over. pconnect makes apache connect to mysql, not PHP to mysql, so when the script ends the connection is still there, and you don't need to reconnect again for the life of that thread to see this small benefit? I really couldn't say what he meant exactly.
The article states that mysql connection times are much quicker than other databases, so perhaps they mean it's just not worth it because you have to set up apache to not overload mysql.
- domr, on 10/12/2007, -0/+1 A useful and well written article, but it's worth noting that there's also an overhead in reading/writing cache files to/from the filesystem.
You could also consider storing cached content in the database. If the content is generated using complicated queries, caching to the database will still be much quicker than re-running all the queries used to generate the content.
- akkuma, on 10/12/2007, -1/+2 @tamarind
>If you're serving semi-static content that is cached indefinitely and that cache is
>invalidated only when the content is updated (like one of the examples), why bother storing
>them in a database at all? I think it's a neat idea.
You are able to cache results of your db in RAM. The guys who manage the db at my company were talking about this. They mentioned a possible advantage of using SELECT * over specific SELECTs if you are going to run the same select against a table numerous times and the data in it stays the same for quite some time. Instead of 'hitting' your db every time to process the query, it could grab the result out of memory since there were no new results.
Now I'm not a db guru by any means, I just know how to do my queries and let those guys handle upping performance.
- kayoone, on 10/12/2007, -0/+1 This is quite interesting. I wonder, however, how big the real benefit is if you have, let's say, 30,000 uniques per day. Does transferring the load from the DB to the filesystem really make it so much faster, or does it just create another bottleneck?
Lazy loading is a nice thing, however useless when using it without a caching mechanism because software like wordpress or any forum needs the DB on EVERY page.
I'd like to read more on this topic, and also on memcached and MySQL clustering / load balancing. Does anyone have some nice links for that?
memcached also sounds really good. So what it basically does is implement caching similar to this article, but with the ability to spread that cache across multiple machines, or what?
- zoom1928, on 10/12/2007, -0/+1 One trick to make storing cached pages faster is to create a set of 25 to 1,000 (depending on the ratio of writes to reads) tables named with a hash of the name of the content. MySQL's query cache is very fast, but it gets invalidated on every table write. When you have multiple tables, the chance that your query is no longer cached drops in proportion to the number of tables. On our system with 50 tables, our cache hit rate went up by almost a factor of 20 because the chance any write would invalidate the cached query went down by a factor of 50.
Obviously you want to use InnoDB when doing this so you don't have to open all of the extra individual files that MyISAM tables do.
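Picking the table from a hash of the content name, as described above, might look something like this. It is only a sketch in Python: the `page_cache_` prefix and the use of CRC32 are assumptions for the example, and the commenter's actual hashing scheme is not given.

```python
import zlib

NUM_TABLES = 50  # the table count quoted in the comment


def cache_table(content_name, num_tables=NUM_TABLES):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(content_name.encode("utf-8")) % num_tables
    return f"page_cache_{bucket:02d}"   # hypothetical table naming


# The same content name always maps to the same table, so a write only
# touches (and only invalidates cached queries on) one table out of fifty.
table_for_home = cache_table("/index")
```

If names spread evenly over the buckets, a write to one piece of content lands in the same table as a given cached query only about 1 time in `num_tables`.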
Since then we've moved our cached HTML pages into memcached. It's simpler and faster than using database server tricks to get more speed.
- inactive, on 10/12/2007, -0/+1 PHPAccelerator is something completely different - that's just something that caches php scripts so that php doesn't need to compile your .php each and every time you make a request.
- zoom1928, on 10/12/2007, -1/+2 memcached from http://www.danga.com/memcached/ is what you're looking for. It's amazing. I'm doing about a billion cache look-ups a day with it on some older hardware. I'm doing with one old server what previously took four nice big new servers.
- aftk2, on 10/12/2007, -0/+1 DangerousDave: What about websites with pages where certain content should be displayed for certain users, or users in certain groups, but other content needs to be displayed for other users/groups?
- prockcore, on 10/12/2007, -3/+4 Sharing a DB connection between PHP instances can be done with pconnect, but it's not a good idea.
The problem with wordpress is that the connection is opened at the beginning of every page. Multiple times. (There's a separate connection for the body of the page, and the sidebar, and the individual includes in the sidebar, etc etc).
He solved this problem by only opening a connection if a page is actually going to make a request.
The other problem (which he didn't address) is that wordpress holds open a connection for too long, and does too many queries. Loading the frontpage of a standard configuration wordpress blog issues 27 queries over 3 separate connections. Basically, wordpress is a mess. They went include crazy (just look at how many different includes there are for the header alone; he went 3 includes deep to find the mysql connection).
- m242, on 10/12/2007, -0/+1 Wow, lazy initialization for database queries. Maybe PHP is finally starting to catch up to Java (specifically, the Hibernate ORM).
- rolosworld, on 10/12/2007, -2/+2 OK, I think I see what he is explaining. He has a point on the wordpress constructor: they should create the resource as static and check if it has been defined already. He focuses on the reuse of the DB object, so that if you create a new DB object you still only use 1 connection. I was hoping for sharing a DB connection between sessions.
- inactive, on 10/12/2007, -0/+0 I am going to go out on a limb and bet that there are probably quite a few entry level programmers who saw this article and hopefully learned a thing or two... that's never a bad thing.
- DangerousDave86, on 10/12/2007, -0/+0 @kayoone, no website needs database access on every page. If you go back to how things were done in the old days before dynamic content, the database efficiency problem can be solved. You use PHP to generate the website, rather than generating each page from a database on request. You keep the database, of course, as a source to rebuild the pages from scratch when a change is made.
Any blog could be built easily on this principle, and no user interface would need to be changed at all. When a user performs an action, the PHP does its thing, like data manipulation, then also, generates HTML right there and then, rather than at request time, and puts it on the website. The whole site doesn't need rebuilding every time someone makes a comment, just the associated pages where such data is used.
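The generate-on-change approach described above can be sketched as follows. This is an illustrative Python toy, not anyone's real blog engine: the in-memory `posts` dict stands in for the database, and the file names and helper functions are invented for the example.

```python
import tempfile
from pathlib import Path

SITE_DIR = Path(tempfile.mkdtemp())      # hypothetical web root
posts = {}                               # stand-in for the database


def render(post):
    comments = "".join(f"<p>{c}</p>" for c in post["comments"])
    return f"<h1>{post['title']}</h1>{comments}"


def rebuild(post_id):
    # Rewrite only the page that depends on the changed data.
    (SITE_DIR / f"post-{post_id}.html").write_text(render(posts[post_id]))


def add_post(post_id, title):
    posts[post_id] = {"title": title, "comments": []}
    rebuild(post_id)


def add_comment(post_id, text):
    posts[post_id]["comments"].append(text)   # the data change...
    rebuild(post_id)                           # ...and the HTML, at write time
```

Requests would then be served straight from the files in `SITE_DIR`; a new comment on one post rewrites that post's page and leaves the others untouched.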
This sort of approach would work just fine on small applications. On larger applications I can see file count and disk space coming into question, because instead of just one template plus lots of database data, the site will contain all the HTML for all the pages it has.
This would be easy to manage, because the user never interacts with the cached files, just the PHP app and the DB, but it may be a strain on disk seek times and other caching mechanisms.
- FeherTigris, on 10/12/2007, -0/+0 Please, somebody explain why pconnect is a bad idea, as prockcore wrote.
- kayoone, on 10/12/2007, -0/+0 I don't think it's a good idea to generate static pages. That might work for online shops or blogs where the data doesn't change too often, but for, let's say, a forum it would be a bad idea: you might end up overwriting other people's data and whatnot. Generating statics is OK when it's done on the admin side, but not if there is user interaction going on.
I read a lot on that topic today and found out that, regarding caching, memcached is the way to go. Add to that some nice MySQL and HTTP clustering and you have a super efficient and fast system.
Also, memcached stores in RAM, not on the FS, which is a lot faster and better.
- rolosworld, on 10/12/2007, -3/+2 Can someone explain how he makes the single DB connection?
I don't see any difference between what he does and what wordpress does (unless he defines the resource variable as static), and he explains this as if he could share connections between sessions. How would this work? (Using persistent connections?) I don't see how he shares the connection between sessions; it would be nice if he explained in more detail.
- BluKnight, on 10/12/2007, -3/+2 That's pretty impressive. I've never thought of that way to connect to the database.
Definite DIGG!
- DangerousDave86, on 10/12/2007, -1/+0 I didn't really read the article, but I think most PHP software is backwards in this department. I mean, if you want truly efficient database usage then you want your website to generate pages whenever changes are made, e.g. the site appears to be static HTML, but really this HTML is being built whenever a change occurs. Currently, pages are built when a request is made, not a change.
Also I might add I've never used any caching software. I'm sure PHPAccelerator (or other software) automatically does this sort of thing without changing the architecture of the software in use.
- DangerousDave86, on 10/12/2007, -1/+0 @kayoone a cluster is totally inefficient, it's just quick. A well-designed static cache would never overwrite 'other people's data'; a badly designed one probably wouldn't even do that.
With the forum 'thing', I did mention a static cache's drawbacks in reply to one of your own posts a little further up the comment section.
"This sort of approach would work just fine on small applications. On larger applications I can see file count and disk space coming into question, because instead of just one template plus lots of database data, the site will contain all the HTML for all the pages it has."
@bitrich, thanks for that info, I've never used the software before but I've seen production sites that have it installed.
- aj3289, on 10/12/2007, -2/+1 He is very creative in keeping loads down. That's important when the application needs to scale up to a larger level.
Very good read for web programmers.
Digg+++
- drbaggy, on 10/12/2007, -1/+0 Interesting - but he's failed on one minor problem - that his caching algorithm uses the file system - a bigger problem than the one he is solving - the only way that this will work efficiently is if the blocks he is including are very complex (either in the SQL queries used OR in the processing that is performed on the output).
It is admittedly easy to produce and reuse the files - but there are problems with large directories if the site is getting into the sort of traffic that requires this sort of caching...
On a test version of our work server we were using file caching (and having to store temporary images) - we managed to break the file system by creating millions of files overnight - the systems team worked out that it would be quicker to reformat the system disk and re-install the operating system than to delete all the files with rm (calculated time was somewhere near 32 days to remove files created in 12 hours!)
- gharding, on 10/12/2007, -3/+1 Why? So you can have an even slower response time? Oh snap!
- timdorr, on 10/12/2007, -4/+2 Anyone who uses func_get_args() like that needs to look at how they code. That's some seriously ugly code.
But the general concept is completely right and should be used by most apps.
- estei, on 10/12/2007, -2/+0 So now that we've got the caching idea into our heads, what would be the most effective medium for storing your cache?
Is it the filesystem? Or could we maybe use something like SQLite for it?
- spyres, on 10/12/2007, -4/+2 I believe Drupal already uses the techniques discussed.
- suomi, on 10/12/2007, -5/+2 Pretty entry level material - 90% 'common sense' stuff. It would be good to see more programming articles on Digg - but I fear that they would never go anywhere, and never educate anyone - simply because the majority would just ignore them... Shame.
- dprevite, on 10/12/2007, -6/+2 That's fun, he (the author) was just showing me this the other day. It's interesting stuff.
- sonofagunn, on 10/12/2007, -5/+0 Oops, hit wrong reply link...
- inactive, on 10/12/2007, -8/+2 err, pretty basic stuff.
and please do yourself a favour and use postgresql instead.
- tamarind, on 10/12/2007, -8/+2 And you clearly have no idea who I am or what I do, right?
Databases are used to hold data in a structured manner, often indexed on important aspects of the data, in a relational or hierarchical form so that it can be quickly inserted, retrieved, queried, and modified. That good enough? I use MySQL in my own projects, though I have no philosophical objection to other database engines.
If you're serving semi-static content that is cached indefinitely and that cache is invalidated only when the content is updated (like one of the examples), why bother storing them in a database at all? I think it's a neat idea.
Sheesh, what pedants!
- Lifelogger, on 10/12/2007, -13/+4 How ironic... a broken website... "Fatal error: session_start(): Failed to initialize storage module: user (path: /tmp) in /home/sites/www.jpipes.com/web/serendipity_config.inc.php on line 6"
- tamarind, on 10/12/2007, -13/+2 That caching system is pretty nice. You know, you might not need a database at all (beyond your filesystem, that is) if you use it.

