33 Comments
- geminitojanus, on 10/12/2007, -0/+1 "How is not using database functions a good thing? Your PHP code vs. a database function... guess which one will return your search result faster."
Simple: database code is database-gnostic; that is to say, if I write code for MySQL, there is absolutely no guarantee it'll work in Oracle, DB2, Firebird, or any other database, or even with a different kind of table within a single database environment (most databases let you choose among several table types).
In case you have no choice in the matter, it's better to make your code as self-sufficient as possible, as long as it isn't braindamagingly complex and as long as a library doesn't already exist to do the same thing.
In fact, in the article he comments (paraphrasing) that "since the right tools are unavailable (i.e. Lucene's native PHP port), this is the best alternative".
I fully endorse what he's done, and I've implemented a simpler system in my own software in the past. Oh, and as for RDBMSs not being designed for speed... yeah, you might want to rethink that one. Relational databases are designed first and foremost to be as fast as humanly possible, because they are so often the bottleneck of any system. That's why so much work has gone into indexing and relational algebra. Full-text indexes, on the other hand, are slow mainly because of how much string manipulation has to be done, and database vendors try to stay away from them as much as possible and will recommend that you do it in your application's layer simply because it executes faster there.
And clevershark: you're right, in MySQL fulltext indices are great, but only if you're using a more modern MySQL version, and only if you're using MyISAM tables (I don't believe InnoDB supports them yet, and with the recent acquisition of its lead developer, it may never). MyISAM tables are *notoriously* slow once they grow past a certain size. Something that would normally take a tenth of a second may take over two seconds with MyISAM tables once you've got half a million entries (which is entirely possible when you're talking about indexes and search engines).
So, in conclusion, this is a great tutorial (if a bit bloated, but IMO the whole symfony framework is a bit fat). Good for your blog, or even a Digg clone.
- silentcollision, on 11/27/2007, -0/+1 Nice =/
- francois, on 10/12/2007, -0/+1 The technique explained in this tutorial is completely portable, and was developed with that constraint. From that point of view, it is better not to rely on specific MySQL or Oracle commands.
Of course, if you choose to adapt it and get rid of the portability, you can still add database-system-dependent optimizations.
- jgchristopher, on 10/12/2007, -0/+0 hometoast said: "That's what PEAR is for. You CAN use databases and have portability. Change "mysql://user:pass@server/database" to "oracle://..." or "MSSQL://"
Yeah, I'd be willing to bet this is too slow for mass consumption..."
Abstracting your database connection into a configuration change isn't going to fix your portability problem if you are using proprietary functions of a given database. That is the kind of portability Francois was referring to.
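To make that distinction concrete, here is a minimal Python sketch of the "use the native index if the database has one, otherwise fall back to a portable one" approach. It is not from the tutorial, which is PHP/symfony. It assumes SQLite through Python's standard `sqlite3` module and probes at runtime for the FTS5 full-text extension. If FTS5 is missing, it keeps a simple in-memory inverted index instead. `SearchIndex`, `supports_fts5`, and `tokenize` are hypothetical names. Tokenization is deliberately naive: ASCII only, lowercase, no stemming. With another database you would probe for that database's own feature instead.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer (assumption): lowercase ASCII letters/digits, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def supports_fts5(conn):
    """Return True if this SQLite build can create FTS5 full-text tables."""
    try:
        conn.execute("CREATE VIRTUAL TABLE fts_probe USING fts5(body)")
        conn.execute("DROP TABLE fts_probe")
        return True
    except sqlite3.OperationalError:  # e.g. "no such module: fts5"
        return False


class SearchIndex:
    """Native full-text index when available, portable fallback otherwise."""

    def __init__(self, conn, force_app_index=False):
        self.conn = conn
        self.native = not force_app_index and supports_fts5(conn)
        if self.native:
            conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
        else:
            self.postings = defaultdict(set)  # word -> ids of docs containing it

    def add(self, doc_id, text):
        if self.native:
            self.conn.execute(
                "INSERT INTO docs(rowid, body) VALUES (?, ?)", (doc_id, text)
            )
        else:
            for word in tokenize(text):
                self.postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query word, in ascending order."""
        words = tokenize(query)
        if not words:
            return []
        if self.native:
            # Quote each term so FTS5 reads it literally; separate terms are ANDed.
            match = " ".join(f'"{w}"' for w in words)
            rows = self.conn.execute(
                "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (match,)
            )
            return [row[0] for row in rows]
        # Fallback: intersect the posting sets of all query words.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

Calling code is identical whichever backend was picked. That is the point of the approach: the database-specific part lives in one class instead of leaking into every query.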
Also, while this may not be a Google killer, I bet it is more than capable of doing its intended job in high-traffic situations.
- monalami, on 11/17/2008, -0/+0 All the best search engines piled into one, including Google, Yahoo, sports search engines, science and medical search engines, encyclopedia search engines, government and legal search engines, education search engines, news search engines, meta search engines.....
http://www.allthebestsearchengines.blogspot.com
- jesusphreak, on 10/12/2007, -0/+0 >>> I don't get this fascination with Rails. Seems to me that it's the Basic of scripting languages. How many people here laugh at people who use Visual Basic, but talk about Rails like it's the scripting language of the gods? >>>
1) Rails is a framework, not a scripting language
2) There are many, many Java/.NET professionals switching over to Rails. I assure you they wouldn't be doing it if it were just another VB.
Ruby is an extremely powerful scripting language. Rails is a Ruby framework that helps you build good web apps fast.
Symfony is definitely pretty. The website is very well done. I've never used the framework itself, though, so I can't comment on it.
- inactive, on 10/12/2007, -1/+1 How is not using database functions a good thing? Your PHP code vs. a database function... guess which one will return your search result faster.
- hometoast, on 10/12/2007, -0/+0 francois said: "The technique explained in this tutorial is completely portable, and was developed with that constraint. From that point of view, it is better not to rely on specific MySQL or Oracle commands.
Of course, if you choose to adapt it and get rid of the portability, you can still add database-system-dependent optimizations."
That's what PEAR is for. You CAN use databases and have portability. Change "mysql://user:pass@server/database" to "oracle://..." or "MSSQL://"
Yeah, I'd be willing to bet this is too slow for mass consumption...
- przemekg, on 02/01/2008, -0/+0 How would database full-text search work for HTML, BBCode, or PHP-serialized data? I think full-text search is useful for database administrators, but using it for a website search engine is suicide!
It is possible to apply the described method in symfony in 15 minutes, or in whatever framework you are using (if it is well designed and has some sort of DAO for its model). The only thing that would take more time to implement is a nice HTML frontend :)
There is one problem, however: you need to find a stemming algorithm for your language, or use a less perfect version of the search engine.
The only thing I don't like is the way additional weights are assigned based on text location. Using str_repeat to repeat a 40 KB text isn't the best idea.
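Here is a minimal sketch of one alternative. It assumes the goal of the str_repeat trick is simply to count the words in heavier fields (title, tags) several times. Instead of building the repeated string, it multiplies each field's word counts by a weight. The field names and weights in `FIELD_WEIGHTS` are hypothetical, and the example is Python rather than the tutorial's PHP.

```python
import re
from collections import Counter

# Hypothetical per-field weights; the tutorial's own fields and values may differ.
FIELD_WEIGHTS = {"title": 3, "tags": 2, "body": 1}


def tokenize(text):
    # Naive tokenizer (assumption): lowercase ASCII letters/digits, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_term_counts(fields, weights=FIELD_WEIGHTS):
    """Map each word to sum(field weight * occurrences in that field).

    Gives the same counts as joining every field repeated `weight` times
    (with spaces in between) and counting words, but never builds that string.
    Fields missing from `weights` get weight 1.
    """
    scores = Counter()
    for name, text in fields.items():
        weight = weights.get(name, 1)
        for word, count in Counter(tokenize(text)).items():
            scores[word] += weight * count
    return scores
```

Each field is tokenized once, rather than tokenizing `weight` copies of it. This matters most for a large body field.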
There is a lot of space for improvement:
- You could parse the HTML content to find more relevant information and boost its weight (like tags).
- If tags change often, you could cache the result of parsing the content and only re-apply the parsed tags to it.
- For very busy sites, you could think of caching search results.
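The last point could look something like this minimal Python sketch: a small LRU cache keyed on the normalized query and cleared whenever the index is updated. `CachedSearch` and its parameters are hypothetical. Normalizing by sorting the words assumes every query term is ANDed, so word order doesn't change the result.

```python
from collections import OrderedDict


class CachedSearch:
    """Wrap a search function with a small LRU cache of results.

    Call invalidate() whenever the index changes so stale results are dropped.
    """

    def __init__(self, search_fn, max_entries=1000):
        self._search = search_fn
        self._max = max_entries
        self._cache = OrderedDict()

    @staticmethod
    def _key(query):
        # Assumption: word order doesn't matter (all terms are ANDed), so
        # "php search" and "Search PHP" share one cache entry.
        return " ".join(sorted(query.lower().split()))

    def search(self, query):
        key = self._key(query)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        result = tuple(self._search(query))  # immutable copy, safe to share
        self._cache[key] = result
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return result

    def invalidate(self):
        self._cache.clear()
```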
This is the best tutorial on site search engines I have ever read: a small, easy, working example.
- byob, on 10/12/2007, -1/+1 Nice, a bit bloated for my use but well put together.
- Echo5ive, on 10/12/2007, -0/+0 Haha, just last night I was thinking about how to make a simple search engine, though I would write it in Ruby. Dugg!
- chetan1, on 12/09/2008, -0/+0 Good search engine.
- inactive, on 10/12/2007, -0/+0 "timmarhy, a well-designed full-text index will beat out an RDBMS query any day. If you don't believe me then you need to learn more about the inner workings of both."
*****. Maybe in some very, very specific situation it would. Do you think Google is using full text? Ha ha to you, my friend.
Writing all your SQL in the application is a ***** idea. It's not more bloody portable either, given, for example, that some DBs treat NULL and '' differently.
- snyy, on 10/12/2007, -0/+0 A Google Mini right next to my server has me covered.
- jgullickson, on 10/12/2007, -0/+0 timmarhy, a well-designed full-text index will beat out an RDBMS query any day. If you don't believe me then you need to learn more about the inner workings of both.
An RDBMS isn't designed for speed; in fact, speed is probably third or fourth on the list...
- CaughtThinking, on 10/12/2007, -0/+0 That article, and this article's title, are so misleading. There is way more to being a "search engine" than just creating a rudimentary index. No Digg.
- yayson, on 10/12/2007, -0/+0 I was interested to see that this is part of an advent calendar thing like the offering at http://www.24ways.org/
Regardless of whether you'd choose to use the symfony framework for a project or not, you gotta give these people kudos for the framework and this advent tutorial. Well done and dugg on both counts.
- clevershark, on 10/12/2007, -0/+0 You have to admit that if you're already using MySQL anyway, it's far easier to build fulltext indexes (indices?) on the relevant columns and use those. The MATCH ... AGAINST query will also assign a relevancy score to the results, which you can use for sorting, etc.
- inactive, on 10/12/2007, -1/+0 It's broken now :(
- headzoo, on 10/12/2007, -1/+0 @paulypopex - I suppose. I guess it depends on how much you know.
- inactive, on 10/12/2007, -1/+0 One hour my ass... surely there has to be an easier way of doing this?
- inactive, on 10/12/2007, -1/+0 "In case you have no choice in the matter, it's better to make your code as self-sufficient as possible, as long as it isn't braindamagingly complex and as long as a library doesn't already exist to do the same thing."
What a load of nonsense. So many times I have seen crap implemented in PHP that could have been done with a built-in function of the database, which would have run 2x faster and been 10x simpler to read. You think your PHP hacked out in 15 minutes is going to be easier to read than a one-liner calling a pre-made DB function?
And as I said above, database portability is a MYTH.
- geminitojanus, on 10/12/2007, -1/+0 "What a load of nonsense. So many times I have seen crap implemented in PHP that could have been done with a built-in function of the database, which would have run 2x faster and been 10x simpler to read. You think your PHP hacked out in 15 minutes is going to be easier to read than a one-liner calling a pre-made DB function?"
Emphasis on the hacks you've seen; perhaps you aren't seeing the right "hacks". On top of this, you once again try to force your beliefs that database portability shouldn't be maintained onto people who work for a living doing just that: making sure code is agnostic and run-anywhere. So basically, you're trying to tell Toyota how to build cars.
Here's a suggestion: Don't tell Toyota how to make cars. They've been doing it for years, they know what they're doing, and they know how to make it work for them. Just because you believe a car should be built differently doesn't mean that everyone else believes it, and everyone I know with a Toyota loves their car dearly. And it's hard to find a more efficient car (in America especially).
So while you might have seen something that could have run 2x faster and been 10x simpler, when I take it to Postgres and errors start overflowing, the egg's on your face, especially if you were trying to sell the product to someone.
Me? I use database abstraction for what I can (transactions, foreign keys), and write functions for what I can't (certain indexes, the one used in this article being a prime example [and hey, who said you have to do it this way? Why not write a statement that checks the database driver to see whether that kind of index can be built, and if so use it, and if not, use the application layer's index? It's a simple solution.]). But if you want to stay in the dark and be a lesser programmer simply because you can't work your way around a problem, that's fine with me too. I just hope never to run into your code anywhere, ever.
- Clickerness, on 10/12/2007, -3/+1 Trains definitely do run on rails. Coincidentally they run on Asian women too.
Was that wrong?
- inactive, on 10/12/2007, -2/+0 Cool
- jnorris441, on 10/12/2007, -4/+2 Who needs a search engine? ***** my visitors. They're scumbags.
- headzoo, on 10/12/2007, -3/+1 I don't get this fascination with Rails. Seems to me that it's the Basic of scripting languages. How many people here laugh at people who use Visual Basic, but talk about Rails like it's the scripting language of the gods?
- BluParadox, on 10/12/2007, -2/+0 No way in hell is this a 15-minute project; even if all you are doing is cutting and pasting code, it will take longer than that. That said, he seems to have mostly just taken other people's code and applied it to pretty much the accepted search engine design (with a pretty poor sorting algorithm). This is the same basic design that things like phpBB use.
- jboi, on 10/12/2007, -4/+1 boazg
and rails are... ?
Google says trains run on it... help me, I'm lost!
- paulypopex, on 10/12/2007, -3/+0 I thought PHP was the new BASIC, with lots of non-programmers using it to do really simple things anyway. Maybe it's the new COBOL?
- petert101a, on 10/12/2007, -5/+1 @boazg get a life
- syah, on 10/12/2007, -5/+0 No comment, ah...
- boazg, on 10/12/2007, -7/+0 step 1: LEARN RAILS
step 2: if you now know rails, this is trivial. else return to step 1

