symfony-project.com - No more database-dependent SQL queries, no more external library integration. At last, PHP developers need not dream of Lucene anymore. They can develop their own search engine, with results ordered by relevancy and smart indexing. Take a look at this fifteen-minute tutorial from the symfony project.
Dec 21, 2005
echo5ive Dec 22, 2005
Haha, just last night I was thinking about how to make a simple search engine, though I would write it in Ruby. Dugg!
geminitojanus Dec 22, 2005
"how it not using a database functions a good thing? your php code vs a database function ... guess which one will return your search result faster."

Simple: database code is database-specific. That is to say, if I write code for MySQL, there is absolutely no guarantee it'll work in Oracle, DB2, Firebird, or any other database, or even with a different kind of table within a single database environment (most databases let you choose what kind of table to use). In case you have no choice in the matter, it's better to make your code as self-sufficient as possible, as long as it isn't braindamagingly complex and as long as a library doesn't already exist to do the same thing.

In fact, in the article he comments that (paraphrasing) "Since the right tools are unavailable (i.e. Lucene's native PHP port), this is the best alternative". I fully endorse what he's done, and I've implemented a simpler system in my own software in the past.

Oh, and as for RDBMSs not being designed for speed... yeah, you might want to rethink that one. Relational databases are designed first and foremost to be as fast as humanly possible, because they are so often the bottleneck of any system. That's why so much work has gone into indexing and relational algebra. On the other hand, full-text indexes are slow mainly because of how much string manipulation has to be done, and database providers try to stay away from them as much as possible and will recommend that you do it in your application layer simply because it executes faster there.

And clevershark, you're right: in MySQL, fulltext indexes are great, but only if you're using a more modern MySQL version, and only if you're using MyISAM tables (I don't believe InnoDB supports them yet, and with the head developer's recent acquisition, it may never). MyISAM tables are *notoriously* slow after they've gained a certain amount of heft. Something that would normally take a tenth of a second may take over two seconds with MyISAM tables once you've got half a million entries (which is entirely possible when you're talking about indexes and search engines).

So in conclusion, this is a great tutorial (if a bit bloated, but IMO the whole symfony framework is a bit fat). Good for your blog, or even a Digg clone.
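
For anyone wondering what an index kept in the application layer might look like, here is a minimal sketch. It is written in Python for brevity and is not the tutorial's PHP code. The function names and sample documents are made up for illustration, and the scoring (a plain sum of word counts) is an assumption, not the weighting the tutorial uses.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: word -> {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index, query):
    """Return doc ids ordered by summed word counts, highest first."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Ties are broken by doc id so the ordering is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


# Hypothetical sample data, for illustration only.
docs = {
    1: "PHP search engine tutorial",
    2: "A search engine built on a search index",
    3: "Cooking with tomatoes",
}
index = build_index(docs)
print(search(index, "search engine"))
```

Nothing here touches a database feature, which is the portability point: the same word table could be stored in any RDBMS and queried with plain SQL.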
bluparadox Dec 22, 2005
No way in hell is this a 15-minute project. Even if all you are doing is cutting and pasting code, it will take longer than that. That said, he seems to have mostly just taken other people's code and applied it to pretty much the accepted search engine design (with a pretty poor sorting algorithm). This is the same basic design that things like phpBB use.
Closed Account Dec 22, 2005
It's broken now :(
Closed Account Dec 23, 2005
"timmarhy, a well-designed full-text-index will beat out a RDBMS query anyday. If you don't believe me then you need to learn more about the inner workings of both."

Bulls**t. Maybe in some very, very specific situation it would. Do you think Google is using full text? HA HA to you, my friend.

Writing all your SQL in the application is a s**t idea. It's not more bloody portable either, given, for example, that some DBs treat NULL and '' differently.
geminitojanus Dec 23, 2005
"what a load of nonsense, so many times i have seen crap implemented in php which could have been done with a built in function of the database, which would have run 2x faster and been 10x simpler to read. what you think your php hacked out in 15 minutes if going to be easier to read then a one liner calling a pre made db function?"

Emphasis on the hacks you've seen; perhaps you aren't seeing the right "hacks". On top of this, you once again try to force your belief that database portability shouldn't be maintained onto people who work for a living doing just that: making sure code is agnostic and runs anywhere. So basically, you're trying to tell Toyota how to build cars.

Here's a suggestion: don't tell Toyota how to make cars. They've been doing it for years, they know what they're doing, and they know how to make it work for them. Just because you believe a car should be built differently doesn't mean that everyone else believes it, and everyone I know with a Toyota enjoys their car dearly. And it's hard to find a more efficient car (in America especially).

So while you might have seen something that could have run 2x faster and been 10x simpler, when I take it to Postgres and errors start overflowing, the egg's on your face, especially if you were trying to sell the product to someone. Me? I use database abstraction for what I can (transactions, foreign keys), and write functions for what I can't (certain indexes, with the one used in this article as a prime example). And hey, who said you have to do it this way? Why not write a statement that checks the database driver to see if that kind of index can be built, and if so use it, and if not, use the application-layer index? It's a simple solution.

But if you want to stay in the dark and be a lesser programmer simply because you can't work your way around a problem, that's fine with me too. I just hope never to run into your code anywhere, ever.
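
The "check the driver, otherwise fall back" idea could be sketched roughly as follows. This is Python for brevity, not code from the article. The capability table is an assumption built only from claims made in this thread (MySQL with MyISAM has a fulltext index, other combinations are treated as lacking one), and every function name is hypothetical.

```python
# Assumed capability table, based only on claims made in this thread.
# Each entry is a (driver, table engine) pair taken to offer a native
# full-text index; anything else falls back to the application layer.
NATIVE_FULLTEXT = {("mysql", "myisam")}


def pick_search_backend(driver, table_engine=""):
    """Return 'native' when the pair is listed, else 'application'."""
    key = (driver.lower(), table_engine.lower())
    return "native" if key in NATIVE_FULLTEXT else "application"


def run_search(query, driver, table_engine, native_search, app_search):
    """Dispatch the query to whichever search function applies."""
    if pick_search_backend(driver, table_engine) == "native":
        return native_search(query)
    return app_search(query)


# Stand-in search functions so the sketch runs without a database.
def fake_native(query):
    return ["native", query]


def fake_app(query):
    return ["application", query]


print(run_search("symfony", "mysql", "MyISAM", fake_native, fake_app))
print(run_search("symfony", "pgsql", "", fake_native, fake_app))
```

The calling code never needs to know which path was taken, so the database-specific part stays in one small table.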
silentcollision Nov 27, 2007
Nice =/
przemekg Feb 1, 2008
How would database fulltext search work for HTML, BBCode content or PHP serialized data? I think fulltext search is useful for a database administrator, but using it for a website search engine is suicide!

It is possible to apply the described method in 15 minutes in symfony or in whatever framework you are using (if it is well designed and has some sort of DAO for its model). The only thing that would need more time to implement is a nice HTML frontend :)

There is, however, one problem: you need to find a stemming algorithm for your language or use a less perfect version of the search engine.

The only thing I don't like is the way additional weights are assigned based on text location. Using str_repeat to repeat a 40 KB text isn't the best idea.

There is a lot of room for improvement:
- You could parse the HTML content to find more relevant information and boost its weight (like tags).
- If tags are changed often, you could cache the result of parsing the content and only apply the parsed tags to it.
- For very busy sites, you could think of caching search results.

This is the best tutorial on search engines for sites I have ever read: a small, easy and working example.
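
One way to weight by text location without repeating the text could look like the sketch below: count the words of each field once, then multiply the counts by that field's weight. It is Python for brevity, not the tutorial's code. The field contents and the weights 3 and 1 are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_counts(fields):
    """Sum word counts over (text, weight) pairs.

    Each text is scanned once and its counts are multiplied by the
    weight, which gives the same totals as repeating the text
    'weight' times before counting.
    """
    totals = Counter()
    for text, weight in fields:
        for word, count in Counter(tokenize(text)).items():
            totals[word] += count * weight
    return totals


# Hypothetical title and body with illustrative weights.
weights = weighted_counts([
    ("Search engine", 3),
    ("A small search engine tutorial. Search works.", 1),
])
print(weights["search"], weights["engine"], weights["tutorial"])
```

The cost is one pass per field regardless of the weight, so a large body with a high weight is no more expensive to index than one with weight 1.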
chetan1 Dec 9, 2008
good search engine