44 Comments
- neutrino15, on 10/12/2007, -1/+28DIGG, ARE YOU LISTENING!?!?!
(You need a better search feature.. Yours is slow and stupid)
I even find that google.com (using site:) is better. - gol706, on 10/12/2007, -1/+15You beat me to that comment. The digg search is awful. It takes forever, and half the time it just returns a blank page instead of your search results. But the worst part is they took away the search-your-own-diggs feature, which I loved. Once you've dugg enough there's no way you can find old diggs in your profile just by paging through them.
- snlildude87, on 10/12/2007, -0/+12Mirrors? If you're gonna spam, do it right. Ass.
- sambo357, on 10/12/2007, -1/+11"Since one of the rules of the intranet was that all logic code should be written in-house, using an existing open source engine was not an option."
I don't understand why search is logic. What about just using fulltext search or Lucene? Isn't PHP also logic if you define it this loosely? - gharding, on 10/12/2007, -0/+7*hugs his Google appliance*
As much as I like writing my own backend code, writing the XSLT to parse a GSA's search results is sooo much easier and gets much better results. - dwight0, on 10/12/2007, -0/+7he has been doing this all day. digg needs to ban his ip
- duey, on 10/12/2007, -1/+7Lucene ( http://lucene.apache.org/ ) kicks the pants off this search engine.
- cookiebearo, on 10/12/2007, -1/+7 2 cents from someone who didn't bother to read your comment,
read the article
these kinds of comments are useless, if your comment isn't about the article, you might as well have posted this on a different story - rhinez0r, on 10/12/2007, -4/+9Isn't MS Live Search and "infinitely more powerful" an oxymoron? How did your head not explode fathoming this logic?
- val8ntin, on 10/12/2007, -0/+5Reading this, I couldn't help but be amazed at how Google manages to store and index over 8 Billion pages.
- sambo357, on 10/12/2007, -0/+5 $result = mysql_query(" SELECT p.page_url AS url,
COUNT(*) AS occurrences
FROM page p, word w, occurrence o
WHERE p.page_id = o.page_id AND
w.word_id = o.word_id AND
w.word_word = '$keyword'
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results" );
This will be slow because ORDER BY is using occurrences, which is the result of COUNT(*), so there is no index by which to sort. MySQL must use the filesort algorithm instead of an index sort. I don't think hundreds of megs would scale well. It wouldn't be long before you had 50-second queries. Still looks like a fun experiment though. - chadu, on 10/12/2007, -1/+5Actually nothing about the rule makes sense - The manager might as well say... "Let's have the most expensive reinvent-the-wheel intranet around!"
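For anyone who wants to try the query sambo357 quotes above, here is a minimal sketch. It assumes an in-memory SQLite database standing in for the article's MySQL tables; the table and column names come from the quoted query, while the sample rows and the `search` helper are made up for illustration. The ranking is still a sort on a computed COUNT(*), so the database has to sort the grouped rows itself rather than read them off an index.

```python
import sqlite3

# Hypothetical in-memory stand-in for the article's MySQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_url TEXT);
    CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT UNIQUE);
    CREATE TABLE occurrence (word_id INTEGER, page_id INTEGER);
    CREATE INDEX occurrence_word ON occurrence (word_id);
""")
conn.executemany("INSERT INTO page VALUES (?, ?)",
                 [(1, "/a.html"), (2, "/b.html")])
conn.executemany("INSERT INTO word VALUES (?, ?)",
                 [(1, "php"), (2, "mysql")])
# One row per occurrence: "php" appears once on /a.html, three times on /b.html.
conn.executemany("INSERT INTO occurrence VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 2), (1, 2), (2, 1)])


def search(keyword, limit=10):
    """Rank pages by how often the keyword occurs on them."""
    return conn.execute(
        """
        SELECT p.page_url AS url, COUNT(*) AS occurrences
        FROM page p
        JOIN occurrence o ON p.page_id = o.page_id
        JOIN word w ON w.word_id = o.word_id
        WHERE w.word_word = ?
        GROUP BY p.page_id
        ORDER BY occurrences DESC
        LIMIT ?
        """,
        (keyword, limit),
    ).fetchall()


print(search("php"))
```

The keyword is passed as a bound parameter rather than pasted into the SQL text, which also sidesteps the quoting problems discussed further down the thread.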
- blackjack75, on 10/12/2007, -0/+3No, actually digg doesn't need to since users digg it down for them.
- dakellog, on 10/12/2007, -1/+4"Lucene kicks the pants off this search engine."
Not so fast. I smile when I see startups that use Lucene. It limits their potential. Every once in a while a group sets out to make Lucene scale. I never hear back from them. I assume their projects did not succeed.
For this article, I'm impressed that the guy did not go the MySQL FULLTEXT route, since fulltext is not scalable either. In the end you have some round-robin design limited by hard drives and RAID.
Scalability for some intranet site is not that important, but his little word database can be scaled out almost infinitely, which is not true for Lucene or FULLTEXT. I like the design. - dakellog, on 10/12/2007, -0/+3"Wikipedia uses it....
http://en.wikipedia.org/wiki/Lucene"
And it's no good.
From wikipedia for a VERY SIMPLE SEARCH:
"No page with that title exists."
Wow, if that's Lucene, that is certainly no full text search as wikipedia itself claims. Maybe Lucene is not so hot after all.
Enter any WORDS on a page into the search box and you will get a title search. Seems like a 1980s library title search to me.
My comments stand as correct. The guy who wrote the article produced a more scalable start than Lucene currently uses. He also wrote a better version of FULLTEXT search than MySQL currently uses. - LordVoldemort, on 10/12/2007, -1/+4The Wise Sage, stroking his beard: "Every once in a while a group sets out to make Lucene scale. I never hear back from them. I assume their projects did not succeed."
Wikipedia uses it....
http://en.wikipedia.org/wiki/Lucene - fatdog789, on 10/12/2007, -0/+2That's not necessary if you've escaped the data string. It helps, but it's like using a laser field to protect your house when you could just lock the door.
- muffinmanpoo, on 10/12/2007, -0/+2@podgey
The point of the article is to create your own search engine, not mooch off somebody else's. - bdmbdm, on 10/12/2007, -0/+2What would you suggest to overcome that problem?
- zoom1928, on 10/12/2007, -0/+2Nice troll there. PHP has never supported more than a single query in a mysql_query() call so that has never worked. It's interesting to watch the irrational PHP-haters post garbage like that. They've repeated that lie for years. The sad part is that some people have believed them so they now use Microsoft-garbage rather than PHP as the PHP-haters wanted. There are way too many pro-Microsoft people disrupting this site.
And this:
> $sql = $mysqli->prepare()
is just nutty. You lose the query cache if you do that. For a non-trivial site, increasing your query load 5 to 10 times will cause a problem. - ipearx, on 10/12/2007, -0/+2This article is pretty old (2002) and was written before MySQL full text search came into existence:
http://dev.mysql.com/doc/en/Fulltext_Search.html
Still might be handy if you need a custom built search system, for instance for technical terms. - BillyG123, on 10/12/2007, -0/+2After hovering and seeing I was headed back to 2002, I decided to graze the comments here first... good thing I didn't bother.
- joestump, on 10/12/2007, -1/+3I'm not sure how many documents he was indexing, but the guy could have saved himself a ton of time by using MySQL's built in functionality.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
http://www.onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html
http://www.joestump.net/content/files/MySQLFULLTEXTSearching.pdf
http://www.joestump.net/content/source/mysql-uc-2005-fulltext/ - flyjedi, on 06/11/2008, -0/+1using google to search sites is always better :)
- mrbandersnatch, on 10/12/2007, -1/+2That page is from TWO THOUSAND AND FRIKKIN **TWO**.
I did something similar in Python recently. The two things that hit me immediately about the code in the article are the use of regexps for parsing out HTML, which is prone to break and leaves you with lots of garbage text to index (however, the alternatives are not easy to implement); and the lack of word stemming. There are implementations of Porter easily available, so it's almost inexcusable not to use one now. In fact I'm going to be writing a simple search engine in .Net this week, so I'll be finding out how true that is myself *grin* - cope, on 10/12/2007, -1/+2the php code was below par for any average programmer..
guys if you're going to use this, make sure you check every string being put into the data, and don't just "addslashes", it's old and not a very nice little function to use... there's better stuff out there. - Automatthias, on 10/12/2007, -0/+1The SQL code examples show a very bad practice leading to an SQL injection vulnerability. This is wrong:
$sql = "SELECT foo FROM bar WHERE id = '$keyword';";
You just ask this search for this term:
'; DELETE FROM occurrences; SELECT foo FROM bar where id = '1
...and the occurrences table is empty in a finger snap. SQL statement handling should look like this:
$sql = $mysqli->prepare("SELECT foo FROM bar WHERE id = ?");
$sql->bind_param("s", $keyword);
$sql->execute(); - addicted68098, on 10/12/2007, -0/+1Awhile ago I created a system that could pick out keywords from text, so for example this article would come up with things like myisam database PHP etc. It was really effective, but it took 5 seconds to analize a paragraph, when I was half way done.
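The injection argument between Automatthias and zoom1928 can be checked with a small sketch. It assumes SQLite in place of MySQL; the `bar` table and its `foo` and `id` columns come from Automatthias's example, and the two lookup helpers and the payload are made up for illustration. Even where a driver refuses to run a second statement, as zoom1928 says of mysql_query(), an unescaped keyword can still rewrite the WHERE clause of the first one.

```python
import sqlite3

# Hypothetical table matching the "SELECT foo FROM bar WHERE id = ..." example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id TEXT, foo TEXT)")
conn.executemany("INSERT INTO bar VALUES (?, ?)",
                 [("1", "first"), ("2", "second")])


def lookup_unsafe(keyword):
    # Vulnerable: the keyword is pasted straight into the SQL text.
    sql = "SELECT foo FROM bar WHERE id = '%s'" % keyword
    return conn.execute(sql).fetchall()


def lookup_safe(keyword):
    # The ? placeholder sends the keyword as data, never as SQL.
    return conn.execute(
        "SELECT foo FROM bar WHERE id = ?", (keyword,)
    ).fetchall()


payload = "' OR '1'='1"
print(len(lookup_unsafe(payload)))  # every row matches the rewritten WHERE
print(lookup_safe(payload))         # no row has that literal id
```

Escaping the input closes the same hole; a bound parameter simply makes it impossible to forget the escaping step.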
- flyjedi, on 06/11/2008, -0/+1no it doesn't. however someone has written a lucene engine that searches wikipedia:
http://schmidt.devlib.org/software/lucene-wikipedi ... - zoom1928, on 10/12/2007, -0/+1> requires one to use MyISAM, which is a big minus
Not really. Typically you'll store your production data in InnoDB then write a sanitized version of the original to search from in a separate MyISAM table. If your users are using HTML then you'll save space and make the engine much better if you strip HTML. For example, if you have a source document that contains "test string" with HTML tags inside it and the user searches for "test string", you will definitely want to strip the tags to make the engine do what the user expects. We also store sanitized, text versions of many documents including PDF, ODP, scanned documents that are OCR'd, and to a lesser extent Microsoft Word. We haven't found any software yet that does a good job with the Microsoft Word garbage. Of course, even Word will only open about 1/2 of the Word documents we have so I don't understand why users expect us to be able to do better with Word than Microsoft does. - zoom1928, on 10/12/2007, -1/+1How would you do either fulltext search or lucene when they didn't exist at that time? Are you claiming the guy that wrote it should have had a time machine so he could travel five years into the future to see which solutions were best in the future? What a wacky post. It is interesting to see the Digg-haters that gave you points for that garbage.
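Stripping tags before indexing, as zoom1928 describes above, does not have to rely on regular expressions. The following is a rough sketch using the HTML parser from Python's standard library; the `TextExtractor` class and `strip_html` function are hypothetical names, and joining text nodes with a space is a simplification that would split a word broken up by inline tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML fragment, whitespace-collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text nodes, then collapse runs of whitespace to single spaces.
    return " ".join(" ".join(parser.parts).split())


print(strip_html("<p><b>test</b> string</p><script>var x = 1;</script>"))
```

The sanitized text is what would go into the separate search table, so a search for "test string" matches even when the source document wrapped one of the words in markup.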
- shivssb, on 10/12/2007, -0/+0hi , any one can tell me seriously no joking, is there other method of sorting which is faster than qsort in terms of managing data type, size and speed.
if so why it is not so spoken about.
i have a method which handle any size of data of any type ie string, byte, int float etc, and speed is almost 100 times of qsort.
memory usage is also very less.
pls comment on this - vinophp, on 01/06/2009, -0/+0Hai
i need to create YAML format using PHP .
Anybody knowing please help me !!! - tagawa, on 10/12/2007, -2/+2You've got to give him points for honesty, though.
- vinophp, on 01/06/2009, -0/+0Hai
i want to use lucene search engine using php in my project....
how to install lucene and how to use my database ..... - zacware, on 10/12/2007, -1/+0Despite its date, this article is still a great find. First of all, for a company intranet, at least from my experience, MySQL's fulltext option, swish-e and lucene are not practical in many ways. People want to be able to search notes placed in database notes fields, and MySQL's fulltext search requires one to use MyISAM, which is a big minus compared to InnoDB for scalability and reliability (no transactions, frequent optimizations needed, no row-level locking). And as far as lucene and swish-e are concerned, they are great for many applications, but to my knowledge (and I could be wrong here), you can't easily do live updates to the index. If I edit an existing note in my MySQL table, how do I update the index so I can find the new text in that note? Most of what I read (and this is just what I've read, not what I know) is that it's basically faster to rebuild the entire index than to try to append to it. If one of my users enters a note at 11AM, another user might want to search for a keyword in that note 3 hours later. Using the method in the article it might be possible. I don't see how that would work with something like swish-e. For a small company intranet, the solution outlined in the article sounds like the best one available as of now
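The live update zacware asks about can be sketched against the article's word and occurrence tables. This is only an illustration under stated assumptions: SQLite stands in for MySQL, notes are indexed under the `page_id` column, and the `reindex_note` and `notes_containing` helpers and the sample notes are hypothetical. The idea is to delete the note's old index rows and insert the new ones in one transaction each time the note is saved.

```python
import re
import sqlite3

# Hypothetical in-memory stand-in for the article's index tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_word TEXT UNIQUE);
    CREATE TABLE occurrence (word_id INTEGER, page_id INTEGER);
""")


def reindex_note(note_id, text):
    """Replace every index row for one note inside a single transaction."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM occurrence WHERE page_id = ?", (note_id,))
        for w in words:
            conn.execute(
                "INSERT OR IGNORE INTO word (word_word) VALUES (?)", (w,)
            )
            word_id = conn.execute(
                "SELECT word_id FROM word WHERE word_word = ?", (w,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO occurrence VALUES (?, ?)", (word_id, note_id)
            )


def notes_containing(keyword):
    """Return the ids of notes whose current text contains the keyword."""
    rows = conn.execute(
        """
        SELECT DISTINCT o.page_id
        FROM occurrence o
        JOIN word w ON w.word_id = o.word_id
        WHERE w.word_word = ?
        ORDER BY o.page_id
        """,
        (keyword,),
    ).fetchall()
    return [row[0] for row in rows]


reindex_note(7, "Server moved to rack 3")
reindex_note(7, "Server moved to rack 12")  # the note is edited later
print(notes_containing("12"))
```

Only the rows for the edited note are touched, so the rest of the index stays searchable while the update runs.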
- unsolicited, on 10/12/2007, -3/+1Check http://www.aspseek.com/ and its source http://www.aspseek.org/
- podgey22, on 10/12/2007, -4/+2Because the choice is Yahoo, MS Live or your own ***** algorithm. Google retracted their API.
It's amazing how people see something in the name that is Pro-MS and bury it. Grow up. - eluusive, on 10/12/2007, -4/+0Poor person has never heard of swish-e I bet. =/ It's most excellent. I set up a search engine for my work in a few hours, it's extremely fast too.
http://swish-e.org/ - nevas, on 10/12/2007, -7/+2@stupidppl
it should have been http://spamrules24.blogspot.com/ - omghi2u2, on 10/12/2007, -10/+3Digg must be using php and the example from this story then.
- podgey22, on 10/12/2007, -18/+2Here's a better (and simpler) tutorial that teaches you to make an AJAX search engine that uses MS Live Search as the backend -- making it infinitely more powerful:
http://www.thepcspy.com/articles/programming/ajax_search_with_ms_live_and_mootools

