26 Comments
- erkokite, on 10/12/2007, -0/+23 Linux ® ?
- inactive, on 10/12/2007, -2/+14 Yeah, but that last person didn't have enough people in their "clique" to make it to the front page. It's much easier for people like bonlebon to just submit an article that someone else has already submitted and have all their friends digg it to the front page, instead of just digging the other person's article and having their friends digg that one too.
- jiminoc, on 10/12/2007, -3/+13 considering a good number of users are programmers I'd say yes, now STFU
- Arevos, on 10/12/2007, -0/+7 The author of the article uses Ruby and Python to create his web spiders. Perhaps he should have chosen languages he was more familiar with, because his code is absolutely awful.
For instance, in order to check whether the HTTP headers hash has a particular value, he _iterates_ through every item. It's a hash, dummy! You don't need to iterate over a hash to find a value - that's the whole bloody point!
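A minimal sketch of the point, in Python rather than the article's Ruby. The `headers` dict, its keys, and both function names are made up for illustration and are not taken from the article's code:

```python
# Hypothetical response headers; names and values are illustrative only.
headers = {"content-type": "text/html", "server": "Apache"}


def server_by_scan(headers):
    """The slow pattern: walk every item looking for one key."""
    for name, value in headers.items():
        if name == "server":
            return value
    return None


def server_by_lookup(headers):
    """The direct pattern: ask the dict for the key.

    dict.get returns None when the key is absent, so there is no need
    to iterate just because the key might be missing.
    """
    return headers.get("server")
```

Both functions return the same result; the second one simply lets the hash do the job it was built for.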
Another good example of how not to do things is in the Python example. Instead of using urllib.urlopen to open a page, he builds his own HTTP GET function. And instead of using urlparse, he uses a whole bunch of regular expressions. And he uses nested ifs with the same else clause repeated three times - use boolean logic, man! Boolean logic!
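Roughly what that could look like, as a sketch only. It assumes Python 3, where `urllib.urlopen` lives at `urllib.request.urlopen` and `urlparse` at `urllib.parse.urlparse`. The `is_crawlable` and `fetch` helpers and the three conditions they check are hypothetical, not the article's actual checks:

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def is_crawlable(url, allowed_host):
    """One boolean expression instead of nested ifs with a repeated else.

    The three conditions are placeholders for whatever a real spider
    would want to test.
    """
    parts = urlparse(url)
    return (
        parts.scheme in ("http", "https")
        and parts.hostname == allowed_host
        and not parts.path.endswith((".jpg", ".png", ".gif"))
    )


def fetch(url):
    """Fetch a page with the standard library instead of a hand-rolled GET."""
    with urlopen(url, timeout=10) as response:
        return response.read()
```

`urlparse` already splits out the scheme, host and path, so no regular expressions are needed, and the single `return` replaces the three identical else branches.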
Newbies wishing to learn how to make their own spider program should take this article with a large pinch of salt. It covers the basics, but with the shoddiest programming I've ever seen in an IBM developerWorks article.
- TheSiz, on 10/12/2007, -1/+7 This is a very good tutorial actually. Would be great to tie it into a "how to build a search engine" tutorial.
Either way, it's better than the tons of "how to web 2.0 your life" tutorials. :-/
- BlackAdderIII, on 10/12/2007, -1/+6 Didn't you know?
Linux is a registered trademark of Linus Torvalds.
No, seriously. :-)
- unreal32, on 10/12/2007, -2/+6 LOL at the "Linux ®"
Welcome to the new world of open source - now with trademarks for your comfort and protection.
- BlackAdderIII, on 10/12/2007, -0/+3 Hmm? Linux is a Registered Trademark, belonging to Torvalds.
- Derrekito, on 10/12/2007, -1/+4 yeah... wtf?
- spooq, on 10/12/2007, -0/+1 He probably had the problem of not being sure if the key existed or not, and didn't know about the Hash.has_key? method.
- breakaway, on 10/12/2007, -0/+1 404
- placidified, on 10/12/2007, -0/+0 Ahhh that's what seemed wrong to me also with the Ruby code.
LOL at using a Hash as an Array.
Ahhh nested ifs in Python...ugliness central
- gateway, on 10/12/2007, -1/+1 hmm, I have been looking for a good web spider to crawl the net. I want to build my own mini search engine for a project I'm looking at.. anyone know of any available packages?
- inactive, on 10/12/2007, -1/+1 Anyone know of something for downloading multiple RTSP streams? I'm looking for something like wget, where you define the page and the script or program starts to rip podcasts from the site automatically with an external application, e.g. mplayer. Don't have time to write this now :// So, was anyone tough enough to write something like this?
- daftman, on 10/12/2007, -0/+0 yea if you pay me for it.
pirating p0rn costs money you know
- toddcw, on 10/12/2007, -2/+1 screen-scraper (http://www.screen-scraper.com/) runs fabulously on Linux, and integrates well with most modern programming languages. It can save all kinds of time over writing Perl and Python scripts. There's a free (as in beer) version available, and a pro version if more features are wanted.
- oxyrubber, on 10/12/2007, -5/+3 WGET and CURL are web spiders. This article goes a little further and shows newbies how to parse the pages and extract important info from them.
Are there any Linux newbies that can understand this article but don't already know how to do this?
- jenny867, on 10/12/2007, -2/+0 I would love to see this extended to include practical uses such as monitoring a web site for tracking data and other important information.
- inactive, on 10/12/2007, -5/+3 Yes, yes it is.
In our next lesson we'll look at how to make a spider post data to a form on a web page, how to defeat CAPTCHA and then we'll discuss mass-marketing your cool new site to 1000000s of blogs in minutes!
- inactive, on 10/12/2007, -3/+0 yep, you're right, it is a duplicate and you are an idiot.
- jbardt, on 10/12/2007, -5/+1 It is much easier to build web spiders or scrapers using wget.
Even the big news sites use this method.
J. Bardt
jbardt@insidertraders.info
http://bardt-links.com
- TheSiz, on 10/12/2007, -5/+1 edit: meant to reply to masaoster
- juicygossip, on 10/12/2007, -6/+1 I can't stand spiders. But when it comes to finding stuff on the web it helps, but they can also be dangerous. Watch out for spiders.
- inactive, on 10/12/2007, -7/+2 good tutorial for beginner spammers and bot writers, digg ;)
- Alexius, on 10/12/2007, -13/+5 Duplicate http://digg.com/linux_unix/Build_a_Web_Spider_on_Linux
- masaoster, on 10/12/2007, -19/+0 is digg a tutorials site?