Wednesday, November 15, 2006

Build a Web spider on Linux


M. Tim Jones has written a useful article on building a Web spider, which can be found on IBM’s website. Spiders are programs that crawl the Web and gather information. Search engines use spiders to build their search databases.
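To make the idea concrete, here is a minimal sketch of a spider in Python. It is not the code from the article; the names `LinkParser`, `extract_links`, `fetch`, and `crawl` are made up for illustration, and it assumes pages are UTF-8 HTML and ignores robots.txt and rate limiting, which a real spider should respect. It fetches a page, pulls out the links, and visits them breadth-first up to a page limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs for the links found in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch(url):
    """Download a page as text (hypothetical helper, assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url; return visited URLs in order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be fetched
        visited.append(url)
        for link in extract_links(url, html):
            # Follow only web links that have not been queued before.
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("https://example.com/", max_pages=5)` would return the list of pages visited. The `fetch_page` parameter is there so the crawl logic can be tried against canned pages without touching the network.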

The article also lists some additional resources that are worth reading. Of particular note is an article that suggests several ways to protect yourself from the spiders that harvest e-mail addresses for spammers.
