Spider bugfix

There were two issues with version 0.4.0 of Spider, both caught by Henri Cook. These are now fixed in 0.4.1. First, as documented, you use IncludedInMemcached like this: require 'spider/included_in_memcached'. Second, sometimes HTTP redirects assume a base URL; this is now handled.
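
Roughly, the wiring looks something like this; treat it as a sketch, since the check_already_seen_with hook shown here is assumed and the memcached address is a placeholder:

    require 'spider'
    require 'spider/included_in_memcached'  # the require path, as documented

    Spider.start_at('http://example.com/') do |s|
      # Keep the set of already-visited URLs in memcached rather than in the
      # Ruby process; point the address at your own memcached server.
      s.check_already_seen_with IncludedInMemcached.new('localhost:11211')
    end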

Spider with memcached

The problem with Spider has been that it can use all your memory. The reason is that the Web is a graph, and to avoid cycles Spider stores each URL it encounters. Since the Web is a really, really, really gigantic graph, you eventually run out of memory. Now you can use memcached to use […]
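
To see why that set grows, here is a toy version of the cycle check every crawler carries around; it is purely illustrative, with a fake two-page Web baked in:

    require 'set'

    # Toy illustration of the cycle problem: a crawler must remember every URL
    # it has visited, and that "seen" set grows without bound on a real crawl.
    links = {
      'http://a.example/' => ['http://b.example/'],
      'http://b.example/' => ['http://a.example/']  # cycles back to the first page
    }

    seen  = Set.new
    queue = ['http://a.example/']
    until queue.empty?
      url = queue.shift
      next if seen.include?(url)  # without this check the crawl loops forever
      seen << url                 # on the real Web this set eats all your memory
      queue.concat(links.fetch(url, []))
    end

    puts seen.to_a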

Proxied Spider

Aha: if you need to proxy your Spider calls, look no further than the HTTP Configuration gem. I didn’t write this, and have yet to use it, but I think it goes like this: So next up will be a tutorial with stuff like this and other cool stuff, plus a way to use memcached […]
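
I have not verified that gem's interface, so for illustration only, here is the standard-library way to push Net::HTTP through a proxy; this is plain Ruby stdlib, not the HTTP Configuration gem's API, and the proxy host and port are placeholders:

    require 'net/http'

    # Build an HTTP class whose connections are routed through a proxy.
    proxied = Net::HTTP::Proxy('proxy.example.com', 8080)

    proxied.start('example.com', 80) do |http|
      response = http.get('/')
      puts response.code
    end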

Spider: API changes, setup and teardown, HTTP headers

The newest version of Spider, 0.3.0, is hitting your gem tree Real Soon Now. This release features: setting the headers on an HTTP request, which can be used to set cookies, the user agent, and many other fine things; and setup and teardown handlers. Seems like a good place to set the headers if the headers […]
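
Roughly, using those together might look like this; treat the headers hash and the setup/teardown block signatures as a sketch of the idea rather than the exact interface:

    require 'spider'

    Spider.start_at('http://example.com/') do |s|
      # Headers sent with each request: user agent, cookies, and so on.
      s.headers['User-Agent'] = 'my-crawler/0.1'

      # setup runs before a URL is fetched; a reasonable place to tweak headers.
      s.setup do |a_url|
        s.headers['Cookie'] = 'session=placeholder'
      end

      # teardown runs after a URL has been handled.
      s.teardown do |a_url|
        puts "done with #{a_url}"
      end
    end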

Spider bug fix release

John Nagro immediately reported errors with the Spider Ruby gem, so I’ve fixed them in 0.2.1. You should upgrade, especially if you want support for URLs without any path component (e.g. http://example.com?s=1), HTTP redirects, and HTTPS. John also had some good ideas, so here is what is in the works: The ability to construct a complete […]

An updated way to spider the Web with Ruby

I’ve released version 0.2.0 of Spider. Everything has changed: use RSpec to ensure that it mostly works; use WEBrick to create a small test server for additional testing; completely re-do the API to prepare for future expansion; add the ability to apply each URL to a series of custom allowed?-like matchers; BSD license. The new […]
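
For a taste of the matcher idea, here is a sketch of keeping the crawl on one site; the add_url_check name and block form are a guess at the shape of the hook, not a quote from the docs:

    require 'spider'

    Spider.start_at('http://example.com/') do |s|
      # Each candidate URL is passed to this check; return a truthy value
      # to let the crawl follow it, a falsy value to skip it.
      s.add_url_check do |a_url|
        a_url =~ %r{\Ahttp://example\.com}
      end
    end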

Spider the Web with Ruby

I wrote a Ruby library for crawling the Web. Use it to take down The Man, like so: I used it to get people’s addresses from around the Web. I plan to put them on a map. I like putting things on maps. It once took obscene amounts of memory, until I discovered that Ruby […]
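
The code sample did not survive this excerpt, but a bare-bones crawl might look something like this; the start_at block and the on(:every) callback shown here are assumptions about the interface, not the original example:

    require 'spider'

    Spider.start_at('http://example.com/') do |s|
      # Called for every response; a real crawler would do something useful
      # with resp.body here instead of printing.
      s.on(:every) do |a_url, resp, prior_url|
        puts "#{resp.code} #{a_url}"
      end
    end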
