November 10, 2007 – 1:35 am
There were two issues with version 0.4.0 of Spider, both caught by Henri Cook. These are now fixed in 0.4.1:
As documented, you use IncludedInMemcached like this: require 'spider/included_in_memcached'.
Sometimes HTTP redirects assume a base URL; this is now handled.
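To make the redirect issue concrete: a server may answer with a Location header such as /archive, which only makes sense relative to the URL that was requested. Here is a rough sketch of resolving that with Ruby's standard URI library. The helper name resolve_redirect is made up for illustration and is not Spider's actual code.

```ruby
require 'uri'

# Hypothetical helper (not part of Spider): resolve a redirect's
# Location value against the URL that produced the redirect.
def resolve_redirect(request_url, location)
  URI.join(request_url, location).to_s
end

# A root-relative Location is resolved against the request's host.
puts resolve_redirect('http://example.com/blog/post', '/archive')

# A path-relative Location is resolved against the request's directory.
puts resolve_redirect('http://example.com/blog/post', 'other')

# An absolute Location passes through unchanged.
puts resolve_redirect('http://example.com/blog/post', 'http://other.example/x')
```

URI.join already handles all three cases, so the helper is only there to give the idea a name.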
November 2, 2007 – 6:14 pm
The problem with Spider has been that it can use all your memory. The reason is that the Web is a graph, and to avoid cycles Spider stores each URL it encounters. Since the Web is a really, really, really gigantic graph, you eventually run out of memory.
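If you want to see why this happens, here is a toy version of the idea. It is not Spider's source: the LINKS hash stands in for the Web, and crawl is a made-up name. The point is the seen set, which has to hold every URL ever encountered so that a cycle does not loop forever.

```ruby
require 'set'

# Toy link graph standing in for the Web (an assumption for illustration).
# Note the cycle: a -> b -> a.
LINKS = {
  'a' => ['b', 'c'],
  'b' => ['a'],
  'c' => ['b', 'd'],
  'd' => []
}

# Breadth-first crawl that remembers every URL it has encountered.
# Returns the visit order and the number of URLs held in memory.
def crawl(start, links)
  seen  = Set.new([start])
  queue = [start]
  order = []
  until queue.empty?
    url = queue.shift
    order << url
    links.fetch(url, []).each do |next_url|
      # Set#add? returns nil when the URL is already present,
      # so each URL is queued at most once.
      queue << next_url if seen.add?(next_url)
    end
  end
  [order, seen.size]
end

order, remembered = crawl('a', LINKS)
puts "visited #{order.join(', ')}; #{remembered} URLs kept in memory"
```

The seen set never shrinks: it grows by one entry for every distinct URL, which is fine for four pages and not fine for the whole Web.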
Now you can use memcached to use not [...]
November 1, 2007 – 6:01 pm
Aha: if you need to proxy your Spider calls, look no further than the HTTP Configuration gem.
I didn’t write this, and have yet to use it, but I think it goes like this:
http_conf = Net::HTTP::Configuration.new(:proxy_host => 'localhost', :proxy_port => 8881)
http_conf.apply do
  Spider.start_at('http://example.com/')
end
So next up will be a tutorial with stuff like this and other [...]
November 1, 2007 – 12:00 am
The newest version of Spider, 0.3.0, is hitting your gem tree Real Soon Now. This release features:
Set the headers of an HTTP request.
This can be used to set the cookies, user agent, and many other fine things.
setup and teardown handlers.
Seems like a good place to set the headers if the headers are conditional on the [...]
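For a feel of what setting request headers amounts to, here is a sketch using plain Net::HTTP from the standard library. This is deliberately not Spider's own interface; build_request and the header values are invented for illustration, and no request is actually sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not Spider's API): build a GET request for the
# given URL and copy the supplied headers onto it.
def build_request(url, headers)
  request = Net::HTTP::Get.new(URI(url).request_uri)
  headers.each { |name, value| request[name] = value }
  request
end

request = build_request(
  'http://example.com/',
  'User-Agent' => 'my-spider/0.1',     # made-up user agent
  'Cookie'     => 'session=abc123'     # made-up cookie
)

# Header lookup on a Net::HTTP request is case-insensitive.
puts request['user-agent']
puts request['cookie']
```

Because the headers are just a hash handed over before the request goes out, they can be rebuilt per URL, which is what makes a setup handler a natural place to do it.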