Powerset in Ruby Using the List Monad

While trying to think of good, quick interview questions I stumbled across power set (thanks to Mike DiStaula for suggesting it). After 10 minutes of hacking in Ruby I discovered that it’s not a reasonable replacement for our current starter question (it takes too long), but it’s also a fun problem to solve.

Then I wandered across this definition of power set in Haskell:

powerset = filterM (const [True, False])

Well, that was easy. So I set out to write that in Ruby. That is, I wanted to write this:

class Array
  def powerset
    self.filterM {|x| [true, false]}

First thing I need is a list monad:

class Array
  def self.unit(x)

  def bind(&f)
    self.inject([]) do |acc, e|
      acc + f.call(e)

I quickly hit my weekly reminder that Ruby does not have Array#rest, so I added that:

class Array
  def rest
    r = self[1..-1]
    r.nil? ? [] : r

Whew; now things are easier. Next I needed filterM, so I added it to Enumerable as a faithful copy from Haskell (expanding the do notation into the >>= notation):

module Enumerable
  def filterM(&p)
    return self.class.unit([]) if self.empty?
    p.call(self.first).bind do |flg|
      self.rest.filterM(&p).bind do |ys|
        self.class.unit(flg ? ([self.first] + ys) : ys)

Now my original Array#powerset method works:

irb(main):013:0> [1,2,3].powerset
=> [[1, 2, 3], [1, 2], [1, 3], [1], [2, 3], [2], [3], []]

I expect you to write that quickly in your next interview.


Correctly Handle OpenID Updates

OpenID is pretty well established as a login infrastructure, but a topic often ignored is correctly letting the user change his saved OpenID. Don’t just save the text they enter! Verify that they’ve entered their valid OpenID.

The user model needs to pass this test:

should_have_db_column :new_openid_identity, :type => 'string'

The view on /account/edit needs to pass this test:

context "logged in" do
  setup { session[:user_id] = Factory(:user).id }

  context "GET to edit" do
    setup { get :edit }

    should "have a form for their OpenID info" do
      assert_select 'form[action=?][method=post]', openid_path do
        assert_select 'input[type=hidden][name=_method][value=put]'
        assert_select 'input[type=text][name=?]',
        assert_select 'input[type=submit]'

And, most importantly, the OpenidsController needs to pass the test suite in gist 23084.

(All these tests assume FactoryGirl and Shoulda; re-write in RSpec or PyUnit or whatever as you see fit.)

Save Those Receipts!

Over the weekend I was on a team of (potential) winners participating in the 2008 Rails Rumble. The application idea was something that I had prototyped for myself years ago and quickly lost, and it solves the problem I had at the time:

I was living near Fenway, and I suffered from too many choices of where to purchase milk (and other groceries). The best solution, to me, was to track the price of milk at various shops so I could have an accurate measurement of which was cheapest, or if there were any correlations I could use to predict the price.

That’s it.


So Chad, Joe, Micah, and myself met the day before it started and mapped out a gameplan for Where’s The Milk At. We watched Blur’s “Coffee & TV” for visual inspiration, drew mocks on paper, and discussed just how awesome we’re gonna rock.

Chad, Micah, and myself set to work at 8PM EDT on Friday (midnight GMT), lasting until 2:30AM. Fueled by pizza, we knocked out most of the server setup, sign up and sign up (both password and OpenID), page layout and design, and the logo.

The next day Joe joined in to hack from 9AM to 9PM (Chad stayed until midnight), fixing some bugs and propelling forward with some crazy JavaScript. More pizza was ordered, and Joe’s girlfriend stopped by with some delicious, large cookies.

I felt unrested and failed to pull my own weight on Saturday, but on Sunday we all worked full-steam toward the 8PM EDT deadline. We found browser incompatibilities with two hours to go, and two bugs in our code with an hour to go; we were down to the wire, but we got everything in with 30 minutes to spare. (Joe used those 30 minutes to improve the JavaScript.)

All of this was caught on video.

Inspired by this weekend of hacking for 32.5 hours, I’ve started work on another idea I suggested during the brainstorming (word blogging), with the intent of doing a 12-hour code sprint on Saturday with anyone who is willing to join. Find me on IRC to join in.

Uniquify an array of hashes in Ruby

Array#uniq is a handy Ruby method that produces an array of distinct elements. So, for example, [1,2,3,3,3].uniq produces [1,2,3].

However, this fails on an array of hashes. For example, [{}, {}].uniq actually produces [{},{}]. That’s not what I wanted, recently.

It turns out that #uniq uses the methods #hash and #eql? to determine uniqueness, and Hash#hash produces the object_id, and {}.eql?({}) produces false.

To solve this, I redefined #hash and #eql? for the hash instances before I put them in the array:

h = {}
class <#dup on this hash will then screw things up, but that’s common to all Ruby eigenclass fun.

However, you could just use a proper class instead of a hash, ’cause this isn’t Perl.

Spider bugfix

There were two issues with version 0.4.0 of Spider, both caught by Henri Cook. These are now fixed in 0.4.1:

  • As documented, you use IncludedInMemcached like this: require 'spider/included_in_memcached' .
  • Sometimes HTTP redirects assume a base URL; this is now handled.

Spider with memcached

The problem with Spider has been that it can use all your memory. The reason is that the Web is a graph, and to avoid cycles Spider stores each URL it encounters. Since the Web is a really, really, really gigantic graph, you eventually run out of memory.

Now you can use memcached to use not only all the memory on one computer, but all the memory on many computers!

require 'spider'
require 'spider/included_in_memcached'
SERVERS = ['','','']
Spider.start_at('http://mike-burns.com/') do |s|
  s.check_already_seen_with IncludedInMemcached.new(SERVERS)

Also new in this version is a tutorial on the main Spider Web site. Yeah!

Proxied Spider

Aha: if you need to proxy your Spider calls, look no further than the HTTP Configuration gem.

I didn’t write this, and have yet to use it, but I think it goes like this:

http_conf = Net::HTTP::Configuration.new(:proxy_host => 'localhost', :proxy_port => 8881)
http_conf.apply do

So next up will be a tutorial with stuff like this and other cool stuff, plus a way to use memcached with Spider.

Spider: API changes, setup and teardown, HTTP headers

The newest version of Spider, 0.3.0, is hitting your gem tree Real Soon Now. This release features:

Set the headers to a HTTP request.
This can be used to set the cookies, user agent, and many other fine things.
setup and teardown handlers.
Seems like a good place to set the headers if the headers are conditional on the URL.
Say :every, not :any.
Makes more sense this way, I claim.
All the handlers take the same three arguments.
The URL, the response, and—new—the calling URL.

Next on my list: proxies, a better way to store whether an URL has been seen, then a tutorial.

Get it the usual way: gem install spider

Spider bug fix release

John Nagro immediately reported errors with the Spider Ruby gem, so I’ve fixed them in 0.2.1. You should upgrade, especially if you want support for:

John also had some good ideas, so here is what is in the works:

  • The ability to construct a complete graph of every node found.
  • Defeat cycles using memcached instead of memory (closely related to the above bullet).
  • A Net::HTTP abstraction that makes it easier to e.g. use a proxy or replace HTTP.

Coder for hire

Ah, the thrill of working for a Web startup: one day you think you might get an obscene raise; the next you discover that the company is out of business.

And yet, I’m addicted to them.

If you or someone you know needs a programmer for their Web startup (or any other job, really), I would be delighted to fill that position.

Unsure? Check out why you should not hire Mike Burns, my résumé, and the rest of this blog.

Edit: This is in and around Boston, MA.