MAY

30th

Experimenting with and improving the Ruby interface to Amazon SimpleDB

I have recently started playing with Amazon SimpleDB, part of the Amazon cloud computing offering, which is roughly comparable to Google Bigtable. Being a Ruby enthusiast, I decided to use the right_aws gem from RightScale. The only problem is that right_aws still does not support batch attribute insertion, which severely limits performance. Since I’m planning to use Amazon SDB for BayesFor, where we need performance, I implemented it myself. You can find the code in sdb_batchput.rb.

Using it is extremely simple. Just require sdb_batchput.rb and you’ll get a new method, batch_put_attributes, on your SdbInterface. Here is an example:


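require 'right_aws'
require 'sdb_batchput'  # adds batch_put_attributes to SdbInterface

# access_key and secret_key hold your AWS credentials (defined elsewhere)
sdb = RightAws::SdbInterface.new(access_key, secret_key)

# Build a hash mapping each item name to its attributes
items = {}
25.times do |i|
  attributes = {
    'foo' => 'bar',
    'baz' => 'bat'
  }
  items["item#{i}"] = attributes
end

# Store all 25 items in the "Test" domain with a single request
sdb.batch_put_attributes("Test", items)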
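
The example above sends exactly 25 items, which matches the limit Amazon documents for a single BatchPutAttributes call. To insert more items, the hash has to be split into chunks first. Here is a minimal sketch of one way to do that. The helper batch_put_in_chunks and the RecordingSdb stand-in are hypothetical names made up for this illustration (neither is part of right_aws or sdb_batchput.rb), and the sketch assumes batch_put_attributes takes a domain name and an item hash as in the example above.

```ruby
# Hypothetical helper, not part of right_aws or sdb_batchput.rb.
# Splits a { item_name => attributes } hash into chunks and sends
# each chunk with one batch_put_attributes call.
SDB_BATCH_LIMIT = 25 # documented per-call item limit for BatchPutAttributes

def batch_put_in_chunks(sdb, domain, items, batch_size = SDB_BATCH_LIMIT)
  calls = 0
  # each_slice on a Hash yields arrays of [name, attributes] pairs
  items.each_slice(batch_size) do |pairs|
    sdb.batch_put_attributes(domain, Hash[pairs])
    calls += 1
  end
  calls
end

# Stand-in object so the sketch runs without AWS credentials.
# It only records what it is asked to store.
class RecordingSdb
  attr_reader :batches

  def initialize
    @batches = []
  end

  def batch_put_attributes(domain, items)
    @batches << [domain, items]
  end
end

items = {}
1000.times { |i| items["item#{i}"] = { 'foo' => 'bar', 'baz' => 'bat' } }

fake_sdb = RecordingSdb.new
calls = batch_put_in_chunks(fake_sdb, "Test", items)
puts "Sent #{items.size} items in #{calls} calls"
```

With a real SdbInterface in place of the stand-in, each chunk would cost one request instead of 25.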

Grab sdb_batchput.rb from here

This file contains a slightly longer example (embedded into the BayesFor infrastructure, unfortunately, but still readable) that benchmarks the new method.

A simple benchmark that inserts 1000 items with 2 attributes each shows a roughly 18x improvement (288s / 16.29s ≈ 17.7):


PutAttributes:
Total save time: 288s. Per item: 0.288s

BatchPutAttributes:
Total save time: 16.29s. Per item: 0.016s
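
For reference, timings like the ones above can be collected with Ruby's standard Benchmark module. This is only a sketch and not the actual benchmark file: time_save is a hypothetical helper, and the timed block is a placeholder workload where a real run would call put_attributes in a loop or batch_put_attributes per chunk.

```ruby
require 'benchmark'

# Hypothetical helper: times a block that saves `count` items and
# prints total and per-item time in the same format as above.
def time_save(label, count)
  total = Benchmark.realtime { yield }
  per_item = total / count
  puts "#{label}:"
  puts format("Total save time: %.2fs. Per item: %.3fs", total, per_item)
  { total: total, per_item: per_item }
end

# Placeholder workload standing in for the real SimpleDB calls.
result = time_save("Example", 1000) { 1000.times { |i| i.to_s } }

# Speedup computed from the two totals reported in this post.
speedup = 288.0 / 16.29
puts format("Speedup: %.1fx", speedup)
```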

Alternatively, if you don’t want to use my patch for right_aws, the ruby-aws project fork implemented the same feature (with a slightly different signature) just a few days ago.

Riccardo Govoni, last modified on Oct 22, 2011 - 13:57


Tags

This page is tagged as: amazon simpledb cloud ruby right_aws tips and tricks
