I have recently started playing with Amazon SimpleDB, part of the Amazon cloud computing offering, which is roughly comparable to Google Bigtable. Being a Ruby enthusiast, I decided to use the right_aws gem from RightScale. The only problem is that right_aws still does not support batch attribute insertion, which severely limits performance. Since I’m planning to use Amazon SDB for BayesFor, where we need performance, I implemented it. You can find the code in sdb_batchput.rb
Using it is extremely simple. Just require sdb_batchput.rb and you’ll get a new method, batch_put_attributes, on your SdbInterface. It takes a domain name and a hash mapping item names to attribute hashes. Here is an example:
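require 'right_aws'
require 'sdb_batchput'

# access_key and secret_key are your AWS credentials
sdb = RightAws::SdbInterface.new(access_key, secret_key)

# Build a hash mapping each item name to its attributes
items = {}
25.times do |i|
  attributes = {
    'foo' => 'bar',
    'baz' => 'bat'
  }
  items["item#{i}"] = attributes
end

# Store all items in the "Test" domain with a single request
sdb.batch_put_attributes("Test", items)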
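SimpleDB accepts at most 25 items per BatchPutAttributes request, so a larger data set has to be split into several calls. Here is a minimal sketch of how that could be done; each_batch and SDB_BATCH_LIMIT are hypothetical names of my own for this illustration and are not part of right_aws or sdb_batchput.rb.

```ruby
# Maximum number of items SimpleDB accepts in one BatchPutAttributes request.
SDB_BATCH_LIMIT = 25

# Hypothetical helper (not part of right_aws or sdb_batchput.rb):
# yields the items hash in slices of at most `limit` items each.
def each_batch(items, limit = SDB_BATCH_LIMIT)
  return enum_for(:each_batch, items, limit) unless block_given?
  items.each_slice(limit) do |pairs|
    # each_slice on a Hash yields arrays of [name, attributes] pairs
    yield Hash[pairs]
  end
end

items = {}
60.times { |i| items["item#{i}"] = { 'foo' => 'bar', 'baz' => 'bat' } }

batches = each_batch(items).to_a
puts batches.map(&:size).inspect  # prints [25, 25, 10]

# With a real connection, each slice would go out as one request:
#   each_batch(items) { |batch| sdb.batch_put_attributes("Test", batch) }
```

The helper never talks to AWS itself, so it can be tried without credentials.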
Grab sdb_batchput.rb from here
The file also contains a somewhat longer example (embedded into the BayesFor infrastructure, unfortunately, but still readable) that benchmarks the new method.
A simple benchmark, inserting 1000 items with 2 attributes each, shows a roughly 18x improvement:
PutAttributes:
Total save time: 288s. Per item: 0.288s
BatchPutAttributes:
Total save time: 16.29s. Per item: 0.016s
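For reference, the speedup can be recomputed from the totals above. The request count in this sketch is an assumption: it supposes the batch run sent the maximum of 25 items per request, which the numbers above do not confirm.

```ruby
item_count  = 1000
put_total   = 288.0   # seconds, PutAttributes (one request per item)
batch_total = 16.29   # seconds, BatchPutAttributes

speedup = put_total / batch_total
puts format('Speedup: %.1fx', speedup)  # prints "Speedup: 17.7x"

# Assumption: 25 items per batch request (the SimpleDB maximum).
batch_requests = (item_count / 25.0).ceil
puts "Requests: #{item_count} vs #{batch_requests}"  # prints "Requests: 1000 vs 40"
```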
Alternatively, if you don’t want to use my patch for right_aws, the ruby-aws project fork implemented the same feature (with a slightly different signature) just a few days ago.