s3cmd timeout problems moving large files on S3 (> 250MB)

19 Mar

Recently we had a problem with s3cmd returning errors when moving or copying large files between buckets in S3. This had worked perfectly before, but our file sizes had been increasing.

Gentle Introduction to HBase II – Amazon Elastic Map Reduce (EMR) HBase Cluster

26 Nov

Below is a WYSIWYG tutorial on how to set up an HBase cluster (and Thrift) on Amazon’s Elastic Map Reduce.

Gentle Introduction to HBase Part I – Data Structure

13 Nov

In this post (hopefully the first of several), I hope to provide a gentle introduction to HBase, since I never had one myself! This post focuses on HBase’s data structure, but I hope to write more posts introducing HBase programming, using a combination of HBase on Amazon’s Elastic Map Reduce (I like to call it Amazon’s HAAS, or HBase-as-a-Service) and Python with the HappyBase library, which offers a really easy way to use HBase’s Thrift interface.
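As a rough illustration of the kind of structure this post covers (not code from the post itself), HBase’s logical data model is often described as a sorted, multidimensional map: row key → column family:qualifier → timestamp → value. Column families are fixed when the table is created, qualifiers can be added freely on write, every write adds a new timestamped version of a cell, and rows are kept sorted by key. The toy class below is a hypothetical in-memory sketch of that layout; `ToyHBaseTable` and its methods are made up for illustration and are not the HBase or HappyBase API.

```python
from collections import defaultdict


class ToyHBaseTable:
    """Hypothetical in-memory model of HBase's logical layout:
    row key -> "family:qualifier" -> timestamp -> value.
    Illustration only; this is not the HBase or HappyBase API."""

    def __init__(self, families):
        # Column families are declared up front, as in HBase.
        self.families = set(families)
        # Qualifiers are free-form and created on write.
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: %s" % family)
        # Each write adds a new version of the cell, keyed by timestamp.
        self.rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it does not exist."""
        if row not in self.rows:
            return None
        versions = self.rows[row].get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start=None, stop=None):
        """Yield (row, {column: newest value}) in sorted row-key order.
        start is inclusive and stop is exclusive, as with HBase scans."""
        for row in sorted(self.rows):
            if start is not None and row < start:
                continue
            if stop is not None and row >= stop:
                break
            cells = self.rows[row]
            yield row, {col: vs[max(vs)] for col, vs in sorted(cells.items())}
```

Real HBase compares row keys as raw bytes; plain strings are used here only to keep the sketch short.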

Using Amazon/AWS Glacier with Python boto

16 Oct

Amazon’s Glacier is a fairly new AWS service for low-cost storage/backups. But there is little documentation on how to actually use it.

s3cmd IAM problem and solving it

19 Sep

I had a problem with the popular s3cmd application and IAM permissions, and was able to solve it.

Graphical comparison of Amazon AWS EC2 instances’ CPU (cores), memory, cost

19 Jul

A graphical comparison of the various Amazon EC2 instances, based on CPU (cores), memory and cost. This includes the just-announced hi1.4xlarge High I/O EC2 instances.

GeoIP on Amazon Elastic Map Reduce (EMR) using Hadoop Streaming (Python)

23 Apr

I wanted to be able to run geo-data calculations on Amazon Elastic Map Reduce using Hadoop streaming jobs, particularly in Python. While we cannot easily install the required Python dependencies, this problem can be solved by using Hadoop’s cacheArchive feature.
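A minimal sketch of the idea, assuming the job is launched with something like `-cacheArchive s3://your-bucket/deps.tgz#deps`: Hadoop unpacks the archive on each node and exposes it in the task’s working directory under the link name after the `#`, so the Python mapper only has to put that directory on `sys.path` before importing. The bucket, the `deps` link name, the `mapper` function and the `lookup` callable are all hypothetical placeholders, not the setup from the post.

```python
import os
import sys


def add_archive_to_path(link_name="deps"):
    """Put the unpacked cacheArchive directory first on sys.path.

    Assumes (hypothetically) the job was started with
    -cacheArchive s3://your-bucket/deps.tgz#deps, so Hadoop exposes the
    unpacked archive as ./deps in the task's working directory.
    """
    path = os.path.join(os.getcwd(), link_name)
    if os.path.isdir(path) and path not in sys.path:
        sys.path.insert(0, path)
    return path


def mapper(lines, lookup):
    """Yield 'country<TAB>1' for each input line whose first field is an IP.

    `lookup` is a hypothetical callable (e.g. built on a GeoIP library
    imported from the archive) that maps an IP string to a country code,
    or returns None when the IP is not found.
    """
    for line in lines:
        ip = line.strip().split("\t")[0]
        if not ip:
            continue  # skip blank lines
        yield "%s\t%d" % (lookup(ip) or "unknown", 1)
```

In the actual streaming mapper script you would call `add_archive_to_path()` first, then import the dependency from the archive and feed `sys.stdin` through `mapper`, printing each yielded line.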

