Recently we had a problem with s3cmd giving errors while moving or copying (large) files across buckets in S3. This had worked perfectly before, but our file sizes were increasing.
Below is a step-by-step tutorial on how to set up an HBase cluster (and Thrift) on Amazon's Elastic MapReduce.
In this post (hopefully the first of several), I hope to provide a gentle introduction to HBase, since I never had one myself! This particular post covers the specifics of HBase's data structure; in later posts I hope to introduce HBase programming using a combination of HBase on Amazon's Elastic MapReduce (I like to call it Amazon's HaaS, or HBase-as-a-Service) and Python with the HappyBase library, which offers a really easy interface to HBase's Thrift API.
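To give a flavour of how easy HappyBase makes things, here is a minimal sketch of talking to HBase over Thrift. The host, table name, and column family are illustrative assumptions, and this assumes a Thrift server is already running (9090 is its default port):

```python
import happybase  # third-party: pip install happybase

# Connect to the HBase Thrift server.
# 'localhost' and port 9090 are assumptions; point these at your
# EMR master node in practice.
connection = happybase.Connection('localhost', port=9090)

# 'mytable' and the column family 'cf' are hypothetical names;
# the table must already exist with that column family.
table = connection.table('mytable')

# Store a row, then read it back. HappyBase works with bytes.
table.put(b'row-key-1', {b'cf:greeting': b'hello hbase'})
row = table.row(b'row-key-1')
print(row[b'cf:greeting'])
```

That's the whole round trip: no protocol buffers, no generated Thrift stubs, just dictionaries keyed by `columnfamily:qualifier`.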
Amazon’s Glacier is a fairly new AWS service for low-cost storage/backups. But there is little documentation on how to actually use it.
A graphical comparison of the various Amazon EC2 instances, based on CPU (cores), memory, and cost. This includes the just-announced hi1.4xlarge High I/O EC2 instances.
I wanted to be able to run geo-data calculations on Amazon Elastic MapReduce using Hadoop streaming jobs, particularly in Python. Since the required Python dependencies cannot easily be installed on the cluster nodes, I solved this by using the cacheArchive feature of Hadoop.
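The idea behind cacheArchive is to zip up the Python packages, put the archive somewhere the cluster can reach, and have Hadoop unpack it next to each task. A sketch of the invocation follows; the bucket name, archive contents, and script names are placeholders, not the actual job from the post:

```shell
# Zip up the pure-Python dependencies (package directories are examples).
zip -r deps.zip shapely/ geojson/

# Upload the archive to S3 so EMR task nodes can fetch it,
# then launch the streaming job. The '#deps' suffix tells Hadoop
# to unpack the archive into a symlink named 'deps' in each
# task's working directory.
hadoop jar /path/to/hadoop-streaming.jar \
    -cacheArchive s3://my-bucket/deps.zip#deps \
    -input  s3://my-bucket/input/ \
    -output s3://my-bucket/output/ \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
```

Inside `mapper.py`, the unpacked archive can then be made importable with something like `sys.path.insert(0, 'deps')` before importing the bundled packages.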