Getting Whirr running on EC2 with Cloudera’s script

12 Jan

I haven’t used Hadoop much yet, but it looks like a pretty amazing tool for crunching large datasets. Combine Hadoop with Amazon EC2, and it should be possible to crunch large datasets quickly on short-lived EC2 instances. But I had trouble getting Hadoop up and running on EC2…

I followed Cloudera’s instructions for setting up the CDH3 scripts on the Amazon EC2 instances I was testing.

Everything went great until I got to the Whirr installation. (Whirr seems to be the easiest way to start up a number of nodes at once and have them configured auto-magically.)

Following these instructions gave me this error: “Non-Windows AMIs with a virtualization type of ‘hvm’ currently may only be used with Cluster Compute instance types.”

Luckily, a search for this error message led to some helpful links.

After some trial and error, I added these lines to my hadoop.properties, and they worked:

==========
whirr.hardware-id=m1.large
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.location-id=us-east-1
==========
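If you have to repeat this for several clusters, the edit can be scripted. The sketch below uses hypothetical helpers (`apply_overrides`, `check_region`, `update_file`; none of them are part of Whirr or the Cloudera scripts) to set the three overrides in a hadoop.properties file, replacing existing keys or appending missing ones, and to check that the region prefix on `whirr.image-id` matches `whirr.location-id`. It assumes plain `key=value` lines and does not handle the full Java properties syntax (line continuations, `:` separators, escapes).

```python
"""Hypothetical helpers: apply the Whirr overrides to hadoop.properties."""

OVERRIDES = {
    "whirr.hardware-id": "m1.large",
    "whirr.image-id": "us-east-1/ami-da0cf8b3",
    "whirr.location-id": "us-east-1",
}


def apply_overrides(text: str, overrides: dict[str, str]) -> str:
    """Set each key in `overrides`, replacing an existing `key=value` line
    or appending a new one. Assumes plain `key=value` lines only."""
    remaining = dict(overrides)
    out_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining:
            # Replace the existing value with the override.
            out_lines.append(f"{key}={remaining.pop(key)}")
        else:
            out_lines.append(line)
    # Append any overrides that were not already present in the file.
    out_lines.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out_lines) + "\n"


def check_region(overrides: dict[str, str]) -> None:
    """Raise ValueError if the image-id region prefix differs from location-id."""
    image_region = overrides["whirr.image-id"].split("/", 1)[0]
    location = overrides["whirr.location-id"]
    if image_region != location:
        raise ValueError(
            f"image-id region {image_region!r} does not match location-id {location!r}"
        )


def update_file(path: str = "hadoop.properties") -> None:
    """Rewrite the properties file in place with OVERRIDES applied."""
    check_region(OVERRIDES)
    with open(path) as f:
        updated = apply_overrides(f.read(), OVERRIDES)
    with open(path, "w") as f:
        f.write(updated)
```

Calling `update_file()` from the directory that holds hadoop.properties applies the change; keeping the text transformation separate from the file I/O makes it easy to try on a copy first.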

After that, everything worked great. One small catch: when the Whirr script finishes setting up the instances, it reports that the web-based interfaces are live, but to actually see them you have to edit the security groups in AWS to accept incoming traffic from any IP address on these ports:

==========
50030 0.0.0.0/0
50070 0.0.0.0/0
==========
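The same rules can be added from code instead of the AWS console. This is a minimal sketch using boto3 (the AWS SDK for Python, which the original setup does not use); `build_ip_permissions` and `open_web_ui_ports` are hypothetical helper names, and the security group ID you pass in is a placeholder for the group Whirr created for your cluster. Port 50030 is the JobTracker web UI and 50070 the NameNode web UI in Hadoop 0.20-era releases such as CDH3.

```python
"""Sketch: open the Hadoop web UI ports on a Whirr cluster's security group."""

# 50030: JobTracker web UI; 50070: NameNode web UI (Hadoop 0.20 / CDH3 defaults).
WEB_UI_PORTS = (50030, 50070)


def build_ip_permissions(ports, cidr="0.0.0.0/0"):
    """Return an EC2 IpPermissions list allowing TCP from `cidr` on each port."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


def open_web_ui_ports(group_id, region="us-east-1", cidr="0.0.0.0/0"):
    """Add the ingress rules to an existing security group.

    Requires boto3 and configured AWS credentials; `group_id` is a placeholder
    for the ID of the security group Whirr created for the cluster.
    """
    import boto3  # imported here so build_ip_permissions works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=build_ip_permissions(WEB_UI_PORTS, cidr),
    )
```

Passing a narrower `cidr`, such as your own IP address with a `/32` suffix, makes the interfaces reachable from fewer hosts than `0.0.0.0/0` does.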
