Gentle Introduction to HBase II – Amazon Elastic MapReduce (EMR) HBase Cluster

26 Nov

Below is a step-by-step tutorial on how to set up an HBase cluster (and Thrift) on Amazon’s Elastic MapReduce.

1. Click on “Create Job Flow” in Amazon’s Elastic MapReduce console.

2. Choose “HBase” as the type of job flow.

3. Choose “No” for both restoring from a backup and scheduling backups.

4. Choose the number of master/data nodes that you want. You can choose “spot” instances if you are using this cluster only for testing.

5. Make sure you choose a key-pair in the “Amazon EC2 Key Pair” option – you will need this to start Thrift on the HBase cluster.

6. You do not need any bootstrap actions.

7. Review, and click “Create Job Flow”.

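8. Go to EC2, click on “Security Groups”, and add an inbound rule to the master node’s security group that opens port 9000 (for Thrift) to the necessary IP (or group). If connections on port 9000 are refused later, check which port Thrift is actually listening on; a stock HBase Thrift server defaults to 9090.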

9. Go to EC2, choose your HBase master node, right-click and choose “Connect”. Make sure to replace the default “root” user with “hadoop”.

10. Connect to the master node through SSH, and run this command: “hbase-daemon.sh start thrift”. That will start Thrift on the master node.
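
Steps 8 to 10 can also be scripted from your own machine. The following is a minimal Python sketch, not an official EMR tool: the function names, the host name and the key path are placeholders made up for illustration, and the default port of 9000 simply follows step 8. It also assumes that hbase-daemon.sh is on the PATH of a non-interactive SSH session; if it is not, log in interactively as described in step 10.

```python
import socket


def thrift_start_command(master_host, key_path, user="hadoop"):
    """Build the ssh command that starts Thrift on the master node.

    master_host and key_path are placeholders: use the public DNS name of
    your master node and the private key of the key pair chosen in step 5.
    """
    return [
        "ssh",
        "-i", key_path,                  # private key for the EC2 key pair
        f"{user}@{master_host}",         # log in as "hadoop", not "root"
        "hbase-daemon.sh start thrift",  # command from step 10
    ]


def port_is_open(host, port=9000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Port 9000 follows step 8; pass a different port if your Thrift
    server listens elsewhere.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "not reachable".
        return False
```

To use it, pass the list returned by thrift_start_command(...) to subprocess.run to start Thrift, then call port_is_open(...) with the master node’s public DNS name. A result of False usually means the security group rule from step 8 is missing or Thrift is not running.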
