
Brief but complete notes on setting up a 3-node HBase cluster.

2014-09-17 by gihnius, tagged as linux

Here is my experience setting up a 3-node HBase cluster (1 name node and 2 data nodes) on AWS EC2.

Launch and set up the master (name) node.

  • Start an instance on EC2 
    • set the hostname: echo 'master' > /etc/hostname
    • set up hosts: echo 'IP_ADDRESS master' >> /etc/hosts
    • add a user 'hadoop' and set up ssh login without a password
    • create data directory: mkdir /mnt/hadoop/ 
    • create zookeeper data directory: mkdir -p /mnt/hadoop/zookeeper/data
    • set the directories permissions: chown -R hadoop:hadoop /mnt/hadoop
  • Install Java and the HBase components
    • get hadoop from: http://archive.apache.org/dist/hadoop/common/
      • extract to: /home/hadoop/hadoop
    • get hbase from: http://archive.apache.org/dist/hbase/
      • extract to: /home/hadoop/hbase
    • get zookeeper from: http://archive.apache.org/dist/hadoop/zookeeper/
      • extract to: /home/hadoop/zookeeper
    • install Java (get it from Oracle)
  • set the shell environment: put the following exports in /etc/profile (or the profile of the hadoop user's login shell), and also save them to a standalone script: /home/hadoop/hadoop_cluster_env.sh
    • export PATH=/usr/local/mvn/bin:$PATH
    • export JAVA_HOME=/usr/lib/jvm/jdk
    • export ZOOKEEPER_HOME=/home/hadoop/zookeeper
    • export ZOOCFGDIR=/home/hadoop/zookeeper/conf
    • export HADOOP_HOME=/home/hadoop/hadoop
    • export HBASE_HOME=/home/hadoop/hbase
    • export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
    • export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    • export HBASE_MANAGES_ZK=false
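Collected into the standalone script mentioned above (/home/hadoop/hadoop_cluster_env.sh), the environment looks roughly like this (a sketch, assuming the paths chosen above):

```shell
#!/bin/sh
# /home/hadoop/hadoop_cluster_env.sh -- sourced by login shells and by
# hadoop-env.sh / hbase-env.sh on every node (paths as chosen above)
export JAVA_HOME=/usr/lib/jvm/jdk
export ZOOKEEPER_HOME=/home/hadoop/zookeeper
export ZOOCFGDIR=$ZOOKEEPER_HOME/conf
export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:/usr/local/mvn/bin:$PATH
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
# HBase must not start its own ZooKeeper; we run the quorum ourselves
export HBASE_MANAGES_ZK=false
```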

Hadoop configuration

add or change the following in hadoop/conf/core-site.xml

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>

add this line to hadoop/conf/hadoop-env.sh

source /home/hadoop/hadoop_cluster_env.sh

add or change the following in hadoop/conf/hdfs-site.xml

<property>
  <name>dfs.name.dir</name>
  <value>/mnt/hadoop/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/hadoop/data/</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

add or change the following in hadoop/conf/mapred-site.xml

<property>
   <name>mapred.job.tracker</name>
   <value>master:9001</value>
</property>

add or change the following in hadoop/conf/slaves

slave1-ip
slave2-ip

add or change the following in hadoop/conf/masters

master-ip

ZooKeeper configuration

copy zoo_sample.cfg to zoo.cfg and add:

dataDir=/mnt/hadoop/zookeeper/data
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
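In each server.N line, the first port (2888) is the one followers use to connect to the leader, and the second (3888) is used for leader election; clientPort (2181) is what clients such as HBase connect to. Since zoo_sample.cfg already carries the timing settings (tickTime, initLimit, syncLimit), the final zoo.cfg ends up looking roughly like:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/mnt/hadoop/zookeeper/data
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
```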

HBase configuration

add this line to hbase/conf/hbase-env.sh

source /home/hadoop/hadoop_cluster_env.sh

add or change the following in hbase/conf/hbase-site.xml

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>

add or change the following in hbase/conf/regionservers

slave1
slave2

    Launch and set up the slave (data) nodes.

    Start 2 instances on EC2:

    • add a user 'hadoop'
    • set up ssh login without a password
    • install Java only (hadoop/hbase/zookeeper will be copied over from the master)
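For the passwordless logins, one approach is to generate a key for the hadoop user on the master and push the public half to each node. A sketch (the demo filename is just to avoid clobbering an existing key; in practice the default ~/.ssh/id_rsa is fine):

```shell
# generate a passphrase-less key for the hadoop user (run on the master)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
rm -f ~/.ssh/hadoop_demo_key ~/.ssh/hadoop_demo_key.pub
ssh-keygen -t rsa -N '' -q -f ~/.ssh/hadoop_demo_key
# push the public key to each node so 'ssh slaveN' needs no password:
#   ssh-copy-id -i ~/.ssh/hadoop_demo_key.pub hadoop@slave1
#   ssh-copy-id -i ~/.ssh/hadoop_demo_key.pub hadoop@slave2
```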

    on slave1

    echo slave1 > /etc/hostname
    echo 'IP slave1' >> /etc/hosts
    echo 'IP slave2' >> /etc/hosts
    echo 'IP master' >> /etc/hosts
    mkdir /mnt/hadoop
    mkdir -p /mnt/hadoop/zookeeper/data
    chown -R hadoop:hadoop /mnt/hadoop
    

    on slave2

    echo slave2 > /etc/hostname
    echo 'IP slave1' >> /etc/hosts
    echo 'IP slave2' >> /etc/hosts
    echo 'IP master' >> /etc/hosts
    mkdir /mnt/hadoop
    mkdir -p /mnt/hadoop/zookeeper/data
    chown -R hadoop:hadoop /mnt/hadoop
    

    Copy hadoop/hbase/zookeeper from master

    on master

    copy components to slave1 and slave2:

    cd ~/
    scp -r hbase slave1:
    scp -r hbase slave2:
    scp -r hadoop slave1:
    scp -r hadoop slave2:
    scp -r zookeeper slave1:
    scp -r zookeeper slave2:
    

    Set each node's ZooKeeper myid to match the server.N entries in zoo.cfg (the file lives under the data directory, /mnt/hadoop/zookeeper/data):

    echo 1 > /mnt/hadoop/zookeeper/data/myid
    ssh slave1 'echo 2 > /mnt/hadoop/zookeeper/data/myid'
    ssh slave2 'echo 3 > /mnt/hadoop/zookeeper/data/myid'
    

    Start the components

    start hadoop on master

    cd /home/hadoop/hadoop/bin
    ./hadoop namenode -format
    ./start-all.sh
    

    start zookeeper on master and slaves

    cd /home/hadoop/zookeeper/bin
    ./zkServer.sh start
    

    start hbase on master

    cd /home/hadoop/hbase/bin
    ./start-hbase.sh
    

    Check the running status

    for h in master slave1 slave2 ; do
      ssh $h 'hostname; jps'
    done
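If everything came up, the jps output should list roughly the following daemons (Hadoop 1.x-era names; exact PIDs will differ), and `echo status | /home/hadoop/hbase/bin/hbase shell` should report 2 live region servers:

```
# expected on master (roughly):
#   NameNode, SecondaryNameNode, JobTracker, HMaster, QuorumPeerMain
# expected on slave1 / slave2 (roughly):
#   DataNode, TaskTracker, HRegionServer, QuorumPeerMain
```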
    


    All done!