Brief notes on setting up a 3-node HBase cluster.
2014-09-17
These are notes from setting up a 3-node HBase cluster (1 name node and 2 data nodes) on AWS EC2.
Launch and set up the master (name) node.
- Start an instance on EC2
- set the hostname:
echo 'master' > /etc/hostname
- set up /etc/hosts (the master will later reach slave1 and slave2 by hostname, so add their entries too once the slaves' IPs are known):
echo 'IP_ADDRESS master' >> /etc/hosts
echo 'IP_ADDRESS slave1' >> /etc/hosts
echo 'IP_ADDRESS slave2' >> /etc/hosts
- add a user 'hadoop' and set up passwordless ssh login (a sketch follows this list)
- create data directory:
mkdir /mnt/hadoop/
- create zookeeper data directory:
mkdir -p /mnt/hadoop/zookeeper/data
- set the directories permissions:
chown -R hadoop:hadoop /mnt/hadoop
- Install Java and the Hadoop/HBase/ZooKeeper components (a download/extract sketch follows these items)
- get hadoop from: http://archive.apache.org/dist/hadoop/common/
- extract to: /home/hadoop/hadoop
- get hbase from: http://archive.apache.org/dist/hbase/
- extract to: /home/hadoop/hbase
- get zookeeper from: http://archive.apache.org/dist/hadoop/zookeeper/
- extract to: /home/hadoop/zookeeper
- install java (get it from Oracle)
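A download-and-extract sketch for the three components, run as the hadoop user. The X.Y.Z versions are placeholders, so check the archive listings above for the exact release directories:
cd /home/hadoop
wget http://archive.apache.org/dist/hadoop/common/hadoop-X.Y.Z/hadoop-X.Y.Z.tar.gz
tar xzf hadoop-X.Y.Z.tar.gz && mv hadoop-X.Y.Z hadoop
wget http://archive.apache.org/dist/hbase/hbase-X.Y.Z/hbase-X.Y.Z.tar.gz
tar xzf hbase-X.Y.Z.tar.gz && mv hbase-X.Y.Z hbase
wget http://archive.apache.org/dist/hadoop/zookeeper/zookeeper-X.Y.Z/zookeeper-X.Y.Z.tar.gz
tar xzf zookeeper-X.Y.Z.tar.gz && mv zookeeper-X.Y.Z zookeeper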
- set the shell environment: put the following exports into /etc/profile (or the hadoop user's login shell profile), and also save them to a standalone script: /home/hadoop/hadoop_cluster_env.sh
- export JAVA_HOME=/usr/lib/jvm/jdk
- export ZOOKEEPER_HOME=/home/hadoop/zookeeper
- export ZOOCFGDIR=/home/hadoop/zookeeper/conf
- export HADOOP_HOME=/home/hadoop/hadoop
- export HBASE_HOME=/home/hadoop/hbase
- export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
- export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- export HBASE_MANAGES_ZK=false
- export PATH=/usr/local/mvn/bin:$PATH
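A minimal sketch of the 'add a user and passwordless ssh' step above, assuming a typical Linux AMI where you start with root access (adjust the user-management commands to your distribution):
# as root: create the hadoop user
useradd -m -s /bin/bash hadoop
passwd hadoop
# as the hadoop user: generate a key and authorize it for logins to this host
su - hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys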
Hadoop configuration
add or change the following in hadoop/conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
add this line to hadoop/conf/hadoop-env.sh
source /home/hadoop/hadoop_cluster_env.sh
add or change the following in hadoop/conf/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/mnt/hadoop/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/mnt/hadoop/data/</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
add or change the following in hadoop/conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>hdfs://master:9001/</value>
</property>
add or change the following in hadoop/conf/slaves
slave1-ip
slave2-ip
add or change the following in hadoop/conf/masters
master-ip
ZooKeeper configuration
copy conf/zoo_sample.cfg to conf/zoo.cfg and add or change the following:
dataDir=/mnt/hadoop/zookeeper/data
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
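One way to produce this zoo.cfg, as a sketch (zoo_sample.cfg already ships with clientPort=2181, so only dataDir and the server list need touching):
cd /home/hadoop/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
sed -i 's|^dataDir=.*|dataDir=/mnt/hadoop/zookeeper/data|' zoo.cfg
cat >> zoo.cfg <<'EOF'
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF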
HBase configuration
add this line to hbase/conf/hbase-env.sh
source /home/hadoop/hadoop_cluster_env.sh
add or change the following in hbase/conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>master</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>60000000</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
add or change the following in hbase/conf/regionservers
slave1
slave2
Launch and set up the slave (data) nodes.
start 2 instances on EC2
- add a user 'hadoop'
- set up passwordless ssh login from the master (a key-distribution sketch follows this list)
- install Java only; hadoop/hbase/zookeeper will be copied over from the master (a sketch follows the slave2 commands below)
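A sketch of the passwordless-ssh item: run as the hadoop user on the master once both slaves are up and have the hadoop user. ssh-copy-id needs password logins enabled on the slaves; if the AMI disables them, append the master's ~/.ssh/id_rsa.pub to ~hadoop/.ssh/authorized_keys on each slave by hand instead.
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2
# both should print 'hadoop' without prompting for a password
ssh slave1 whoami
ssh slave2 whoami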
on slave1
echo slave1 > /etc/hostname
echo 'IP slave1' >> /etc/hosts
echo 'IP slave2' >> /etc/hosts
echo 'IP master' >> /etc/hosts
mkdir /mnt/hadoop
mkdir -p /mnt/hadoop/zookeeper/data
chown -R hadoop:hadoop /mnt/hadoop
on slave2
echo slave2 > /etc/hostname
echo 'IP slave1' >> /etc/hosts
echo 'IP slave2' >> /etc/hosts
echo 'IP master' >> /etc/hosts
mkdir /mnt/hadoop
mkdir -p /mnt/hadoop/zookeeper/data
chown -R hadoop:hadoop /mnt/hadoop
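For the Java-only item on the slaves, one option (just a sketch; it assumes the master's JDK lives at /usr/lib/jvm/jdk, matching JAVA_HOME above, and that you can reach the slaves as a privileged user) is to copy the master's JDK to the same path on each slave:
# run on the master; adjust the remote user if direct root ssh is not allowed
ssh root@slave1 mkdir -p /usr/lib/jvm
scp -r /usr/lib/jvm/jdk root@slave1:/usr/lib/jvm/
ssh root@slave2 mkdir -p /usr/lib/jvm
scp -r /usr/lib/jvm/jdk root@slave2:/usr/lib/jvm/
# verify
ssh root@slave1 /usr/lib/jvm/jdk/bin/java -version
ssh root@slave2 /usr/lib/jvm/jdk/bin/java -version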
Copy hadoop/hbase/zookeeper from master
on master
copy components to slave1 and slave2:
cd ~/
scp -r hbase slave1:
scp -r hbase slave2:
scp -r hadoop slave1:
scp -r hadoop slave2:
scp -r zookeeper slave1:
scp -r zookeeper slave2:
set the zookeeper myid on each node to match its server.N line in zoo.cfg (the file lives in the data dir under /mnt/hadoop/zookeeper/data):
echo 1 > /mnt/hadoop/zookeeper/data/myid
ssh slave1 'echo 2 > /mnt/hadoop/zookeeper/data/myid'
ssh slave2 'echo 3 > /mnt/hadoop/zookeeper/data/myid'
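A quick check that every node got the id matching its server.N line in zoo.cfg (expected output: 1, 2, 3):
for h in master slave1 slave2 ; do
ssh $h cat /mnt/hadoop/zookeeper/data/myid
done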
Start the components
start hadoop on master
cd /home/hadoop/hadoop/bin
./hadoop namenode -format
./start-all.sh
start zookeeper on the master and on both slaves (run these two commands on each of the three hosts)
cd /home/hadoop/zookeeper/bin
./zkServer.sh start
start hbase on master
cd /home/hadoop/hbase/bin
./start-hbase.sh
Check the running status
for h in master slave1 slave2 ; do
ssh $h 'hostname; jps'
done
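With the Hadoop 1.x layout used here, jps should report roughly the following daemons (exact names can vary a little between versions):
master: NameNode, SecondaryNameNode, JobTracker, QuorumPeerMain, HMaster
slave1: DataNode, TaskTracker, QuorumPeerMain, HRegionServer
slave2: DataNode, TaskTracker, QuorumPeerMain, HRegionServer
As a final check, the HBase shell's status command on the master should show both region servers:
cd /home/hadoop/hbase/bin
echo status | ./hbase shell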
All done!