Adventures in Data

Hadoop Cookbook – 4: How to run multiple Hadoop data nodes on one machine.

Although Hadoop is designed and developed for distributed computing, it can be run on a single node in pseudo-distributed mode, and even with multiple data nodes on a single machine. Developers often run multiple data nodes on one machine to develop and test distributed features, data node behavior, name node interaction with data nodes, and so on.

If you want to get a feel for Hadoop's distributed data node / name node operation but have only one machine, you can run multiple data nodes on that single machine. You can see how the name node stores its metadata (fsimage, edits, fstime) and how each data node stores data blocks on the local file system.
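As a rough sketch of where that metadata lives on disk (this assumes Hadoop 0.20's default layout under ${hadoop.tmp.dir}/dfs; exact contents vary by version):

```
${hadoop.tmp.dir}/dfs/name/current/
    fsimage    # checkpoint of the file system namespace
    edits      # log of namespace changes since the last checkpoint
    fstime     # timestamp of the last checkpoint
    VERSION    # storage metadata (layout version, namespaceID, ...)
${hadoop.tmp.dir}/dfs/data/current/
    blk_*      # data blocks, each with a .meta checksum file
```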

Steps

To start multiple data nodes on a single machine, first download or build the Hadoop binary.

  1. Download the Hadoop binary, or build it from the Hadoop source.
  2. Prepare the Hadoop configuration to run on a single node (change Hadoop's default tmp dir location from /tmp to some other, more reliable location).
  3. Add the following script to the $HADOOP_HOME/bin directory and chmod it to 744.
  4. Format HDFS – bin/hadoop namenode -format (Hadoop 0.20 and earlier) or bin/hdfs namenode -format (0.21 and later).
  5. Start HDFS with bin/start-dfs.sh. This starts the name node and one data node, which can be viewed at http://localhost:50070.
  6. Start the additional data nodes using bin/run-additionalDN.sh.
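For step 2, the tmp dir is typically relocated in conf/core-site.xml. A minimal sketch, assuming /hadoopTmp is a stable local directory (any reliable path works):

```xml
<!-- conf/core-site.xml : move Hadoop's working directory off /tmp -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoopTmp</value>  <!-- assumption: replace with your own path -->
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```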

run-additionalDN.sh


#!/bin/sh
# Starts or stops additional datanodes on the same machine.
# Run it from the hadoop install dir, just like 'bin/hadoop'.

# Usage: run-additionalDN.sh [start|stop] dnnumber...
# e.g. run-additionalDN.sh start 2

DN_DIR_PREFIX="/path/to/store/data_and_log_of_additionalDN/"

if [ -z "$DN_DIR_PREFIX" ]; then
  echo "$0: DN_DIR_PREFIX is not set. Set it to something like /hadoopTmp/dn"
  exit 1
fi

run_datanode () {
  DN=$2
  # Keep each datanode's logs and pid file in its own directory.
  export HADOOP_LOG_DIR=$DN_DIR_PREFIX$DN/logs
  export HADOOP_PID_DIR=$HADOOP_LOG_DIR
  # Give each datanode its own tmp dir and its own ports; the datanode
  # number DN is appended to each port prefix.
  DN_CONF_OPTS="\
-Dhadoop.tmp.dir=$DN_DIR_PREFIX$DN \
-Ddfs.datanode.address=0.0.0.0:5001$DN \
-Ddfs.datanode.http.address=0.0.0.0:5008$DN \
-Ddfs.datanode.ipc.address=0.0.0.0:5002$DN"
  bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
}

cmd=$1
shift

for i in "$@"
do
  run_datanode "$cmd" "$i"
done
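The per-datanode ports come from appending the datanode number to a fixed four-digit prefix; a quick sketch of the scheme (DN=2 is just an illustrative choice):

```shell
# Each extra datanode gets unique ports by appending its number DN
# to the prefixes 5001 (data), 5008 (http), and 5002 (ipc).
DN=2
echo "dfs.datanode.address      = 0.0.0.0:5001$DN"   # 0.0.0.0:50012
echo "dfs.datanode.http.address = 0.0.0.0:5008$DN"   # 0.0.0.0:50082
echo "dfs.datanode.ipc.address  = 0.0.0.0:5002$DN"   # 0.0.0.0:50022
```

So additional datanode 1 serves data on 50011, datanode 2 on 50012, and so on, alongside the default datanode on 50010.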

Use jps or the name node web UI to verify that the additional data nodes have started.

I started a total of 3 data nodes (2 additional data nodes) on my single-node machine, running on ports 50010, 50011, and 50012, as shown in the screenshot below.




Written by Ravi

May 27, 2010 at 1:21 am

Posted in Hadoop cookbook, HDFS
