
The Hadoop-YARN single node installation
In a single node installation, all the Hadoop-YARN daemons (NameNode, ResourceManager, DataNode, and NodeManager) run on a single node as separate Java processes. You will need only one Linux machine with a minimum of 2 GB RAM and 15 GB free disk space.
Prerequisites
Before starting with the installation steps, make sure that you prepare the node as specified in the preceding section.
- The hostname used in the single node installation is localhost, with 127.0.0.1 as the IP address, known as the loopback address of a machine. You need to make sure that the /etc/hosts file contains the resolution for the loopback address. The loopback entry will look like this:
127.0.0.1 localhost
- Passwordless SSH is configured for localhost. To ensure this, execute the following command (a key generation sketch follows this list if you don't yet have an SSH key pair):
ssh-copy-id localhost
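If the Hadoop user does not yet have an SSH key pair, ssh-copy-id will have nothing to copy. A minimal sketch to generate a key, copy it, and confirm that passwordless login works; an empty passphrase is assumed here so that the Hadoop scripts can log in unattended:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id localhost
ssh localhost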
Installation steps
After preparing your node for Hadoop, you need to follow a simple five-step process to install and run Hadoop on your Linux machine.

The current version of Hadoop is 2.5.1 and the steps mentioned here assume that you use the same version. Log in to your system using a dedicated Hadoop user and download the Hadoop 2.x bundle tar.gz file from the Apache archive:
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.5.1/hadoop-2.5.1.tar.gz
You can use your home directory for the Hadoop installation (/home/<username>). If you want to use any of the system directories such as /opt or /usr for the installation, you need to use the sudo option with the commands. For simplicity, we'll install Hadoop in the home directory of the user. The commands in this chapter assume that the username is hduser; you can replace hduser with the actual username. Move your Hadoop bundle to the user's home directory and extract the contents of the bundle file:
mv hadoop-2.5.1.tar.gz /home/hduser/
cd /home/hduser
tar -xzvf hadoop-2.5.1.tar.gz
Configure the Hadoop environment variables in /home/hduser/.bashrc (for Ubuntu) or /home/hduser/.bash_profile (for CentOS). Hadoop requires the HADOOP_PREFIX and the component home directory environment variables to be set before starting the Hadoop services. HADOOP_PREFIX specifies the installation directory for Hadoop. We assume that you extracted the Hadoop bundle in the home folder of hduser.
Use the nano editor and append the following export commands to the end of the file:
export HADOOP_PREFIX="/home/hduser/hadoop-2.5.1/"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
After saving the file, you need to reload it using the source command:
source ~/.bashrc
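To confirm that the environment is set correctly, you can print the variable and run the version command; hadoop resolves through the PATH entries added above:
echo $HADOOP_PREFIX
hadoop version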
Next, you need to configure the Hadoop site configuration files. There are four configuration files that you need to update. You can find these files in the $HADOOP_PREFIX/etc/hadoop folder.
The core-site.xml file contains information for the NameNode host and the RPC port used by the NameNode. For a single node installation, the NameNode host will be localhost. The default RPC port for the NameNode is 8020. You need to edit the file and add the following property under the configuration tag:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
  <final>true</final>
</property>
The hdfs-site.xml file contains the configuration properties related to HDFS. In this file, you specify the replication factor and the directories for the NameNode and DataNode to store their data. Edit the hdfs-site.xml file and add the following properties under the configuration tag:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hduser/hadoop-2.5.1/hadoop_data/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hduser/hadoop-2.5.1/hadoop_data/dfs/data</value>
</property>
The mapred-site.xml file contains information related to the MapReduce framework for the cluster. You will specify the framework to be configured as yarn. The other possible values for the MapReduce framework property are local and classic. A detailed explanation of these values is given in the next chapter.
In the Hadoop configuration folder, you will find a template for the mapred-site.xml file. Execute the following command to copy the template file to create the mapred-site.xml file:
cp /home/hduser/hadoop-2.5.1/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop-2.5.1/etc/hadoop/mapred-site.xml
Now edit the mapred-site.xml file and add the following property under the configuration tag:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
The yarn-site.xml file contains the information related to the YARN daemons and YARN properties. You need to specify the host and port for the ResourceManager daemon. As with the NameNode host, for a single node installation the ResourceManager host is localhost. The default RPC port for the ResourceManager is 8032. You also need to specify the scheduler to be used by the ResourceManager and the auxiliary services for the NodeManager. We'll cover these properties in detail in the next chapter. Edit the yarn-site.xml file and add the following properties under the configuration tag:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>localhost:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>localhost:8088</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
The Hadoop daemons require the Java settings to be set in the Hadoop environment files. You need to configure the value for JAVA_HOME (the Java installation directory) in the Hadoop and YARN environment files. Open the hadoop-env.sh and yarn-env.sh files, uncomment the export JAVA_HOME command by removing the # symbol from the line, and update the export command with the actual JAVA_HOME value, as sketched below.
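For example, the resulting line might look like the following; the path shown is only an assumption for an OpenJDK 7 installation on Ubuntu, so substitute the directory where your own JDK is installed:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64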
After configuring the Hadoop files, you need to format HDFS using the NameNode format command. Before executing the format command, make sure that the dfs.namenode.name.dir directory specified in the hdfs-site.xml file does not exist, as this directory is created by the format command. Execute the following command to format the NameNode:
hdfs namenode -format
After executing the preceding command, make sure that there's no exception on the console and that the NameNode directory is created.
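As a quick sanity check, assuming the dfs.namenode.name.dir value configured earlier, you can list the newly created directory; a successful format produces a current subdirectory containing files such as fsimage and VERSION:
ls /home/hduser/hadoop-2.5.1/hadoop_data/dfs/name/current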
Start the Hadoop services using the Hadoop 2 scripts in the /home/hduser/hadoop-2.5.1/sbin/ directory. For a single node installation, all the daemons will run on a single system. Use the following commands to start the services one by one.
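A sketch of the per-daemon start commands bundled with Hadoop 2.x; the sbin directory was already added to PATH in the exports above:
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager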

Execute the jps command and ensure that all the Hadoop daemons are running. You can also verify the status of your cluster through the web interfaces for the HDFS NameNode and the YARN ResourceManager.
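If all four daemons started cleanly, the jps output should resemble the following; the process IDs are illustrative and will differ on your machine:
2341 NameNode
2458 DataNode
2601 ResourceManager
2734 NodeManager
2850 Jps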

You need to replace <NameNodeHost> and <ResourceManagerHost> with localhost for a single node installation. By default, the NameNode web interface is available at http://localhost:50070/ and the ResourceManager web interface at http://localhost:8088/.
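As a final smoke test, you can create a directory in HDFS and list the root of the filesystem; both commands should complete without errors:
hdfs dfs -mkdir /test
hdfs dfs -ls /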