
Chapter 2. Setting up a Hadoop-YARN Cluster
YARN is a subproject of Apache Hadoop at the Apache Software Foundation, introduced in Hadoop 2.0. It replaces the resource management and job scheduling layer of the old MapReduce framework in Hadoop 1.x and ships with the Hadoop 2.x bundle. This chapter provides a step-by-step guide for Hadoop-YARN users to install and configure YARN with Hadoop.
A Hadoop-YARN cluster can be configured either as a single-node cluster or as a multi-node cluster. This chapter covers both types of installation, along with troubleshooting guidelines. It helps YARN beginners and cluster administrators configure Hadoop-YARN clusters and understand how the YARN components interact with each other.
Apache, Hortonworks, and Cloudera are the main distributors of Hadoop. These vendors have their own steps to install and configure Hadoop-YARN clusters. This chapter uses the Apache tar.gz bundles for setting up Hadoop-YARN clusters and gives an overview of Hortonworks and Cloudera installations.
In this chapter, we will cover the following topics:
- The supported platforms, hardware and software requirements, and basic Linux commands
- How to prepare a node while setting up a cluster
- A single node installation
- An overview of the Hadoop HDFS and YARN ResourceManager web UIs
- Testing your cluster
- A multi-node installation
- An overview of Hortonworks and Cloudera installations