HADOOP CLUSTER THROUGH ANSIBLE

Rishabh Jain
5 min read · Jan 12, 2021


Hola, engineers! Here comes a new blog on setting up a whole Hadoop cluster via a very mature automation tool named Ansible. It is a very popular tool that can be used for configuration management, provisioning, automation, integration, and many other purposes, across almost every technology in use today.

If someone is very new to Ansible, one can go through the blog mentioned below.

So, let us commence creating the setup.

NOTE: I will provide the GitHub repository link at the end of the blog. You can take the code from there or use it as a reference.

For setting up Ansible and getting some exposure to it, one can refer to the blog given below.

HADOOP MASTER NODE

We will first set up the Hadoop master node via Ansible, for which we are going to create the playbook shown below.

I will explain the playbook in small parts so that one can understand the code.

1. LOCALHOST
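A minimal sketch of how the playbook begins (the layout below is illustrative):

- hosts: localhost
  tasks:
    # the tasks explained in the parts below go here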

The above syntax tells Ansible where to perform the configuration. Here localhost means the operations are performed on the local system.

2. PIP
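A sketch of this task, assuming the package only needs to be present:

- name: Install the gdown Python module
  pip:
    name: gdown
    state: present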

The above syntax uses tasks, which lists the operations to be performed by Ansible.

pip is a module that installs Python packages via Ansible.

name takes the name of the Python package to be installed.

state tells Ansible what to do with the Python package, whether to install or remove it. Here gdown is installed, which will download the software from Google Drive.

3. COMMAND
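A sketch of these tasks; the Google Drive file IDs and file names below are placeholders, not the real ones, and the --force flag on the Hadoop rpm is an assumption that may not be needed on every system:

- name: Download the JDK installer from Google Drive
  command: gdown --id <jdk_file_id> --output jdk.rpm
- name: Download the Hadoop installer from Google Drive
  command: gdown --id <hadoop_file_id> --output hadoop.rpm
- name: Install the JDK
  command: rpm -ivh jdk.rpm
- name: Install Hadoop
  # --force is an assumption; some systems install the Hadoop rpm without it
  command: rpm -ivh hadoop.rpm --force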

In the above syntax, the command module is used to run the commands that we would otherwise have to run manually. Here gdown is used to download the software and rpm to install it.

--id is an argument that takes the Google Drive ID of the file to download.

--output is an argument used to rename the downloaded file to the name written in the argument.

rpm is a package manager used to install the software.

Here we have downloaded and installed the Hadoop and JDK software on the system.

4. FILE
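A sketch of these tasks, assuming the Hadoop configuration files live in /etc/hadoop (the default location for the Hadoop 1.x rpm):

- name: Create the NameNode directory
  file:
    path: /nn
    state: directory
- name: Remove the default core-site.xml
  file:
    path: /etc/hadoop/core-site.xml
    state: absent
- name: Remove the default hdfs-site.xml
  file:
    path: /etc/hadoop/hdfs-site.xml
    state: absent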

In this syntax, the file module of Ansible is used, which can create or delete a file or directory.

state tells Ansible whether to create or delete the directory or file.

path takes the path of the file or directory to be created or deleted on the system.

Here a /nn directory is created, and the two default files core-site.xml and hdfs-site.xml are deleted.

5. COPY
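A sketch of these tasks, assuming the prepared configuration files sit next to the playbook:

- name: Copy the configured core-site.xml
  copy:
    src: core-site.xml
    dest: /etc/hadoop/core-site.xml
- name: Copy the configured hdfs-site.xml
  copy:
    src: hdfs-site.xml
    dest: /etc/hadoop/hdfs-site.xml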

In this syntax, the copy module is used, which enables us to copy content from a source to the destination given in the arguments shown above.

src takes the source location or path of the file to be copied.

dest takes the destination location or path to which the file is copied.

Here is the code for core-site.xml.
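A typical version for the master looks like this (the property name assumes Hadoop 1.x, and the port is illustrative):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>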

Here is the code for hdfs-site.xml.
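A typical version for the master, pointing the NameNode at the /nn directory created earlier (again assuming Hadoop 1.x property names):

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>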

6. HADOOP START
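A sketch of these tasks; the -force flag is an assumption used to skip the interactive confirmation that formatting normally asks for, and formatting should only happen on the first run, since reformatting wipes the NameNode metadata:

- name: Format the NameNode (first run only)
  # -force is an assumption; it bypasses the Y/N confirmation prompt
  command: hadoop namenode -format -force
- name: Start the NameNode daemon
  command: hadoop-daemon.sh start namenode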

In this syntax, the command module is used again; we simply format and start the Hadoop NameNode.

Now the playbook is set up and ready to be executed. We can run it with the command shown below.

ansible-playbook <filename>.yml

You can check that the Hadoop master node is running with the jps command, as shown below.
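jps

If the master is running, the output lists a NameNode process along with Jps.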

Now the Hadoop master node is set up and running. Let us move on to the Hadoop slave node.

HADOOP SLAVE NODE

The slave node follows the same procedure as the master node.

The changes are:

1. Host

The hosts value is now the remote IPs of the systems that will be the slave nodes in the Hadoop cluster, as sketched below.
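For example, assuming an inventory group named slaves that holds the slave IPs (the entry below is a placeholder):

[slaves]
<slave_node_ip>

The play header of the slave playbook then becomes:

- hosts: slaves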

2. Directory Name

The directory name is changed to /dn (one can give it any name), as shown below.
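It is the same file task as before, just with the new path:

- name: Create the DataNode directory
  file:
    path: /dn
    state: directory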

3. Core-site.xml
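On the slave, core-site.xml points to the master's NameNode instead of the local system (the master IP is a placeholder, and the port must match the one used on the master):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<master_ip>:9001</value>
  </property>
</configuration>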

4. Hdfs-site.xml
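On the slave, hdfs-site.xml uses dfs.data.dir and points to the /dn directory:

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>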

5. Hadoop Start

We do not format the DataNode; we just start it with the following command.
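As an Ansible task, a sketch looks like this:

- name: Start the DataNode daemon
  command: hadoop-daemon.sh start datanode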

Everything else remains the same. One can proceed and execute the playbook in the same way as with the Hadoop master node.

The playbook is now executed successfully.

One can check with the jps command on the slave node as well; the output should now list a DataNode process.

Now one can also check with the command hadoop dfsadmin -report as shown below.
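hadoop dfsadmin -report

The report shows the cluster's configured capacity and the live DataNodes, confirming that the slave node has joined the cluster.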

The Hadoop cluster is now set up and ready to use.

Here is the GitHub repository link:

I hope you learned something and enjoyed creating the setup.


Written by Rishabh Jain

I am a tech enthusiast, researcher, and integration seeker. I love to explore and learn the right technologies and concepts from their foundations.
