Hadoop Inside Docker: The Easiest Way in 5 Minutes

Apr 24, 2017 · BigData, Cloudera, Hadoop, MapReduce


 

Note: This article assumes the reader is already familiar with Docker and how it works. If not, please refer to these articles first: Docker Simplified and Most Common Commands in Docker.

 

In this article, I am going to describe the easiest ways to get started with Hadoop in a dockerized environment.

There are two images that are pretty good and well known for spinning up a new Hadoop container. The easiest comes, as you might expect, from Cloudera, since it provides all Hadoop features and the ecosystem in one single box. Let's start with the Cloudera QuickStart box first:

First Approach (cloudera/quickstart)

Just run the following command. It spins up a new container from the image named "cloudera/quickstart" and exposes the most important ports (in this example only the Cloudera examples, the Hue interface, Cloudera Manager, and the NameNode web UI are exposed) so they are accessible from the host machine.

docker run \
--hostname=quickstart.cloudera \
--privileged=true \
-t -i \
-p 8888:8888 \
-p 7180:7180 \
-p 80:80 \
-p 50070:50070 \
-v $(pwd):/home/cloudera \
-w /home/cloudera \
cloudera/quickstart \
/usr/bin/docker-quickstart
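
If you prefer to keep your terminal free, the same command can be run detached. Here is a minimal sketch (same flags as above, plus -d so Docker prints the container ID immediately and returns):

# detached variant; -t -i are kept so docker attach later gives an interactive shell
docker run -d -t -i \
--hostname=quickstart.cloudera \
--privileged=true \
-p 8888:8888 -p 7180:7180 -p 80:80 -p 50070:50070 \
-v $(pwd):/home/cloudera \
-w /home/cloudera \
cloudera/quickstart \
/usr/bin/docker-quickstart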

Most of the common Hadoop ports are listed below. If you need more, just add them in the format -p hostPort:containerPort (see the example after the list):

- 8888      Hue web interface
- 7180      Cloudera Manager
- 80        Cloudera examples
- 8983      Solr Search web UI
- 50070     NameNode web UI
- 50090     Secondary NameNode web UI
- 50075     DataNode web UI
- 50030     JobTracker (MRv1)
- 50060     TaskTrackers (MRv1)
- 60010     HBase Master status
- 60030     HBase RegionServer
- 9095      HBase Thrift server
- 8020      HDFS NameNode RPC
- 8088      YARN ResourceManager web UI
- 4040      Spark application UI
- 18088     Spark History Server web interface
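
For example, to additionally expose the Solr Search UI and the YARN ResourceManager UI, append two more -p mappings to the run command. A sketch (the host ports on the left can be any free ports on your machine):

docker run -d -t -i \
--hostname=quickstart.cloudera \
--privileged=true \
-p 8888:8888 -p 7180:7180 -p 80:80 -p 50070:50070 \
-p 8983:8983 -p 8088:8088 \
cloudera/quickstart \
/usr/bin/docker-quickstart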

The command prints a hash (the container ID). Grab just the first few characters (the first three are usually enough to identify it uniquely) and use them as below to attach to the container, in other words to get access inside it (the same concept as SSH or remote login to a machine), then start Cloudera Manager:

# docker attach 4f0
# sudo su
# cd /home/cloudera/
# ./cloudera-manager
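
If attach does not give you a prompt, docker exec opens a fresh shell instead; a sketch reusing the same example ID prefix:

docker ps                  # list running containers and their IDs
docker exec -it 4f0 bash   # start a new bash session inside the container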

Now you can magically browse the Cloudera Manager home as if it were running on your localhost (http://localhost:7180). Make sure to start all services.
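
Cloudera Manager can take a couple of minutes to come up. Assuming curl is available on the host, a quick check looks like this:

# prints 200 once Cloudera Manager is answering on the published port
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7180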

Your Hadoop node is now ready for MapReduce jobs; check my next article on how to build and submit MapReduce jobs from IntelliJ IDEA.
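
Before wiring up an IDE, a quick smoke test from inside the container is to run one of the bundled example jobs. A sketch assuming the usual CDH location of the examples jar (the path may differ by version):

# estimate pi with 2 mappers and 100 samples each; success means MapReduce works end to end
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100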

Second Approach (sequenceiq/hadoop-docker)

The image named sequenceiq/hadoop-docker is pretty straightforward to use; however, it contains only HDFS and MapReduce, with none of the Hadoop ecosystem installed (it is plain Hadoop, with no Hive, Pig, Flume, HBase, etc.).

Open your terminal (or PowerShell on Windows) and simply run the following command:

docker run -t -i sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
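
If you also want the Hadoop web UIs reachable from the host, the same -p trick from the first approach applies here. A sketch exposing the NameNode and ResourceManager UIs:

docker run -t -i -p 50070:50070 -p 8088:8088 sequenceiq/hadoop-docker /etc/bootstrap.sh -bash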

With either command, you will get direct access inside the container. Type the following to make sure everything is fine:

hdfs dfs -ls /
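
From there you can run the example grep job that ships with the image, as described in the image's README (adjust the 2.7.0 in the jar name to match your image tag):

cd $HADOOP_PREFIX
# run the grep example against the preloaded input directory, then print the result
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -cat output/*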

 

Congrats!

