Data Processing (Sahara)

We use OpenStack Liberty with Sahara (Data Processing) and the Vanilla Apache Hadoop plugin, version 2.7.1.

Creating a cluster

Create a template for the Hadoop master node

We start creating a Node Group Template by going to Project | Data Processing | Node Group Templates and clicking +Create Template. Choose the Vanilla Apache Hadoop plugin, version 2.7.1, and click Next.

OpenStack Data Processing Create Node Group Template

Fill in the parameters in Configure Node Group Template as shown below

OpenStack Data Processing Create Node Group Template

In Node Processes, select the services as shown below and click Create

OpenStack Data Processing Create Node Group Template

Create a template for the Hadoop worker node

Repeat the process for a worker node

OpenStack Data Processing Create Node Group Template

For the services, choose only datanode and nodemanager, as shown below

OpenStack Data Processing Create Node Group Template

After this process, you should see the two templates created

OpenStack Data Processing Create Node Group Template
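The same two node group templates can also be defined as JSON and registered with the Liberty-era sahara CLI instead of the dashboard. A minimal sketch; the flavor ID is a placeholder, and the master's process list is an assumption based on the service web UIs listed later in this guide (the registration calls need cloud credentials, so they are shown commented out):

```shell
# Hypothetical node group template definitions; flavor_id "2" is a
# placeholder -- substitute a flavor from your own cloud.
cat > ng-master.json <<'EOF'
{
  "name": "hadoop-master",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "flavor_id": "2",
  "node_processes": ["namenode", "resourcemanager", "oozie", "historyserver"]
}
EOF

cat > ng-worker.json <<'EOF'
{
  "name": "hadoop-worker",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "flavor_id": "2",
  "node_processes": ["datanode", "nodemanager"]
}
EOF

# Register the templates (requires the sahara CLI and credentials):
# sahara node-group-template-create --json ng-master.json
# sahara node-group-template-create --json ng-worker.json
```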

Cluster Templates

Go to Project | Data Processing | Cluster Templates and click +Create Template. Leave the default parameters and click Next

OpenStack Data Processing Create Cluster Template

OpenStack Data Processing Create Cluster Template

OpenStack Data Processing Create Cluster Template

After the process, you should see the created cluster template in the list

OpenStack Data Processing Create Cluster Template
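A cluster template simply combines the node group templates with instance counts. A hedged JSON sketch of the same step; the template name, the IDs, and the worker count of 3 are all illustrative placeholders:

```shell
# Hypothetical cluster template: one master plus three workers.
# The <..._TEMPLATE_ID> values are placeholders for the IDs of the
# node group templates created earlier.
cat > cluster-template.json <<'EOF'
{
  "name": "cluster-template-hadoop",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "node_groups": [
    {"name": "hadoop-master", "node_group_template_id": "<MASTER_TEMPLATE_ID>", "count": 1},
    {"name": "hadoop-worker", "node_group_template_id": "<WORKER_TEMPLATE_ID>", "count": 3}
  ]
}
EOF

# Register it (requires the sahara CLI and credentials):
# sahara cluster-template-create --json cluster-template.json
```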

Set up the security groups to allow all inbound traffic, then launch the cluster from the template. Make sure to specify your keypair during cluster creation

OpenStack Data Processing Create Cluster

You should see that the cluster has been created

OpenStack Data Processing Cluster Created
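The launch itself can also be expressed as a JSON payload for the sahara CLI. A sketch only, assuming a keypair named mykeypair; the template and image IDs are placeholders you would look up in your own cloud:

```shell
# Hypothetical cluster launch request. All <...> values are
# placeholders; "mykeypair" is an assumed keypair name.
cat > cluster-create.json <<'EOF'
{
  "name": "cluster-hadoop",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "cluster_template_id": "<CLUSTER_TEMPLATE_ID>",
  "default_image_id": "<VANILLA_2.7.1_IMAGE_ID>",
  "user_keypair_id": "mykeypair"
}
EOF

# Open the security group first (syntax may vary by client version),
# then launch -- both need credentials, so they are commented out:
# openstack security group rule create --protocol tcp --dst-port 1:65535 default
# sahara cluster-create --json cluster-create.json
```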

Now we should be able to see the web interfaces of the services on the master server:

  • HDFS: http://<master_public_ip>:50070
  • Oozie: http://<master_public_ip>:11000
  • MapReduce JobHistory: http://<master_public_ip>:19888
  • YARN: http://<master_public_ip>:8088
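The URLs above can be generated (and, on a live cluster, probed) with a small helper. A sketch; 203.0.113.10 is a documentation-range example address, to be replaced by the master node's floating IP, and the curl probe is commented out because it needs a running cluster:

```shell
# Print the service URLs for a given master IP.
MASTER_IP=203.0.113.10   # example address -- use your master's floating IP
urls=""
for svc in HDFS:50070 Oozie:11000 JobHistory:19888 YARN:8088; do
  name=${svc%:*}          # service label, e.g. "HDFS"
  port=${svc#*:}          # service port, e.g. "50070"
  url="http://$MASTER_IP:$port"
  urls="$urls $name=$url"
  echo "$name -> $url"
  # curl -sf -o /dev/null "$url" && echo "  $name is reachable"
done
```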

Data Sources - HDFS

Go to Project | Data Processing | Data Sources and click +Create Data Source. Enter the parameters for data-source-input-hdfs and click Create.

OpenStack Data Processing Data Source

Repeat a similar procedure to create data-source-output-hdfs

OpenStack Data Processing Data Source

After this step, you should see two data sources have been created.

OpenStack Data Processing Data Source
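The same two data sources can be created from the command line. A hedged sketch: the namenode hostname, port, and HDFS paths are placeholders, and the actual CLI calls are commented out because they need credentials:

```shell
# Hypothetical HDFS URLs for the two data sources; "hadoop-master"
# and port 9000 are placeholders for your cluster's namenode endpoint.
NAMENODE=hadoop-master
INPUT_URL="hdfs://$NAMENODE:9000/user/hadoop/input"
OUTPUT_URL="hdfs://$NAMENODE:9000/user/hadoop/output"
echo "input  url: $INPUT_URL"
echo "output url: $OUTPUT_URL"

# sahara data-source-create --name data-source-input-hdfs  --type hdfs --url "$INPUT_URL"
# sahara data-source-create --name data-source-output-hdfs --type hdfs --url "$OUTPUT_URL"
```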

Data Sources - Swift

  • Create a container
      • container name: pig
      • container access: public
  • Upload an object
      • file: input
      • object name: input
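The Swift side of this can be scripted with the openstack CLI, and Sahara addresses Swift objects with swift://<container>.sahara/<object> URLs. A sketch; the container and object names match the ones above, and the CLI calls are commented out because they need credentials:

```shell
# Build the Swift URL Sahara would use for this data source.
CONTAINER=pig
OBJECT=input
SWIFT_URL="swift://$CONTAINER.sahara/$OBJECT"
echo "data source URL: $SWIFT_URL"

# Create the container and upload the object (needs credentials):
# openstack container create "$CONTAINER"
# openstack object create "$CONTAINER" "$OBJECT"
```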

Creating a Job

Go to Project | Data Processing | Job Binaries and click +Create Job Binary.

Enter the name example.pig and upload the file example.pig

OpenStack Data Processing Create Job Binary

After entering the parameters, click Create.
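The contents of example.pig are not shown in this guide. If you need a stand-in to try the workflow, a trivial, purely illustrative Pig script might look like the following; the $INPUT and $OUTPUT placeholders are the names Sahara's EDP substitutes with the selected data sources when running a Pig job:

```shell
# Purely illustrative stand-in for example.pig (NOT the original
# file): load the job's input and store it unchanged to the output.
cat > example.pig <<'EOF'
A = LOAD '$INPUT';
STORE A INTO '$OUTPUT';
EOF
```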

Go to Project | Data Processing | Job Templates and click +Create Job Template.

Enter the name job-template-pig, select example.pig as the main binary, and go to Libs.

OpenStack Data Processing Create Job Binary

Choose the library udf.jar and click Create

  • launch on existing cluster (pig-job)
  • input: datasource-input
  • output: datasource-output
  • cluster: cluster-hadoop

(wait for the job to finish…)
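The waiting step can be automated by polling the job status until it leaves a pending or running state. A dry-run sketch of the loop logic only: get_status is a stub standing in for a real status query (e.g. parsing the sahara CLI's job listing), so the block runs without a cloud:

```shell
# Poll until the job reaches a terminal state. get_status is a stub
# for illustration; in practice it would query Sahara for pig-job.
get_status() { echo "SUCCEEDED"; }   # stub -- replace with a real query

while :; do
  STATUS=$(get_status)
  echo "job status: $STATUS"
  case "$STATUS" in
    PENDING|RUNNING) sleep 10 ;;     # still working -- poll again
    *) break ;;                      # SUCCEEDED, FAILED, KILLED, ...
  esac
done
```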

Last modified: 2017-05-25