We use OpenStack Liberty, Sahara (Data Processing) - Vanilla Apache Hadoop Plugin (2.7.1)
Create a template for the Hadoop master node
We start creating a Node Group Template by going to Project | Data Processing | Node Group Templates and clicking +Create Template. Choose Vanilla Apache Hadoop as the plugin name and 2.7.1 as the version, then click Next.
Fill in the parameters in Configure Node Group Template as below.
In Node Processes select the services as below and click Create
Create a template for the Hadoop worker node
Repeat the process for a worker node
For the services, choose just datanode and nodemanager as you see below
After this process, you should see that two templates have been created.
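For reference, the two templates can also be written down as data. The sketch below builds dictionaries shaped like Sahara node group template request bodies; it is an illustration, not output from the dashboard. The worker processes (datanode, nodemanager) come from the steps above. The template names, the FLAVOR_ID placeholder, and the master's process list are assumptions, since the original screenshots are not reproduced here.

```python
import json

PLUGIN_NAME = "vanilla"    # Vanilla Apache Hadoop plugin
HADOOP_VERSION = "2.7.1"   # plugin version used in this guide


def node_group_template(name, processes, flavor_id="FLAVOR_ID"):
    """Build a dict shaped like a Sahara node group template request body."""
    return {
        "name": name,                    # hypothetical template name
        "plugin_name": PLUGIN_NAME,
        "hadoop_version": HADOOP_VERSION,
        "flavor_id": flavor_id,          # placeholder, use a flavor from your cloud
        "node_processes": list(processes),
    }


# Worker processes are the ones named in the text.
worker = node_group_template("hadoop-worker", ["datanode", "nodemanager"])

# Master processes are an assumption (the screenshot is missing).
master = node_group_template(
    "hadoop-master",
    ["namenode", "resourcemanager", "historyserver", "oozie"],
)

print(json.dumps(worker, indent=2))
print(json.dumps(master, indent=2))
```

Keeping the definitions as data like this makes it easy to compare them with what the dashboard shows after the templates are created.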
Go to Project | Data Processing | Cluster Templates and click +Create Template. Leave the default parameters and click Next.
After the process, you should see the created cluster template in the list
Set up the security groups to allow all inbound traffic, then launch the cluster from the template. Specify your keypair during cluster creation.
You should see the cluster has been created
Now we should be able to see the web interfaces of the services on the master server:
- Oozie: http://PUBLIC_IP:11000
- MapReduce JobHistory:
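A quick way to check that these interfaces respond is to probe the ports from your workstation. This is a minimal sketch: the Oozie port is the one listed above, while 19888 is the stock Hadoop 2.x JobHistory web port and is only an assumption about this cluster. PUBLIC_IP stands for the master's public address.

```python
import socket

# Oozie's port comes from the list above; 19888 is the default Hadoop 2.x
# JobHistory web port and is an assumption about this particular cluster.
SERVICE_PORTS = {
    "Oozie": 11000,
    "MapReduce JobHistory": 19888,
}


def service_urls(host):
    """Return {service name: URL} for the given master address."""
    return {
        name: "http://{}:{}".format(host, port)
        for name, port in SERVICE_PORTS.items()
    }


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Print the URLs only; replace PUBLIC_IP before probing with is_reachable().
for name, url in service_urls("PUBLIC_IP").items():
    print(name, url)
```

If a port is not reachable, the security group rules from the previous step are the first thing to check.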
Data Sources - HDFS
Go to Project | Data Processing | Data Sources and click +Create Data Source. Enter the parameters for data-source-input-hdfs and click Create.
Repeat a similar procedure to create the output data source.
After this step, you should see two data sources have been created.
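The two HDFS data sources can be summarised as small records. The following is a sketch under stated assumptions: only the name data-source-input-hdfs comes from the text. The output name, the NAMENODE_HOST placeholder, the port, and both paths are hypothetical and must be replaced with the values of your cluster.

```python
def hdfs_url(namenode_host, port, path):
    """Build an hdfs:// URL from a NameNode address and an absolute path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, got {!r}".format(path))
    return "hdfs://{}:{}{}".format(namenode_host, port, path)


def hdfs_data_source(name, url, description=""):
    """Build a dict shaped like a Sahara data source of type 'hdfs'."""
    return {"name": name, "type": "hdfs", "url": url, "description": description}


# NAMENODE_HOST, port 9000 and the paths are placeholders (assumptions).
input_source = hdfs_data_source(
    "data-source-input-hdfs",
    hdfs_url("NAMENODE_HOST", 9000, "/user/hadoop/input"),
)
output_source = hdfs_data_source(
    "data-source-output-hdfs",  # hypothetical name for the second source
    hdfs_url("NAMENODE_HOST", 9000, "/user/hadoop/output"),
)

print(input_source["url"])
print(output_source["url"])
```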
Data Sources - Swift
- create container
- container name: pig
- container access: public
- file: input
- object name: input
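With the container pig and the object input in place, the Swift data source URL can be composed as sketched below. This assumes the plain swift://container/object form; some Sahara releases expect a .sahara suffix on the container name, so check what your dashboard accepts. The output object name is an assumption.

```python
def swift_url(container, object_name):
    """Compose a swift:// URL for an object inside a container."""
    if not container or "/" in container:
        raise ValueError("invalid container name: {!r}".format(container))
    return "swift://{}/{}".format(container, object_name.lstrip("/"))


# Container and input object names come from the list above.
input_url = swift_url("pig", "input")
# Writing the results to the same container is an assumption.
output_url = swift_url("pig", "output")

print(input_url)
print(output_url)
```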
Creating a Job
Go to Project | Data Processing | Job Binaries and click +Create Job Binary.
Enter the name example.pig and upload the file example.pig.
After entering the parameters, click Create.
Go to Project | Data Processing | Job Templates and click +Create Job Template.
Enter the name, select example.pig as the main binary, then add udf.jar as a library and click Create.
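The job template ties the two binaries together: example.pig is the main script and udf.jar is a supporting library. A minimal sketch of that relationship as data, shaped like a Sahara job definition, is given here. The binary IDs are placeholders, and the name pig-job is taken from the launch step in this guide and assumed to be the template name.

```python
def pig_job_template(name, main_binary_id, lib_binary_ids=()):
    """Build a dict shaped like a Sahara job template of type 'Pig'."""
    return {
        "name": name,
        "type": "Pig",
        "mains": [main_binary_id],     # the Pig script (example.pig)
        "libs": list(lib_binary_ids),  # supporting jars (udf.jar)
    }


# The IDs below are placeholders for the job binaries created earlier.
template = pig_job_template(
    "pig-job",
    "EXAMPLE_PIG_BINARY_ID",
    ["UDF_JAR_BINARY_ID"],
)

print(template)
```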
- launch on existing cluster (pig-job)
- input: datasource-input
- output: datasource-output
- cluster: cluster-hadoop
(wait for the job to finish...)
- Oozie: http://PUBLIC_IP:11000
- object store: output (directory)
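Instead of refreshing the dashboard while waiting, the wait can be scripted as a simple polling loop. This is a sketch only: get_status is a hypothetical callable that you supply (for example, one that asks Sahara or Oozie for the job state), and the set of terminal states is an assumption based on common Oozie status names.

```python
import time

# Assumed terminal states; adjust to what your deployment reports.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "KILLED", "DONEWITHERROR"}


def wait_for_job(get_status, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal state or timeout passes.

    get_status is a user-supplied callable returning the current status
    string. Returns the terminal status; raises TimeoutError otherwise.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError("job still {} after {}s".format(status, waited))
        sleep(interval)
        waited += interval


# Demonstration with a canned sequence of statuses instead of a real cluster.
_demo = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(_demo), interval=0, sleep=lambda s: None))
```

Once the loop returns SUCCEEDED, the results should be in the output location listed above.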
Last modified: Nov. 7, 2017