3 machines, 3 units

Data Analytics with Pig Latin

This Big Data analytics solution is built on Apache Pig, a platform for analyzing large data sets that consists of a high-level language (Pig Latin) for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs. In this bundle those programs run on the large-scale parallel implementation provided by HDP Hadoop 2.4.1.

This bundle deploys a Hortonworks HDP 2.1 Hadoop master node, which runs the YARN ResourceManager and HDFS NameNode servers, together with compute nodes, as a cluster. By default this bundle uses two units, one for the master and one for the compute node.

Usage

From the bundle home directory:
juju quickstart bundles.yaml

Scale Out Usage

To increase the number of slaves, add compute-node units. To add one unit:

juju add-unit compute-node

Or you can add multiple units at once:

juju add-unit -n4 compute-node
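
The two forms above differ only in the -n flag, so a small helper can build the right command for any count. This is a minimal sketch, not part of the bundle: the function name add_units_cmd is hypothetical, and it only prints the command (a dry run) rather than calling juju.

```shell
# Print the juju command that adds COUNT compute-node units.
# Hypothetical helper: it echoes the command and executes nothing.
add_units_cmd() {
  count="${1:-1}"
  case "$count" in
    *[!0-9]*|0)
      echo "usage: add_units_cmd COUNT (positive integer)" >&2
      return 1
      ;;
  esac
  if [ "$count" -eq 1 ]; then
    echo "juju add-unit compute-node"
  else
    echo "juju add-unit -n$count compute-node"
  fi
}

add_units_cmd 4
```

Once the printed command looks right, it can be run by hand or through eval.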

Smoke Test after deployment

1) juju ssh hdp-pig/0
2) sudo su $HDFS_USER
3) hadoop version             <= verifies that the Hadoop client is installed
4) hdfs dfsadmin -report      <= verifies that the Pig client can connect to the
                                 remote HDFS server
5) yarn rmadmin -getGroups    <= verifies that the Pig client can connect to the
                                 remote ResourceManager server
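
Steps 3 to 5 can also be run in one pass. The sketch below assumes it is run on the hdp-pig/0 unit as the HDFS user (steps 1 and 2); the function names check and smoke_test are hypothetical and not part of the charm.

```shell
# Run one command quietly and report OK or FAIL for it.
check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

# The three smoke-test commands from the steps above, run in order.
# Returns non-zero if any of them failed.
smoke_test() {
  failed=0
  check "hadoop client installed" hadoop version || failed=1
  check "HDFS reachable" hdfs dfsadmin -report || failed=1
  check "ResourceManager reachable" yarn rmadmin -getGroups || failed=1
  return "$failed"
}
```

Calling smoke_test on the unit prints one line per check. The command output is discarded here, so run a failing command by hand to see its error.
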
Run a Pig Script Test:
1) hdfs dfs -mkdir -p /user/hduser
2) hdfs dfs -copyFromLocal /etc/passwd /user/hduser/passwd
3) vim /tmp/id.pig
4) Add the following Pig script commands, then save and exit:
   A = load '/user/hduser/passwd' using PigStorage(':');
   B = foreach A generate $0 as id;
   store B into '/tmp/id.out';
5) pig -l /tmp/pig.log /tmp/id.pig
6) hadoop fs -cat /tmp/id.out/part-m-00000   <= check the result on the hadoop cluster
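
Steps 3 to 5 can be done without an editor by writing the script from a here-document. This is a sketch under the same assumptions as the steps above (paths /tmp/id.pig and /tmp/pig.log). The quoted 'EOF' delimiter keeps the shell from expanding $0, so the positional field reference reaches Pig unchanged and needs no backslash.

```shell
# Write the Pig script non-interactively.
# Quoting the delimiter ('EOF') disables shell expansion inside the body.
script=/tmp/id.pig
cat > "$script" <<'EOF'
A = load '/user/hduser/passwd' using PigStorage(':');
B = foreach A generate $0 as id;
store B into '/tmp/id.out';
EOF

# Run the script only where the pig client is actually installed.
if command -v pig >/dev/null 2>&1; then
  pig -l /tmp/pig.log "$script"
fi
```

The script takes the first colon-separated field of each passwd line (the user name) and stores it under /tmp/id.out in HDFS.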

Following the Development of this Charm:

By default this bundle deploys the stable version of the Hadoop charm. To follow development instead, run:

    juju-quickstart bundle:hadoop/yarn-hdfs-cluster

Bundle configuration