Apache Analytics with SQL

This bundle is an 8-node cluster designed to scale out. Built around Apache
Hadoop components and MySQL, it contains the following units:

  • 1 HDFS Master
  • 1 HDFS Secondary Namenode
  • 1 YARN Master
  • 3 Compute Slaves
  • 1 Hive
  • 1 Plugin (colocated on the Hive unit)
  • 1 MySQL
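
Once the bundle is deployed (see Usage below), each of these services should
be visible in the output of juju status:

juju status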

Usage

Deploy this bundle using juju-quickstart:

juju quickstart apache-analytics-sql

See juju quickstart --help for deployment options, including machine
constraints and how to deploy a locally modified version of the
apache-analytics-sql bundle.yaml.
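
For example, assuming you have saved an edited copy of the bundle locally,
you can point juju-quickstart at the file instead of the charm store name
(the path below is illustrative):

juju quickstart ./bundle.yaml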

Testing the deployment

Smoke test HDFS admin functionality

Once the deployment is complete and the cluster is running, ssh to the HDFS
Master unit:

juju ssh hdfs-master/0

As the ubuntu user, run the steps below to verify HDFS functionality by
creating, listing, and removing a temporary directory on the Hadoop file
system:

hdfs dfs -mkdir -p /tmp/hdfs-test
hdfs dfs -chmod -R 777 /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
hdfs dfs -rm -R /tmp/hdfs-test
hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
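
As an optional extra check, you can also round-trip a small file through
HDFS; the file and directory names below are only illustrative:

echo "hello hdfs" > /tmp/smoke.txt             # create a small local file
hdfs dfs -mkdir -p /tmp/hdfs-test
hdfs dfs -put /tmp/smoke.txt /tmp/hdfs-test/   # copy it into HDFS
hdfs dfs -cat /tmp/hdfs-test/smoke.txt         # should print: hello hdfs
hdfs dfs -rm -R /tmp/hdfs-test                 # clean up HDFS
rm /tmp/smoke.txt                              # clean up the local file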

Exit from the HDFS Master unit:

exit

Smoke test YARN and MapReduce

Run the terasort.sh script from the Hive unit to generate and sort data. The
steps below verify that Hive is communicating with the cluster via the plugin
and that YARN and MapReduce are working as expected:

juju ssh hive/0
~/terasort.sh
exit
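
The sorted output is written to /user/ubuntu/tera_demo_out on HDFS (the same
path removed in the next step). To confirm the job succeeded before cleaning
up, list that directory from the Hive unit:

juju ssh hive/0
hdfs dfs -ls /user/ubuntu/tera_demo_out
exit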

Smoke test HDFS functionality from user space

From the Hive unit, delete the MapReduce output previously generated by the
terasort.sh script:

juju ssh hive/0
hdfs dfs -rm -R /user/ubuntu/tera_demo_out
exit

Hive + HDFS Usage

From the Hive unit, start the Hive console as the hive user:

juju ssh hive/0
sudo su hive -c hive

From the Hive console, create a table:

show databases;
create table test(col1 int, col2 string);
show tables;
exit;
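
If you'd like to inspect the new table before exiting the console, standard
HiveQL introspection commands work, for example:

describe test;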

As the ubuntu user, verify the connection to the HDFS cluster and confirm
that a test subdirectory has been created on the remote HDFS cluster:

hdfs dfsadmin -report              # summary of the connected datanodes
hdfs dfs -ls /user/hive/warehouse  # the test table directory should be listed

Exit from the Hive unit:

exit

HiveServer2 + HDFS Usage

From the Hive unit, start the Beeline console as the hive user:

juju ssh hive/0
sudo su hive -c beeline

From the Beeline console, connect to HiveServer2 and verify sample commands
execute successfully:

!connect jdbc:hive2://localhost:10000 hive password org.apache.hive.jdbc.HiveDriver
show databases;
create table test2(a int, b string);
show tables;
!quit
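
The same connection can also be made non-interactively, which is convenient
for scripting; the flags below are standard Beeline options:

beeline -u jdbc:hive2://localhost:10000 -n hive -p password -e 'show databases;'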

As the ubuntu user, verify the connection to the HDFS cluster and confirm
that a test2 subdirectory has been created on the remote HDFS cluster:

hdfs dfsadmin -report              # summary of the connected datanodes
hdfs dfs -ls /user/hive/warehouse  # the test2 table directory should be listed

Exit from the Hive unit:

exit

Scale Out Usage

This bundle was designed to scale out. To increase the number of Compute
Slaves, you can add units to the compute-slave service. To add one unit:

juju add-unit compute-slave

Or you can add multiple units at once:

juju add-unit -n4 compute-slave
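
Once the new units have started, they appear alongside the existing slaves
in the status output:

juju status compute-slave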

Contact Information

Help