# Apache Analytics with SQL
This bundle is an 8-node cluster designed to scale out. Built around Apache
Hadoop components and MySQL, it contains the following units:
- 1 HDFS Master
- 1 HDFS Secondary Namenode
- 1 YARN Master
- 3 Compute Slaves
- 1 Hive
- 1 Plugin (colocated on the Hive unit)
- 1 MySQL
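Once deployment is finished, all of these units should be visible in the Juju
status output. A quick way to watch them come up (a standard `juju status`
invocation; nothing bundle-specific is assumed):

```
juju status --format=tabular
```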
## Usage
Deploy this bundle using `juju-quickstart`:

```
juju quickstart apache-analytics-sql
```
See `juju quickstart --help` for deployment options, including machine
constraints and how to deploy a locally modified version of the
apache-analytics-sql `bundle.yaml`.
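For example, `juju-quickstart` accepts a path to a local bundle file; the path
below is hypothetical:

```
juju quickstart ./bundle.yaml
```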
## Testing the deployment
### Smoke test HDFS admin functionality
Once the deployment is complete and the cluster is running, SSH to the HDFS
Master unit:

```
juju ssh hdfs-master/0
```
As the `ubuntu` user, create a temporary directory on the Hadoop file system.
The steps below verify HDFS functionality:

```
hdfs dfs -mkdir -p /tmp/hdfs-test
hdfs dfs -chmod -R 777 /tmp/hdfs-test
hdfs dfs -ls /tmp  # verify the newly created hdfs-test subdirectory exists
hdfs dfs -rm -R /tmp/hdfs-test
hdfs dfs -ls /tmp  # verify the hdfs-test subdirectory has been removed
```
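The same round trip can be scripted. `hdfs dfs -test -d` exits non-zero when a
directory is absent, so a sketch like the following (run on the same unit)
fails fast if any step misbehaves:

```
#!/bin/bash
set -e                              # abort on the first failed command
hdfs dfs -mkdir -p /tmp/hdfs-test   # create the test directory
hdfs dfs -test -d /tmp/hdfs-test    # exits 0 only if it now exists
hdfs dfs -rm -R /tmp/hdfs-test      # remove it again
hdfs dfs -test -d /tmp/hdfs-test || echo "HDFS smoke test passed"  # prints only if the directory is gone
```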
Exit from the HDFS Master unit:

```
exit
```
### Smoke test YARN and MapReduce
Run the `terasort.sh` script from the Hive unit to generate and sort data. The
steps below verify that Hive is communicating with the cluster via the plugin
and that YARN and MapReduce are working as expected:

```
juju ssh hive/0
~/terasort.sh
exit
```
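The `terasort.sh` script presumably wraps the standard `teragen`/`terasort`
MapReduce examples. A rough equivalent is sketched below; the examples jar
path and row count are assumptions, while the `tera_demo_out` output directory
matches the cleanup step in the next section:

```
# Generate sample rows, then sort them with a MapReduce job.
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    teragen 10000 /user/ubuntu/tera_demo
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    terasort /user/ubuntu/tera_demo /user/ubuntu/tera_demo_out
```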
### Smoke test HDFS functionality from user space
From the Hive unit, delete the MapReduce output previously generated by the
`terasort.sh` script:

```
juju ssh hive/0
hdfs dfs -rm -R /user/ubuntu/tera_demo_out
exit
```
## Hive + HDFS Usage
From the Hive unit, start the Hive console as the `hive` user:

```
juju ssh hive/0
sudo su hive -c hive
```
From the Hive console, create a table and verify it is listed:

```
show databases;
create table test(col1 int, col2 string);
show tables;
exit;
```
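Back at the unit's shell, the same checks can also be run non-interactively
using Hive's standard `-e` option (a minimal sketch):

```
sudo su hive -c 'hive -e "show databases; show tables;"'
```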
As the `ubuntu` user, verify the connection to the HDFS cluster and confirm
that a `test` subdirectory has been created on the remote HDFS cluster:

```
hdfs dfsadmin -report
hdfs dfs -ls /user/hive/warehouse
```
Exit from the Hive unit:

```
exit
```
## HiveServer2 + HDFS Usage
From the Hive unit, start the Beeline console as the `hive` user:

```
juju ssh hive/0
sudo su hive -c beeline
```
From the Beeline console, connect to HiveServer2 and verify that sample
commands execute successfully:

```
!connect jdbc:hive2://localhost:10000 hive password org.apache.hive.jdbc.HiveDriver
show databases;
create table test2(a int, b string);
show tables;
!quit
```
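Beeline can also connect and run statements in one shot via its standard `-u`,
`-n`, `-p`, and `-e` options (a minimal sketch using the same credentials as
above):

```
beeline -u jdbc:hive2://localhost:10000 -n hive -p password -e "show tables;"
```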
As the `ubuntu` user, verify the connection to the HDFS cluster and confirm
that a `test2` subdirectory has been created on the remote HDFS cluster:

```
hdfs dfsadmin -report
hdfs dfs -ls /user/hive/warehouse
```
Exit from the Hive unit:

```
exit
```
## Scale Out Usage
This bundle was designed to scale out. To increase the number of Compute
Slaves, add units to the `compute-slave` service. To add one unit:

```
juju add-unit compute-slave
```

Or add multiple units at once:

```
juju add-unit -n4 compute-slave
```
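After scaling, the new units can be confirmed with a filtered status query
(`juju status` accepts a service name as a filter):

```
juju status compute-slave
```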