Description

Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Hortonworks Storm Overview

Hortonworks (HDP 2.1.3) Apache Storm is a free and open source distributed
real-time computation system. Storm makes it easy to reliably process unbounded
streams of data, doing for real-time processing what Hadoop did for batch
processing. Storm has many use cases: real-time analytics, online machine
learning, continuous computation, distributed RPC, ETL, and more. Storm is fast:
a benchmark clocked it at over a million tuples processed per second per node.
It is scalable, fault-tolerant, guarantees your data will be processed, and is
easy to set up and operate.
This charm builds a Storm cluster consisting of:
1. A Nimbus master node, with the following daemons configured and loaded:

storm-drpc
storm-logviewer
storm-nimbus
storm-ui

2. Storm worker node(s), with the following daemons configured and loaded:

storm-logviewer
storm-supervisor

Deployment

Start a 3-node Hortonworks ZooKeeper quorum:

juju deploy hdp-zookeeper hdp-zookeeper
juju add-unit -n 2 hdp-zookeeper

NOTE: ZooKeeper must be loaded and active. To verify:

$ echo ruok | nc {hdp-zookeeper/0 IP address} 2181

imok # the reply must be "imok"

$ echo stat | nc {hdp-zookeeper/0 IP address} 2181

Node count: 4 # check the node count
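
The two manual checks above can be wrapped in small shell helpers so they are easy to repeat for each unit. This is only a sketch: the function names (zk_cmd, zk_is_ok, zk_node_count) are hypothetical and not part of the charm, and it assumes bash, awk and the same nc invocation used above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the ZooKeeper checks; not shipped with the charm.

# Send a ZooKeeper four-letter command (ruok, stat, ...) to HOST[:PORT].
zk_cmd() {
    local cmd="$1" host="$2" port="${3:-2181}"
    echo "$cmd" | nc "$host" "$port"
}

# Succeed only if the server answers "imok" to "ruok".
zk_is_ok() {
    [ "$(zk_cmd ruok "$1" "${2:-2181}")" = "imok" ]
}

# Read `stat` output on stdin and print the value of the "Node count" line.
zk_node_count() {
    awk -F': *' '/^Node count:/ { print $2 }'
}
```

For example, `zk_is_ok {hdp-zookeeper/0 IP address} && echo healthy` reports a healthy server, and `zk_cmd stat {hdp-zookeeper/0 IP address} | zk_node_count` prints just the node count.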

Start Apache Storm:

juju deploy hdp-storm nimbus-server     
juju deploy hdp-storm storm-worker
juju add-relation nimbus-server:zookeeper hdp-zookeeper:zookeeper
juju add-relation storm-worker:zookeeper hdp-zookeeper:zookeeper
juju add-relation nimbus-server:nimbus storm-worker:slave

To verify a successful deployment, open the Storm UI in a browser:

http://{nimbus-server ip address}:8080
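
The UI may take a while to come up after the relations are added, so the check can also be scripted with a retry loop. The sketch below is an assumption-laden illustration: `retry` and `storm_ui_up` are hypothetical helper names, and it assumes bash and curl are available on the machine running the check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the charm.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    local attempts="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Succeed if the web UI on HOST answers on port 8080 without an HTTP error.
storm_ui_up() {
    curl -fsS -o /dev/null "http://$1:8080"
}
```

For example, `retry 30 10 storm_ui_up {nimbus-server ip address}` polls the UI every 10 seconds, for up to 30 attempts.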

Real-time usage

Example - Deploying and Managing Apache Storm Topologies:
The following steps demonstrate how to deploy a Storm WordCount application.
The WordCount application has two parts: a Spout that randomly generates data
streams, and Bolts that process the generated stream.

 - $ juju ssh nimbus-server/0
 - $ storm jar /usr/lib/storm/contrib/storm-starter/storm-starter-0.9.1.2.1.3.0-563-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount
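
Once the topology is submitted, it can also be managed from the same shell on the Nimbus unit with the `storm list` and `storm kill` subcommands. The wrappers below are a sketch only: the function names are hypothetical, and the check assumes the topology name is the first column of the `storm list` output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the charm. Run them on the Nimbus unit.

# Succeed if a topology called NAME appears in `storm list`.
# Assumption: the topology name is the first whitespace-separated field.
topology_listed() {
    local name="$1"
    storm list | awk -v t="$name" '$1 == t { found = 1 } END { exit !found }'
}

# Kill topology NAME; -w sets how many seconds Storm waits between
# deactivating the topology and shutting it down (default here: 10).
kill_topology() {
    local name="$1" wait_secs="${2:-10}"
    storm kill "$name" -w "$wait_secs"
}
```

For example, `topology_listed WordCount && kill_topology WordCount 30` stops the example topology if it is present.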

How to monitor deployment:

 - Go to http://{nimbus-server ip address}:8080
 - Under "Topology summary", click on "WordCount"
 - Monitor the Spout and Bolt tasks

Scale out usage

Example: adding 5 more worker nodes:

 juju add-unit -n 5 storm-worker

To verify a successful scale-out:

 - Go to http://{nimbus-server ip address}:8080
 - Under "Topology summary", click on "WordCount"
 - Click on "Spout" link in "Spouts (All time)" section
 - Note "Host" list under "Executors (All time)" section
 - Go back to "Topology summary"
 - Click on "Rebalance" in "Topology actions" section
 - Click on "Spout" link in "Spouts (All time)" section
 - Refresh and notice the re-balancing of the job as more storm-worker threads become available
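
The "Rebalance" action in the UI has a command-line counterpart, `storm rebalance`, which can be run on the Nimbus unit instead. The wrapper below is a minimal sketch: `rebalance_topology` is a hypothetical name, and the worker count passed with -n is an illustrative value that should be chosen to match the slots actually available in the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the charm. Run it on the Nimbus unit.

# rebalance_topology NAME NUM_WORKERS [WAIT_SECS]
# -n sets the new number of worker processes for the topology;
# -w sets how long Storm waits before redistributing (default here: 10).
rebalance_topology() {
    local name="$1" workers="$2" wait_secs="${3:-10}"
    storm rebalance "$name" -w "$wait_secs" -n "$workers"
}
```

For example, `rebalance_topology WordCount 6` asks Storm to spread the WordCount topology over 6 worker processes.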

Contact Information

Amir Sanjar amir.sanjar@canonical.com

Apache & Hortonworks Storm