Uses a Twitter source, memory channel, and Avro sink in Apache Flume to ingest Twitter data.
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application. Learn more at flume.apache.org.
This charm provides a Flume agent designed to process tweets from the Twitter Streaming API and send them to the apache-flume-hdfs agent for storage in the shared filesystem (HDFS) of a connected Hadoop cluster. It leverages the TwitterSource jar packaged with Flume and consumes the so-called 1% firehose, the sample of public tweets that Twitter exposes through its Streaming API.
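To make the data flow concrete, the agent this charm manages looks roughly like the following Flume properties file. This is an illustrative sketch only: the agent and component names are assumptions, and the real configuration is rendered by the charm from your config options and the flume-hdfs relation.

# Sketch of a Twitter source -> memory channel -> Avro sink agent (names assumed)
twitter-agent.sources  = twitterSrc
twitter-agent.channels = memChannel
twitter-agent.sinks    = avroSink

# Twitter source (TwitterSource ships with Flume), fed by your API credentials
twitter-agent.sources.twitterSrc.type = org.apache.flume.source.twitter.TwitterSource
twitter-agent.sources.twitterSrc.consumerKey = YOUR_CONSUMER_KEY
twitter-agent.sources.twitterSrc.consumerSecret = YOUR_CONSUMER_SECRET
twitter-agent.sources.twitterSrc.accessToken = YOUR_TOKEN
twitter-agent.sources.twitterSrc.accessTokenSecret = YOUR_TOKEN_SECRET
twitter-agent.sources.twitterSrc.channels = memChannel

# In-memory channel buffers events between source and sink
twitter-agent.channels.memChannel.type = memory
twitter-agent.channels.memChannel.capacity = 10000

# Avro sink forwards events to the flume-hdfs agent (address/port come from the relation)
twitter-agent.sinks.avroSink.type = avro
twitter-agent.sinks.avroSink.hostname = FLUME_HDFS_UNIT_ADDRESS
twitter-agent.sinks.avroSink.port = FLUME_HDFS_AVRO_PORT
twitter-agent.sinks.avroSink.channel = memChannel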
The Twitter Streaming API requires developer credentials, which you'll need to configure for this charm. Find your credentials (or create an account if needed) on the Twitter developer site.
Create a secret.yaml file with your Twitter developer credentials:
flume-twitter:
    twitter_access_token: 'YOUR_TOKEN'
    twitter_access_token_secret: 'YOUR_TOKEN_SECRET'
    twitter_consumer_key: 'YOUR_CONSUMER_KEY'
    twitter_consumer_secret: 'YOUR_CONSUMER_SECRET'
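If you'd rather not keep a secret.yaml file around, the same options can also be set on the deployed service with Juju's config commands. A sketch, assuming the option names shown above:

# Juju 1.x syntax; with Juju 2.x use `juju config` in place of `juju set`
juju set flume-twitter \
    twitter_access_token='YOUR_TOKEN' \
    twitter_access_token_secret='YOUR_TOKEN_SECRET' \
    twitter_consumer_key='YOUR_CONSUMER_KEY' \
    twitter_consumer_secret='YOUR_CONSUMER_SECRET'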
This charm leverages our pluggable Hadoop model with the hadoop-plugin interface. This means that you will need to deploy a base Apache Hadoop cluster to run Flume. The suggested deployment method is to use the apache-ingestion-flume bundle. This will deploy the Apache Hadoop platform with a single Apache Flume unit that communicates with the cluster by relating to the apache-hadoop-plugin subordinate charm:
juju quickstart u/bigdata-dev/apache-ingestion-flume
Alternatively, you may manually deploy the recommended environment as follows:
juju deploy apache-hadoop-hdfs-master hdfs-master
juju deploy apache-hadoop-yarn-master yarn-master
juju deploy apache-hadoop-compute-slave compute-slave
juju deploy apache-hadoop-plugin plugin
juju deploy apache-flume-hdfs flume-hdfs
juju add-relation yarn-master hdfs-master
juju add-relation compute-slave yarn-master
juju add-relation compute-slave hdfs-master
juju add-relation plugin yarn-master
juju add-relation plugin hdfs-master
juju add-relation flume-hdfs plugin
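Either way, give the units a few minutes to install and settle. You can watch progress with standard Juju status output (nothing charm-specific here):

# re-run (or wrap in `watch`) until all units report as started
juju status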
Now that the base environment has been deployed (either via quickstart or manually), you are ready to add the apache-flume-twitter charm and relate it to the flume-hdfs agent:
juju deploy apache-flume-twitter flume-twitter --config=secret.yaml
juju add-relation flume-twitter flume-hdfs
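To confirm the agents come up cleanly, you can stream log output from all units with Juju's debug log (standard Juju tooling, not specific to these charms):

# Ctrl-C to stop following the consolidated unit logs
juju debug-log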
That's it! Once the Flume agents start, tweets will start flowing into HDFS via the flume-twitter and flume-hdfs charms. Flume may include multiple events in each file written to HDFS. This is configurable with various options on the flume-hdfs charm. See descriptions of the roll_* options on the apache-flume-hdfs charm store page for more details.
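As an illustration only (the option name below is an assumption; confirm the exact roll_* names and defaults on the apache-flume-hdfs charm store page), tuning how many events land in each HDFS file might look like:

# hypothetical example -- verify the option name against the apache-flume-hdfs charm config
juju set flume-hdfs roll_count=500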
Flume will write files to HDFS in the following location: /user/flume/<event_dir>/<yyyy-mm-dd>/FlumeData.<id>. The <event_dir> subdirectory is configurable and set to flume-twitter by default for this charm.
To verify this charm is working as intended, SSH to the flume-hdfs unit and locate an event:
juju ssh flume-hdfs/0
hdfs dfs -ls /user/flume/<event_dir>               # <-- find a date
hdfs dfs -ls /user/flume/<event_dir>/<yyyy-mm-dd>  # <-- find an event
Since our tweets are serialized in avro format, you'll need to copy the file locally and use the dfs -text command to view it:
hdfs dfs -copyToLocal /user/flume/<event_dir>/<yyyy-mm-dd>/FlumeData.<id>.avro /home/ubuntu/myFile.txt
hdfs dfs -text file:///home/ubuntu/myFile.txt
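If you have the Apache Avro avro-tools jar available (it is not installed by these charms; the version and path below are assumptions), you can also dump the copied file as JSON instead of using dfs -text:

# avro-tools `tojson` prints one JSON record per event
java -jar avro-tools-1.7.7.jar tojson /home/ubuntu/myFile.txt | head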
You may not recognize the body of the tweet if it's not in a language you understand (remember, this is a 1% firehose from tweets all over the world). You may have to try a few different events before you find a tweet worth reading. Happy hunting!
Need help? Join the #juju channel on irc.freenode.net.