Collect, aggregate, and move large amounts of data into HDFS.
Flume is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data. It has a simple
and flexible architecture based on streaming data flows. It is robust and fault
tolerant with tunable reliability mechanisms and many failover and recovery
mechanisms. It uses a simple, extensible data model that allows for online
analytic applications. Learn more at flume.apache.org.
This charm provides a Flume agent designed to ingest events into the shared
filesystem (HDFS) of a connected Hadoop cluster. It is meant to relate to
other Flume agents such as apache-flume-syslog and apache-flume-twitter.
This charm requires Juju 2.0 or greater. If Juju is not yet set up, please
follow the getting-started instructions prior to deploying this charm.
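A quick way to confirm the installed client meets this requirement is to check
the reported version:
juju version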
This charm is intended to be deployed via one of the Apache Bigtop bundles,
for example:
juju deploy hadoop-processing
This will deploy an Apache Bigtop Hadoop cluster. More information about this
deployment can be found in the bundle readme.
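While the bundle settles, deployment progress can be watched with a standard
status check until all units report ready:
watch -n 2 juju status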
Now add Flume-HDFS and relate it to the cluster via the hadoop-plugin:
juju deploy apache-flume-hdfs flume-hdfs
juju add-relation flume-hdfs plugin
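Once the relation is added, wait for the flume-hdfs unit to report ready; its
state can be checked by filtering the status output to that application:
juju status flume-hdfs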
The deployment at this stage isn't very exciting, as the flume-hdfs application
is waiting for other Flume agents to connect and send data. You'll probably
want to check out apache-flume-syslog or apache-flume-kafka
to provide additional functionality for this deployment.
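As a sketch, a syslog source could be deployed and related to this agent as
follows; this assumes the apache-flume-syslog charm's default relation
endpoints, so consult that charm's readme for specifics:
juju deploy apache-flume-syslog flume-syslog
juju add-relation flume-syslog flume-hdfs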
When flume-hdfs receives data, it is stored in a /user/flume/<event_dir>
HDFS subdirectory (configured by the connected Flume charm). You can quickly
verify the data written to HDFS using the command line. SSH to the flume-hdfs
unit, locate an event, and cat it:
juju ssh flume-hdfs/0
hdfs dfs -ls /user/flume/<event_dir> # <-- find a date
hdfs dfs -ls /user/flume/<event_dir>/<yyyy-mm-dd> # <-- find an event
hdfs dfs -cat /user/flume/<event_dir>/<yyyy-mm-dd>/FlumeData.<id>
This process works well for data serialized in text format (the default).
For data serialized in Avro format, you'll need to copy the file locally
and use the dfs -text command. For example, replace the dfs -cat command
from above with the following to view files stored in Avro format:
hdfs dfs -copyToLocal /user/flume/<event_dir>/<yyyy-mm-dd>/FlumeData.<id> /home/ubuntu/myFile.txt
hdfs dfs -text file:///home/ubuntu/myFile.txt
Charms can be deployed in environments with limited network access. To deploy
in such an environment, configure the Juju model with appropriate proxy and/or
mirror options. See Configuring Models for more information.
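For example, HTTP/HTTPS and apt proxies can be set on the model before
deploying; the proxy address below is a placeholder for your own environment:
juju model-config http-proxy=http://proxy.example.com:3128 https-proxy=http://proxy.example.com:3128
juju model-config apt-http-proxy=http://proxy.example.com:3128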