Uses a Syslog source, memory channel, and Avro sink in Apache Flume
to ingest log data.
Flume is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data. It has a simple
and flexible architecture based on streaming data flows. It is robust and fault
tolerant with tunable reliability mechanisms and many failover and recovery
mechanisms. It uses a simple extensible data model that allows for online
analytic application. Learn more at flume.apache.org.
This charm provides a Flume agent designed to receive remote syslog events and
send them to the apache-flume-hdfs agent for storage in the shared filesystem
(HDFS) of a connected Hadoop cluster. Think of this charm as a replacement for
rsyslog, sending syslog events to HDFS instead of writing them to a local
filesystem.
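The source, channel, and sink named above correspond to three sections of a
Flume agent configuration. The following is a minimal sketch, not the file this
charm actually generates: the agent and component names (a1, r1, c1, k1), the
scratch file path, the capacities, and the Avro hostname and port are all
illustrative assumptions.

```shell
# Write an illustrative Flume agent config to a scratch file.
# Every name and value below is an assumption made for this sketch; the charm
# renders its own configuration from its options and relations.
conf="${TMPDIR:-/tmp}/flume-syslog-sketch.conf"

cat > "$conf" <<'EOF'
# One source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Syslog source: listens for remote syslog events over UDP
a1.sources.r1.type = syslogudp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 514
a1.sources.r1.channels = c1

# Memory channel: buffers events between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Avro sink: forwards events to a downstream agent (here, flume-hdfs)
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = flume-hdfs.example.internal
a1.sinks.k1.port = 4141
a1.sinks.k1.channel = c1
EOF

echo "wrote $conf"
```

The memory channel keeps events in RAM only, so events still buffered when an
agent stops are lost; that is the trade-off for its low overhead.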
A working Juju installation is assumed to be present. If Juju is not yet set
up, please follow the getting-started instructions prior to deploying this
charm.
This charm is intended to be deployed via one of the Apache Bigtop bundles.
For example:
juju deploy hadoop-processing
Note: The above assumes Juju 2.0 or greater. If using an earlier version
of Juju, use juju-quickstart with the following syntax:
juju quickstart hadoop-processing
This will deploy an Apache Bigtop Hadoop cluster. More information about this
deployment can be found in the bundle readme.
Now add Flume-HDFS and relate it to the cluster via the hadoop-plugin:
juju deploy apache-flume-hdfs flume-hdfs
juju add-relation flume-hdfs plugin
Now that the base environment has been deployed, add the apache-flume-syslog
charm and relate it to the flume-hdfs agent:
juju deploy apache-flume-syslog flume-syslog
juju add-relation flume-syslog flume-hdfs
You are now ready to ingest remote syslog events! Note that the deployment at
this stage isn't very useful on its own. You'll need to relate this charm to
another service that is configured to send syslog data.
Charms can be deployed in environments with limited network access. To deploy
in such an environment, configure a Juju model with appropriate proxy and/or
mirror options. See Configuring Models for more information.
As an example use case, let's ingest our namenode syslog events into HDFS.
Deploy the rsyslog-forwarder-ha subordinate charm, relate it to namenode, and
then link it to flume-syslog:
juju deploy rsyslog-forwarder-ha
juju add-relation rsyslog-forwarder-ha namenode
juju add-relation rsyslog-forwarder-ha flume-syslog
Any syslog data generated on the namenode unit will now be ingested into HDFS
via the flume-syslog and flume-hdfs charms. Flume may include multiple syslog
events in each file written to HDFS. This is configurable with various options
on the flume-hdfs charm. See the descriptions of the roll_* options on the
apache-flume-hdfs charm store page for more details.
Flume will write files to HDFS in the following location:
/user/flume/<event_dir>/<yyyy-mm-dd>/
The <event_dir> subdirectory is configurable and set to flume-syslog by
default for this charm.
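As a small sketch of how that location is put together, the helper below
builds the directory for a given day. The function name event_path is
hypothetical, and the sketch assumes the date component is a plain yyyy-mm-dd
string; which clock or time zone the agent uses for it is not stated here, so
treat the "today" default as a guess.

```shell
# Hypothetical helper: print the HDFS directory holding one day's events.
#   $1: event subdirectory (defaults to flume-syslog, the charm default)
#   $2: date as yyyy-mm-dd (defaults to today on the machine running this)
event_path() {
    printf '/user/flume/%s/%s\n' "${1:-flume-syslog}" "${2:-$(date +%Y-%m-%d)}"
}

# Build the path for an explicit subdirectory and date.
event_path flume-syslog 2016-01-31
```

On a unit with the HDFS client available, the result could be passed to a
listing command, for example: hdfs dfs -ls "$(event_path)".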
To verify this charm is working as intended, trigger a syslog event on the
monitored unit (namenode in our deployment scenario):
juju ssh namenode/0 'echo flume-test'
Now SSH to the flume-hdfs unit, locate an event, and cat it:
juju ssh flume-hdfs/0
hdfs dfs -ls /user/flume/<event_dir>  # <-- find a date
hdfs dfs -ls /user/flume/<event_dir>/<yyyy-mm-dd>  # <-- find an event
hdfs dfs -cat /user/flume/<event_dir>/<yyyy-mm-dd>/FlumeData.<id>
You should be able to find a timestamped message about SSH'ing into the
namenode unit that corresponds to the trigger you issued above. Note that
this workload isn't limited to SSH-related events. You'll get every syslog
event from the namenode unit. Happy logging!
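The verification above relies on the login messages that SSH itself produces.
To make a test event easier to spot among everything else the unit logs, one
option is to hand-craft a syslog line with a distinctive tag. The sketch below
only formats a BSD-style (RFC 3164) line; the function name syslog_line is
hypothetical, and whether such a line can be sent straight to the flume-syslog
source depends on network access to that unit, which this README does not
cover.

```shell
# Hypothetical helper: format one BSD-style (RFC 3164) syslog line.
#   $1 facility number, $2 severity number, $3 tag, $4 message
# The priority value is facility * 8 + severity.
syslog_line() {
    pri=$(( $1 * 8 + $2 ))
    printf '<%d>%s %s %s: %s\n' "$pri" "$(date '+%b %e %H:%M:%S')" \
        "${HOSTNAME:-localhost}" "$3" "$4"
}

# Facility 1 (user-level) with severity 6 (informational) gives priority 14.
syslog_line 1 6 flume-test "hello from flume-test"
```

With bash, the line could then be sent over UDP by redirecting the output to
/dev/udp/<address>/514, where <address> is a placeholder for the address of
the flume-syslog unit.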
- (string) The maximum number of events the channel will take from a source or give to a sink per transaction.
- (string) The HDFS subdirectory under /user/flume where events will be stored.
- (string) Agent source type. Can be 'syslogudp' or 'syslogtcp'. If relating to the 'rsyslog-forwarder-ha' charm, this must be 'syslogudp'.
- (string) URL from which to fetch resources (e.g., Flume binaries) instead of S3
- (string) Port on which the agent source is listening. If relating to the 'rsyslog-forwarder-ha' charm, this must be '514'.
- (string) The maximum number of events stored in the channel.