Uses a Kafka source, memory channel, and Avro sink in Apache Flume
to ingest messages published to a Kafka topic.
Flume is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data. It has a simple
and flexible architecture based on streaming data flows. It is robust and fault
tolerant with tunable reliability mechanisms and many fail over and recovery
mechanisms. It uses a simple extensible data model that allows for online
analytic application. Learn more at flume.apache.org.
This charm provides a Flume agent designed to ingest messages published to
a Kafka topic and send them to the
apache-flume-hdfs agent for storage in
the shared filesystem (HDFS) of a connected Hadoop cluster. This leverages the
KafkaSource jar packaged with Flume. Learn more about the
Flume Kafka Source.
This charm is intended to be deployed via the
juju quickstart apache-ingestion-kafka
This will deploy the Apache Hadoop platform with Apache Flume and Apache Kafka
communicating with the cluster via the
The default Kafka topic where messages are published is unset. Set this to
an existing Kafka topic as follows:
juju set flume-kafka kafka_topic='<topic_name>'
If you don't have a Kafka topic, you may create one (and verify successful
juju action do kafka/0 create-topic topic=<topic_name> \ partitions=1 replication=1 juju action fetch <id> # <-- id from above command
Once the Flume agents start, messages will start flowing into
HDFS in year-month-day directories here:
A Kafka topic is required for this test. Topic creation is covered in the
Configuration section above. Generate Kafka messages with the
juju action do kafka/0 write-topic topic=<topic_name> data="This is a test"
To verify these messages are being stored into HDFS, SSH to the
unit, locate an event, and cat it:
juju ssh flume-hdfs/0 hdfs dfs -ls /user/flume/flume-kafka # <-- find a date hdfs dfs -ls /user/flume/flume-kafka/yyyy-mm-dd # <-- find an event hdfs dfs -cat /user/flume/flume-kafka/yyyy-mm-dd/FlumeData.[id]
- (string) The Kafka topic to watch for messages
- (string) The maximum number of events the channel will take from a source or give to a sink per transaction.
- (string) The HDFS subdirectory under /user/flume where events will be stored.
- (string) URL from which to fetch resources (e.g., Flume binaries) instead of S3
- (string) Maximum number of messages written to channel in a single batch
- (string) The maximum number of events stored in the channel.