kafka twitter #1

Overview

This charm installs a Kafka producer application that consumes the Twitter Streaming API and sends all messages to a Kafka broker.

Note that this application is useless unless it is connected to a Kafka broker cluster.

Usage

You need a Kafka cluster for this charm to do anything. Because Kafka itself requires a ZooKeeper cluster, we deploy that as well.

Your first task is to pick your favourite Hadoop distribution and use the ZooKeeper charm from that distribution. The example below uses the vanilla version, but hdp-zookeeper would also work.

:~$ juju deploy trusty/zookeeper
:~$ juju deploy trusty/kafka
:~$ juju add-relation kafka zookeeper

This gives you a cluster with a running Kafka broker.

Now deploy this producer and add a relation between it and the Kafka broker.

:~$ juju deploy trusty/kafka-twitter
:~$ juju add-relation kafka-twitter kafka

Scale-out Usage

This charm does not need to scale out for demo purposes. With Twitter's current implementation, the maximum bandwidth you can expect from the Streaming API is about 140Mbps uncompressed, so a single well-configured machine is enough.
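To put the ~140Mbps figure above in terms of disk space, the helper below computes an upper bound on how much data one copy of the stream occupies for a given retention period. It is a minimal sketch that assumes the stream runs at that ceiling continuously and is stored uncompressed; a keyword-filtered stream will normally be well below it. The function name `retention_volume_gb` is hypothetical.

```shell
# Hypothetical helper: upper bound on the disk space (in decimal GB) needed to
# retain RATE_MBPS megabits per second of uncompressed data for HOURS hours.
# Assumes a constant rate and counts a single copy of the data (no replicas).
retention_volume_gb() {
  local rate_mbps="$1" hours="$2"
  # Mbit/s * seconds -> Mbit; / 8 -> MB; / 1000 -> GB (integer arithmetic).
  echo $(( rate_mbps * 3600 * hours / 8 / 1000 ))
}
```

With Kafka's default retention of 168 hours, `retention_volume_gb 140 168` prints 10584 (about 10.6 TB); with 24 hours, `retention_volume_gb 140 24` prints 1512 (about 1.5 TB).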

The Kafka cluster it writes to, however, can be scaled out:

:~$ juju add-unit kafka -n 3

If you expect a very high volume of tweets, you can also increase the number of partitions Kafka uses in its configuration file. This is not exposed as a configuration option in this charm, so please let me know if you need it and I can add it.
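The broker setting involved is `num.partitions` in server.properties. Below is a minimal sketch that edits it, assuming the /opt/kafka/config/server.properties path used later in this README and GNU sed; the function name `set_num_partitions` is hypothetical. `num.partitions` is only the default for topics created after the change, so a topic that already exists keeps its current partition count.

```shell
# Hypothetical helper: set num.partitions in a Kafka server.properties file.
# Only topics created after the broker restart pick up the new default.
set_num_partitions() {
  local file="$1" count="$2"
  if grep -q '^num\.partitions=' "$file"; then
    # Replace the existing (uncommented) setting in place (GNU sed -i).
    sed -i -e "s/^num\.partitions=.*/num.partitions=${count}/" "$file"
  else
    # No active setting found: append one at the end of the file.
    printf 'num.partitions=%s\n' "$count" >> "$file"
  fi
}
```

For example, as root on a Kafka unit, run `set_num_partitions /opt/kafka/config/server.properties 8` and then `service kafka restart`.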

HA usage

Production environments usually run ZooKeeper in HA mode with 3 nodes. As the deployment above uses a single ZooKeeper unit, add two more:

:~$ juju add-unit zookeeper -n 2
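Three is the usual minimum because a ZooKeeper ensemble only keeps serving while a strict majority (quorum) of its nodes is up. The arithmetic is sketched below; the function name `zk_tolerated_failures` is hypothetical.

```shell
# Hypothetical helper: how many ZooKeeper nodes can fail while a strict
# majority (quorum) of an N-node ensemble is still available.
zk_tolerated_failures() {
  local nodes="$1"
  # A quorum needs floor(N/2) + 1 nodes, so N - (floor(N/2) + 1) may fail,
  # which equals floor((N - 1) / 2).
  echo $(( (nodes - 1) / 2 ))
}
```

So 3 nodes survive one failure and 5 survive two, while a 4th node adds no extra failure tolerance over 3; this is why ensembles are normally sized with an odd number of nodes.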

Known Limitations and Issues

The default Kafka configuration does not suit this use case well, as it assumes a full-scale production environment. We therefore change a few settings so that Kafka can be sustained on an AWS m1.small instance. These changes are not mandatory if you have enough disk space.

First, connect to your Kafka server and open the server configuration in your favourite editor. This example uses nano and shows only the lines that change:

:~$ juju ssh kafka/0
:~$ sudo nano /opt/kafka/config/server.properties

Change the following settings (each commented line shows the previous value):

# log.retention.hours=168
log.retention.hours=24

# log.cleaner.enable=false
log.cleaner.enable=true

Then restart Kafka:

:~$ sudo service kafka restart

You can also apply the same changes to every Kafka unit from your own machine with:

:~$ juju run --service=kafka "sed -i -e 's/^log\.retention\.hours=.*/log.retention.hours=24/' -e 's/^log\.cleaner\.enable=.*/log.cleaner.enable=true/' /opt/kafka/config/server.properties && service kafka restart"

You should also check the ZooKeeper connection string. On some occasions, if the ZooKeeper charm is not the one that was planned for, it may fail to expose the right address, so check that zookeeper.connect in server.properties points to your ZooKeeper units.
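One way to check all Kafka units at once is a read-only `juju run`. This is a minimal sketch assuming Juju 1.x syntax (the `--service` flag) and the /opt/kafka/config/server.properties path used earlier; the function name `show_zookeeper_connect` is hypothetical.

```shell
# Hypothetical helper: print the zookeeper.connect line from the broker
# configuration of every unit of the "kafka" service (Juju 1.x syntax).
show_zookeeper_connect() {
  juju run --service=kafka \
    "grep '^zookeeper\.connect=' /opt/kafka/config/server.properties"
}
```

Compare the host:port list it prints (ZooKeeper's default client port is 2181) with the addresses `juju status zookeeper` reports for your ZooKeeper units.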

Configuration

These settings apply to /etc/kafka-twitter/producer.conf:

consumerKey = YourTwitterConsumerKey
consumerSecret = YourTwitterConsumerSecret
accessToken = YourTwitterToken
accessTokenSecret = YourTwitterTokenSecret
keywords = space separated list of hashtags to follow (without #)
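To see what was actually written on a unit, you can print the file remotely. A minimal sketch, assuming Juju 1.x `juju run` syntax and a default unit name of kafka-twitter/0; the function name `show_producer_conf` is hypothetical. The output contains your Twitter secrets, so avoid pasting it into logs or tickets.

```shell
# Hypothetical helper: print the rendered producer configuration of one
# kafka-twitter unit (defaults to kafka-twitter/0; Juju 1.x syntax).
show_producer_conf() {
  local unit="${1:-kafka-twitter/0}"
  juju run --unit "$unit" "cat /etc/kafka-twitter/producer.conf"
}
```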

Contact Information

Maintainer of this charm: Samuel Cozannet <samnco@gmail.com>

Upstream Project Name

This project was inspired by the NF Labs project for a Kafka Twitter agent; most of the Java code comes from it.

Charm Configuration Options

keywords
(string) Space-separated list of words to filter in the Twitter Streaming API.
twitter_access_token
(string) Twitter Access Token. See https://dev.twitter.com/docs/auth/tokens-devtwittercom
twitter_access_token_secret
(string) Twitter Access Token Secret. See https://dev.twitter.com/docs/auth/tokens-devtwittercom
twitter_consumer_key
(string) Twitter Consumer Key. See https://dev.twitter.com/docs/auth/tokens-devtwittercom
twitter_consumer_secret
(string) Twitter Consumer Secret. See https://dev.twitter.com/docs/auth/tokens-devtwittercom
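With Juju 1.x, which the `juju run --service` commands in this README target, these options are set with `juju set` (Juju 2.x renamed it to `juju config`). A minimal sketch; the function name and the TWITTER_* environment variable names are hypothetical, chosen so the credentials are not hard-coded in the script.

```shell
# Hypothetical helper: push Twitter credentials and keywords to the deployed
# kafka-twitter service (Juju 1.x "juju set" syntax).
# Expects TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN
# and TWITTER_ACCESS_TOKEN_SECRET to be exported; the first argument is the
# space-separated keyword list. A missing value aborts with an error message.
configure_kafka_twitter() {
  juju set kafka-twitter \
    twitter_consumer_key="${TWITTER_CONSUMER_KEY:?not set}" \
    twitter_consumer_secret="${TWITTER_CONSUMER_SECRET:?not set}" \
    twitter_access_token="${TWITTER_ACCESS_TOKEN:?not set}" \
    twitter_access_token_secret="${TWITTER_ACCESS_TOKEN_SECRET:?not set}" \
    keywords="${1:?keywords argument required}"
}
```

For example: `configure_kafka_twitter "kafka hadoop juju"`.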