apache drill #132

Description

Apache Drill Drillbit


Overview

Query any non-relational datastore (well, almost....)

Drill supports a variety of NoSQL databases and file systems,
including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3,
Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files.
A single query can join data from multiple datastores. For example,
you can join a user profile collection in MongoDB with a directory
of event logs in Hadoop.

Drill's datastore-aware optimizer automatically restructures a
query plan to leverage the datastore's internal processing capabilities.
In addition, Drill supports data locality, so it's a good idea to
co-locate Drill and the datastore on the same nodes.

Usage

To deploy this charm simply run:

juju deploy apache-zookeeper zookeeper
juju add-unit -n 2 zookeeper    # optional, but recommended for a quorum
juju deploy cs:~spiculecharms/apache-drill
juju add-relation apache-drill zookeeper
juju expose apache-drill

A web console runs on port 8047 of each Drill unit: http://<unit-address>:8047/
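Port 8047 also serves Drill's REST API, which is handy for a quick smoke test from a terminal. The following is a minimal sketch, not part of the charm: the function name drill_query is made up for illustration, the host argument is whatever public address `juju status` reports for your unit, and the call assumes authentication has not been enabled yet.

```shell
# Sketch: POST a SQL statement to Drill's REST endpoint on port 8047.
# drill_query is a hypothetical helper name, not something the charm provides.
drill_query() {
    host="$1"   # public address of a Drill unit, as shown by `juju status`
    sql="$2"    # keep it free of double quotes and backslashes in this simple sketch
    curl -s -X POST \
        -H "Content-Type: application/json" \
        -d "{\"queryType\": \"SQL\", \"query\": \"$sql\"}" \
        "http://$host:8047/query.json"
}
```

For example, `drill_query <unit-address> 'SELECT * FROM sys.drillbits'` should list the Drillbits that have registered with ZooKeeper.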

HDFS connectivity

If you are running a Hadoop setup, you can also test the HDFS connectivity.

juju add-relation apache-drill namenode

This will add a datasource entry for your Hadoop namenode. You can then query CSV/JSON/Parquet files.
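As a rough illustration of what such a query looks like, the helper below prints a Drill SQL statement that samples one file. It is a sketch under assumptions: the name of the datasource entry the charm creates is not stated in this README, so "hdfs" in the usage note is a placeholder (running SHOW SCHEMAS in the Drill console lists the real names), and the file path is invented.

```shell
# Sketch: print a Drill query that samples the first rows of a file.
# hdfs_sample_sql is a hypothetical helper; the plugin name is an assumption.
hdfs_sample_sql() {
    plugin="$1"   # name of the storage plugin entry, e.g. as listed by SHOW SCHEMAS
    path="$2"     # path to a CSV, JSON or Parquet file in HDFS
    # Drill expects file paths to be quoted with backticks.
    printf 'SELECT * FROM %s.`%s` LIMIT 10\n' "$plugin" "$path"
}
```

Calling `hdfs_sample_sql hdfs /user/ubuntu/events.json` prints a statement you can paste into the Drill console.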

Security

To enable security, first set an administrative user. In the Drill console, run the following:

ALTER SYSTEM SET `security.admin.users`='<myuser>'

Then enable basic_auth and basic_security_auth in the charm config. Once Drill has restarted, you should see a user login
in the top corner of the console. To create a user, either create a standard Unix user with useradd or use the following action:

juju run-action apache-drill/0 adduser username="<myuser>" password="<mypword>"

You should then be able to log in with this user.
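The config and action steps can be wrapped in two small shell functions. This is only a sketch: the function names are invented, it assumes the application was deployed as apache-drill, and it assumes a Juju 2.x client where `juju config` sets charm options.

```shell
# Sketch of the security steps; both function names are hypothetical.
enable_drill_auth() {
    app="${1:-apache-drill}"   # application name, assumed to match the deploy step
    # Turn on both authentication options named in this section.
    juju config "$app" basic_auth=true basic_security_auth=true
}

add_drill_user() {
    unit="$1"       # unit to run the action on, e.g. apache-drill/0
    user="$2"
    password="$3"
    # Same adduser action as in this section, with values passed as arguments.
    juju run-action "$unit" adduser username="$user" password="$password"
}
```

Run `enable_drill_auth`, wait for Drill to restart, then call `add_drill_user apache-drill/0 <myuser> <mypword>`.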

Scale out Usage

You can simply add new units and they will be added to the cluster automatically:

juju add-unit -n 2 apache-drill

Configuration

drill_url: Sets an alternative download URL for Apache Drill.

cluster_id: Sets an alternative cluster ID for ZooKeeper.

Contact Information

Apache Drill

Charm Support

If you require commercial support for this charm or Apache Drill, please contact us and we'd be happy to help.
Email us at info@spicule.co.uk and we can arrange a call to discuss your requirements.

Configuration

hdfs_path
(string) Default path for HDFS connections.
/user/ubuntu
drill_max_direct_memory
(string) Maximum RAM available to Drill. Defaults to 75% of total system RAM. Accepts a percentage or a gigabyte value such as 8G.
75%
basic_security_auth
(boolean) Enable/disable basic security authentication for Drill.
snap_proxy_url
(string) The address of a Snap Store Proxy to use for snaps e.g. http://snap-proxy.example.com
snap_proxy
(string) HTTP/HTTPS web proxy for Snappy to use when accessing the snap store.
hdfs_formats
(string)
  psv:
    type: text
    extensions:
      - tbl
    delimiter: "|"
  csv:
    type: text
    extensions:
      - csv
    delimiter: ","
  parquet:
    type: parquet
  json:
    type: json
    extensions:
      - json
  avro:
    type: avro
  sequencefile:
    type: sequencefile
    extensions:
      - seq
  csvh:
    type: text
    extensions:
      - csvh
    extractHeader: "true"
    delimiter: ","
auth_profiles
(string) Authentication profile inputs for Drill.
sudo, login
basic_auth
(boolean) Enable/disable basic authentication for Drill.
auth_mechanism
(string) Authentication mechanism used by Drill.
PLAIN
package_status
(string) The status of service-affecting packages will be set to this value in the dpkg database. Valid values are "install" and "hold".
install
cluster_id
(string) Cluster ID for ZooKeeper.
drill-cluster
snapd_refresh
(string) How often snapd handles updates for installed snaps. The default (an empty string) is 4x per day. Set to "max" to check once per month based on the charm deployment date. You may also set a custom string as described in the 'refresh.timer' section here: https://forum.snapcraft.io/t/system-options/87
hdfs_writeable
(boolean) Is the HDFS path writable?
True
extra_packages
(string) Space separated list of extra deb packages to install.
drill_heap
(string) Maximum heap size for Drill. Defaults to 25% of total system RAM. Accepts a percentage or a gigabyte value such as 3G.
25%
install_keys
(string) List of signing keys for install_sources package sources, per charmhelpers standard format (a yaml list of strings encoded as a string). The keys should be the full ASCII armoured GPG public keys. While GPG key ids are also supported and looked up on a keyserver, operators should be aware that this mechanism is insecure. null can be used if a standard package signing key is used that will already be installed on the machine, and for PPA sources where the package signing key is securely retrieved from Launchpad.
install_sources
(string) List of extra apt sources, per charm-helpers standard format (a yaml list of strings encoded as a string). Each source may be either a line that can be added directly to sources.list(5), or in the form ppa:<user>/<ppa-name> for adding Personal Package Archives, or a distribution component to enable.
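
The drill_heap and drill_max_direct_memory options accept either a percentage or a gigabyte value. The helper below is a sketch of the arithmetic behind those two styles, to help pick sensible values for a given machine. The function name is invented, and the charm's own calculation is not shown in this README, so its rounding may differ.

```shell
# Sketch: convert a memory setting such as "75%" or "8G" into megabytes.
# mem_setting_to_mb is a hypothetical helper, not the charm's implementation.
mem_setting_to_mb() {
    value="$1"      # setting value, e.g. "25%" or "3G"
    total_mb="$2"   # total system RAM in megabytes
    case "$value" in
        *%)
            pct="${value%\%}"                    # strip the trailing percent sign
            echo $(( total_mb * pct / 100 ))
            ;;
        *G)
            gb="${value%G}"                      # strip the trailing G
            echo $(( gb * 1024 ))
            ;;
        *)
            echo "unsupported value: $value" >&2
            return 1
            ;;
    esac
}
```

On a machine with 16384 MB of RAM, the defaults of 25% and 75% work out to 4096 MB of heap and 12288 MB for drill_max_direct_memory under this arithmetic.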