Cassandra is a distributed (peer-to-peer) system for the management and
storage of structured data.

Overview

The Apache Cassandra database is the right choice when you need scalability
and high availability without compromising performance. Linear scalability
and proven fault-tolerance on commodity hardware or cloud infrastructure
make it the perfect platform for mission-critical data. Cassandra's support
for replicating across multiple datacenters is best-in-class, providing lower
latency for your users and the peace of mind of knowing that you can survive
regional outages.

See cassandra.apache.org for more information.

Editions

This charm supports Apache Cassandra 2.0, 2.1, 2.2 & 3.0, and
DataStax Enterprise 4.7, 4.8 & 5.0. The default is Apache Cassandra 3.0.

To use a particular Apache Cassandra release, specify the relevant
deb archive in the install_sources config setting when deploying.

    install_sources:
      - deb http://www.apache.org/dist/cassandra/debian 22x main
      - ppa:openjdk-r/ppa   # For OpenJDK 8
      - ppa:stub/cassandra  # For Python driver

To use DataStax Enterprise, set the edition config setting to dse and add
the DataStax Enterprise archive URL, including your username and password,
to install_sources.

    install_sources:
      - deb http://un:pw@debian.datastax.com/enterprise stable main
      - ppa:openjdk-r/ppa   # For OpenJDK 8
      - ppa:stub/cassandra  # For Python driver

Deployment

Cassandra deployments are relatively simple in that they consist of a set of
Cassandra nodes which seed from each other to create a ring of servers:

    juju deploy -n3 cs:trusty/cassandra

The service units will deploy and will form a single ring.

New nodes can be added to scale up:

    juju add-unit cassandra

Warning: Nodes must be manually decommissioned before dropping a unit.

    juju run --unit cassandra/1 "nodetool decommission"
    # Wait until Mode is DECOMMISSIONED
    juju run --unit cassandra/1 "nodetool netstats"
    juju remove-unit cassandra/1
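
The wait step above can be scripted. The following is a minimal sketch, assuming that nodetool netstats reports the node state on a line containing "Mode:", as the comment above indicates. The function names is_decommissioned and wait_for_decommission and the 30 second polling interval are illustrative and not part of the charm.

```shell
# Succeeds if the nodetool netstats output on stdin reports
# "Mode: DECOMMISSIONED".
is_decommissioned() {
    grep -q 'Mode: DECOMMISSIONED'
}

# Poll a unit (e.g. cassandra/1) until it has finished decommissioning.
# The 30 second interval is an arbitrary choice.
wait_for_decommission() {
    unit="$1"
    until juju run --unit "$unit" "nodetool netstats" | is_decommissioned; do
        sleep 30
    done
}
```

With these defined, `wait_for_decommission cassandra/1 && juju remove-unit cassandra/1` removes the unit only once decommissioning has completed.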

It is recommended to deploy at least 3 nodes and configure all your
keyspaces to have a replication factor of three. Using fewer nodes or
neglecting to set your keyspaces' replication settings means that your
data is at risk and availability is reduced, as a failed unit may take the
only copy of data with it.

Production systems will normally want to set max_heap_size and
heap_newsize to the empty string, to enable automatic memory size
tuning. The defaults have been chosen to be suitable for development
environments but will perform poorly with real workloads.
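
One way to apply these settings at deploy time is a config file. This is a sketch only: the file name cassandra-production.yaml is arbitrary, and the top-level key assumes the service is deployed under the name cassandra.

```shell
# Write a deployment config file that sets both heap options to the
# empty string so that memory sizes are tuned automatically.
cat > cassandra-production.yaml <<'EOF'
cassandra:
  max_heap_size: ""
  heap_newsize: ""
EOF

# Then pass the file when deploying:
#   juju deploy -n3 --config cassandra-production.yaml cs:trusty/cassandra
```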

Planning

  • Do not attempt to store too much data per node. If you need more space,
    add more nodes. Most workloads work best with a capacity under 1TB
    per node.

  • You need to keep 50% of your disk space free for Cassandra maintenance
    operations. If you expect your nodes to hold 500GB of data each, you
    will need a 1TB partition. Using non-default compaction such as
    LeveledCompactionStrategy can lower this waste.

  • Much more information can be found in the Cassandra 2.2 documentation.
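
The disk sizing rule above can be worked through with a small helper. This is an illustrative sketch: the function name required_partition_gb is made up for this example, and the 50% default is the headroom recommended above for the default compaction strategy.

```shell
# Partition size (in GB) needed to hold data_gb of data while keeping
# free_pct percent of the disk free. Integer arithmetic, rounded up.
required_partition_gb() {
    data_gb="$1"
    free_pct="${2:-50}"   # the 50% headroom recommended above
    used_pct=$((100 - free_pct))
    echo $(( (data_gb * 100 + used_pct - 1) / used_pct ))
}

required_partition_gb 500    # prints 1000, i.e. a 1TB partition
```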

Network Access

The default Cassandra packages are installed from the apache.org
archive. To avoid this download, place a copy of the packages in a local
archive and specify its location in the install_sources configuration
option. The signing key is automatically added.

When using DataStax Enterprise, you need to specify the archive location
containing the DataStax Enterprise .deb packages in the
install_sources configuration item, and the signing key in the
install_keys configuration item. Place the DataStax packages in a
local archive to avoid downloading from datastax.com.

Oracle Java SE

Cassandra recommends using Oracle Java SE 8. Unfortunately, this
software is accessible only after accepting Oracle's click-through
license, which makes deployments using it much more cumbersome. You will need
to download the Oracle Java SE 8 Server Runtime for Linux, and place the
tarball at a URL accessible to your deployed units. The config item
private_jre_url needs to be set to this URL.

Usage

To relate the Cassandra charm to a service that understands how to talk to
Cassandra using Thrift or the native Cassandra protocol:

    juju deploy cs:service-that-needs-cassandra
    juju add-relation service-that-needs-cassandra cassandra:database

Alternatively, if you require a superuser connection, use the
database-admin relation instead of database:

    juju deploy cs:admin-service
    juju add-relation admin-service cassandra:database-admin

Client charms need to provide nothing. The Cassandra service publishes the
following connection settings and cluster information on the client's relation:

username and password:
    Authentication credentials. The cluster is configured to use
    the standard PasswordAuthenticator authentication provider, rather
    than the insecure default. You can use different credentials
    if you wish, using an account created through some other mechanism.

host:
    IP address to connect to.

native_transport_port:
    Port for drivers and tools using the newer native protocol.

rpc_port:
    Port for drivers and tools using the legacy Thrift protocol.

cluster_name:
    The cluster name. A client service may be related to several
    Cassandra services, and this setting may be used to tell which
    services belong to which cluster.

datacenter and rack:
    The datacenter and rack that units in this service belong to.
    Required for setting keyspace replication correctly.

The cluster is configured to use the recommended 'snitch'
(GossipingPropertyFileSnitch), so you will need to configure replication of
your keyspaces using the NetworkTopologyStrategy replica placement strategy.
For example, using the default datacenter named 'juju':

    CREATE KEYSPACE IF NOT EXISTS mydata WITH REPLICATION =
    { 'class': 'NetworkTopologyStrategy', 'juju': 3};

Although authentication is configured using the standard
PasswordAuthenticator, by default no authorization is configured
and the provided credentials will have access to all data on the cluster.
For more granular permissions, you will need to set the authorizer
in the service configuration to CassandraAuthorizer and manually grant
permissions to the users.
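
As a rough sketch of what granting more granular permissions could look like, the snippet below writes a CQL file that creates a read-only account. The account name app_reader, its password, the file name, and the mydata keyspace are placeholders, and the commands in the comments are one possible way to apply the change rather than a procedure prescribed by the charm.

```shell
# Write example CQL that creates a read-only account for one keyspace.
# The user name, password and keyspace are placeholders.
cat > grant-readonly.cql <<'EOF'
CREATE USER app_reader WITH PASSWORD 'change-me' NOSUPERUSER;
GRANT SELECT ON KEYSPACE mydata TO app_reader;
EOF

# 1. Switch the authorizer ("juju config" replaces "juju set" on Juju 2.x):
#      juju set cassandra authorizer=CassandraAuthorizer
# 2. Apply the file as a superuser, for example:
#      cqlsh -u <superuser> -p <password> -f grant-readonly.cql <host>
```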

Known Limitations and Issues

This is the 'trusty' charm. Upgrade from the 'precise' charm is not supported.

The system_auth keyspace replication factor is automatically increased
but not decreased. If you have a service with three or more units and
decommission enough nodes to drop below three, you will need to manually
update the system_auth keyspace replication settings.
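
A sketch of such a manual update follows. It assumes the service has been reduced to two units, that the default datacenter name 'juju' is in use, and that system_auth is replicated with NetworkTopologyStrategy like the keyspace example earlier. Adjust all three to match your deployment. The file name is arbitrary.

```shell
# Write CQL that lowers the system_auth replication factor to match a
# two-unit service in the 'juju' datacenter (both values are assumptions).
cat > system-auth-replication.cql <<'EOF'
ALTER KEYSPACE system_auth WITH REPLICATION =
    { 'class': 'NetworkTopologyStrategy', 'juju': 2 };
EOF

# Apply the file with cqlsh as a superuser, then repair the keyspace on
# each remaining unit, for example:
#   juju run --unit cassandra/0 "nodetool repair system_auth"
```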

Configuration

jre (string)
    Which Java runtime environment to use. May be 'openjdk' or 'oracle'.
    Default: openjdk

stream_throughput_outbound_megabits_per_sec (int)
    Throttles all outbound streaming file transfers on nodes to the given
    total throughput in Mbps. This is necessary because Cassandra does
    mostly sequential IO when streaming data during bootstrap or repair,
    which can lead to saturating the network connection and degrading rpc
    performance. When unset, the default is 200 Mbps or 25 MB/s. Set to 0
    to disable throttling.
    Default: 200

http_proxy (string)
    Value for the http_proxy and https_proxy environment variables. This
    causes pip(1) and other tools to perform downloads via the proxy
    server, e.g. http://squid.dc1.lan:8080

compaction_throughput_mb_per_sec (int)
    Throttles compaction to the given total throughput (in MB/sec) across
    the entire system. The faster you insert data, the faster you need to
    compact in order to keep the sstable count down, but in general,
    setting this to 16 to 32 times the rate you are inserting data is more
    than sufficient. Setting this to 0 disables throttling. Note that this
    accounts for all types of compaction, including validation compaction.
    Default: 16

edition (string)
    One of 'community', 'dse', or 'apache-snap'. 'community' uses the
    Apache Cassandra packages. 'dse' is for DataStax Enterprise. Selecting
    'dse' overrides the jre setting. 'apache-snap' uses a snap package of
    Apache Cassandra.
    Default: community

listen_interface (string)
    Network interface used for connecting to other Cassandra nodes. Must
    correspond to a single IP address. By default, the unit's public IP
    address is used.

install_keys (string)
    charm-helpers standard listing of package install source signing keys,
    corresponding to install_sources.
    Default:
        - null  # Apache package signing key added automatically.
        - null  # PPA package signing key added automatically.
        - null  # PPA package signing key added automatically.
        # - null  # DataStax package signing key added automatically.

rpc_interface (string)
    Network interface used for client connections. Must correspond to a
    single IP address. By default, the unit's public IP address is used.

authorizer (string)
    Authorization backend, implementing IAuthorizer; used to limit access
    and provide permissions. Out of the box, Cassandra provides
    AllowAllAuthorizer and CassandraAuthorizer:
    - AllowAllAuthorizer allows any action to any user; set it to disable
      authorization.
    - CassandraAuthorizer stores permissions in the system_auth.permissions
      table.
    Default: AllowAllAuthorizer

tombstone_warn_threshold (int)
    When executing a scan, within or across a partition, we need to keep
    the tombstones seen in memory so we can return them to the
    coordinator, which will use them to make sure other replicas also know
    about the deleted rows. With workloads that generate a lot of
    tombstones, this can cause performance problems and even exhaust the
    server heap. Adjust the thresholds here if you understand the dangers
    and want to scan more tombstones anyway.
    Default: 1000

nagios_disk_crit_pct (int)
    The pct of data disk used to trigger a nagios critical alert.
    Default: 25

cluster_name (string)
    Name of the Cassandra cluster. This is mainly used to prevent machines
    in one logical cluster from joining another. All Cassandra services
    you wish to cluster together must have the same cluster_name. This
    setting cannot be changed after service deployment.
    Default: juju

ssl_storage_port (int)
    Cluster secure communication port. TODO: Unused. configure SSL.
    Default: 7001

private_jre_url (string)
    URL for the private jre tar file. DSE requires Oracle Java SE 8 Server
    JRE (e.g. server-jre-8u60-linux-x64.tar.gz).

storage_port (int)
    Cluster communication port.
    Default: 7000

rpc_port (int)
    Thrift protocol port for legacy clients.
    Default: 9160

nagios_servicegroups (string)
    A comma-separated list of nagios servicegroups. If left empty, the
    nagios_context will be used as the servicegroup.

wait_for_storage_broker (boolean)
    Do not start the service before external storage has been mounted
    using the block storage broker relation. If you do not set this and
    you relate the service to the storage broker, then your service will
    have started up using local disk, and will later be torn down and
    rebuilt when the external storage becomes available.

nagios_heapchk_warn_pct (int)
    The pct of heap used to trigger a nagios warning.
    Default: 80

saved_caches_directory (string)
    Saved caches directory. The path is relative to /var/lib/cassandra or
    the block storage broker external mount point.
    Default: saved_caches

max_heap_size (string)
    Total size of Java memory heap, for example 1G or 512M. If you set
    this, you should also set heap_newsize. When set to the empty string,
    the size is automatically tuned.
    Default: 384M

nagios_disk_warn_pct (int)
    The pct of data disk used to trigger a nagios warning.
    Default: 50

authenticator (string)
    Authentication backend. Only PasswordAuthenticator and
    AllowAllAuthenticator are supported. You should only use
    AllowAllAuthenticator for legacy applications that cannot provide
    authentication credentials.
    Default: PasswordAuthenticator

num_tokens (int)
    Number of tokens per node.
    Default: 256

tombstone_failure_threshold (int)
    When executing a scan, within or across a partition, we need to keep
    the tombstones seen in memory so we can return them to the
    coordinator, which will use them to make sure other replicas also know
    about the deleted rows. With workloads that generate a lot of
    tombstones, this can cause performance problems and even exhaust the
    server heap. Adjust the thresholds here if you understand the dangers
    and want to scan more tombstones anyway.
    Default: 100000

commitlog_directory (string)
    Commit log directory. The path is relative to /var/lib/cassandra or
    the block storage broker external mount point.
    Default: commitlog

datacenter (string)
    The node's datacenter used by the endpoint_snitch, e.g. "DC1". It
    cannot be changed after service deployment.
    Default: juju

heap_newsize (string)
    The size of the JVM's young generation in the heap. If you set this,
    you should also set max_heap_size. If in doubt, go with 100M per
    physical CPU core. When set to the empty string, the size is
    automatically tuned.
    Default: 32M

io_scheduler (string)
    Set kernel io scheduler for persistent storage.
    https://www.kernel.org/doc/Documentation/block/switching-sched.txt
    Default: noop

data_file_directories (string)
    Space delimited data directories. Use multiple data directories to
    split data over multiple physical hardware drive partitions. Paths are
    relative to /var/lib/cassandra or the block storage broker external
    mount point.
    Default: data

native_transport_port (int)
    Native protocol port for native protocol clients.
    Default: 9042

nagios_context (string)
    Used by the nrpe subordinate charms. A string that will be prepended
    to the instance name to set the host name in nagios. For instance, the
    hostname would be something like:
        juju-myservice-0
    If you're running multiple environments with the same services in
    them, this allows you to differentiate between them.
    Default: juju

rack (string)
    The rack used by the endpoint_snitch for all units in this service,
    e.g. "Rack1". This cannot be changed after deployment. It defaults to
    the service name. Cassandra will store replicated data in different
    racks whenever possible.

partitioner (string)
    The cassandra partitioner to use. Use Murmur3Partitioner, unless
    another is required for backwards compatibility.
    Default: Murmur3Partitioner

package_status (string)
    The status of service-affecting packages will be set to this value in
    the dpkg database. Useful valid values are "install" and "hold".
    Default: install

extra_packages (string)
    Extra packages to install. A space delimited list of packages.

install_sources (string)
    charm-helpers standard listing of package install sources. If you are
    using DataStax Enterprise, you will need to override one of the
    defaults with your own username and password.
    Default:
        - deb http://www.apache.org/dist/cassandra/debian 30x main
        - ppa:openjdk-r/ppa  # For OpenJDK 8
        - ppa:cassandra-charmers/stable  # For Python driver

nagios_heapchk_crit_pct (int)
    The pct of heap used to trigger a nagios critical alert.
    Default: 90