ceph-osd #97

  • By james-page
  • Latest version (#97)
  • xenial, bionic, cosmic, trusty
  • Stable

Description

Ceph is a distributed storage and network file system designed to provide
excellent performance, reliability, and scalability.

This charm provides the Ceph OSD personality for expanding storage capacity
within a Ceph deployment.


Overview

Ceph is a distributed storage and network file system designed to provide
excellent performance, reliability, and scalability.

This charm deploys additional Ceph OSD storage service units and should be
used in conjunction with the 'ceph-mon' charm to scale out the amount of
storage available in a Ceph cluster.

Usage

The charm supports specification of the storage devices to use in the Ceph
cluster::

osd-devices:
    A list of devices that the charm will attempt to detect, initialise and
    activate as ceph storage.

    If the charm detects pre-existing data on a device it will go into a
    blocked state and the operator must resolve the situation utilizing the
    `list-disks`, `zap-disk` and/or `blacklist-*` actions.

    This can be a superset of the actual storage devices presented to
    each service unit and can be changed post ceph-osd deployment using
    `juju set`.

For example::

ceph-osd:
  options:
    osd-devices: /dev/vdb /dev/vdc /dev/vdd /dev/vde
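
The device list of an already deployed application can be changed in the same
way; as a minimal sketch (assuming the Juju 2.x `juju config` syntax; on Juju
1.x use `juju set`)::

juju config ceph-osd osd-devices='/dev/vdb /dev/vdc /dev/vdd /dev/vde'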

Example utilizing Juju storage::

ceph-osd:
  storage:
    osd-devices: cinder,20G

Please refer to the Juju Storage Documentation for details on support for various storage providers and cloud substrates.

How to deploy::

juju deploy -n 3 ceph-osd
juju deploy ceph-mon --to lxd:0
juju add-unit ceph-mon --to lxd:1
juju add-unit ceph-mon --to lxd:2
juju add-relation ceph-osd ceph-mon

Once the 'ceph-mon' charm has bootstrapped the cluster, it will notify the
ceph-osd charm which will scan for the configured storage devices and add them
to the pool of available storage.

Network Space support

This charm supports the use of Juju Network Spaces, allowing the charm to be bound to network space configurations managed directly by Juju. This is only supported with Juju 2.0 and above.

Network traffic can be bound to specific network spaces using the public (front-side) and cluster (back-side) bindings:

juju deploy ceph-osd --bind "public=data-space cluster=cluster-space"

alternatively these can also be provided as part of a Juju native bundle configuration:

ceph-osd:
  charm: cs:xenial/ceph-osd
  num_units: 1
  bindings:
    public: data-space
    cluster: cluster-space

Please refer to the Ceph Network Reference for details on how using these options affects network traffic within a Ceph deployment.

NOTE: Spaces must be configured in the underlying provider prior to attempting to use them.

NOTE: Existing deployments using ceph-*-network configuration options will continue to function; these options are preferred over any network space binding provided if set.

AppArmor Profiles

AppArmor is not enforced for Ceph by default. An AppArmor profile can be generated by the charm. However, great care must be taken.

Changing the value of the aa-profile-mode option is disruptive to a running Ceph cluster as all ceph-osd processes must be restarted as part of changing the AppArmor profile enforcement mode.

The generated AppArmor profile currently has a narrow supported use case, and it should always be verified in pre-production against the specific configurations and topologies intended for production.

The AppArmor profile(s) which are generated by the charm should NOT yet be used in the following scenarios:
- When there are separate journal devices.
- On any version of Ceph prior to Luminous.
- On any version of Ubuntu other than 16.04.
- With Bluestore enabled.
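
For example, the profile can first be generated and verified in complain mode
before enforcing it (a minimal sketch using the aa-profile-mode option
described under Configuration below)::

juju config ceph-osd aa-profile-mode=complain

Once the profile has been verified against the intended configuration, the
mode can be switched to 'enforce' in the same way.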

Block Device Encryption

The ceph-osd charm supports encryption of the underlying block devices backing OSDs.

To use the 'native' key management approach (where dm-crypt keys are stored in the
ceph-mon cluster), simply set the 'osd-encrypt' configuration option::

ceph-osd:
  options:
    osd-encrypt: True

NOTE: This is supported for Ceph Jewel or later.

Alternatively, encryption keys can be stored in Vault; this requires deployment of
the vault charm (and associated initialization of vault - see the Vault charm for
details) and configuration of the 'osd-encrypt' and 'osd-encrypt-keymanager'
options::

ceph-osd:
  options:
    osd-encrypt: True
    osd-encrypt-keymanager: vault

NOTE: This option is only supported with Ceph Luminous or later.

NOTE: Changing these options post deployment will only take effect for any
new block devices added to the ceph-osd application; existing OSD devices will
not be encrypted.

Actions

The charm offers actions which
may be used to perform operational tasks on individual units.

pause

USE WITH CAUTION - Sets the local OSD units in the charm to 'out' but
does not stop the OSDs. Unless the cluster is set to 'noout' (see below),
this removes them from the Ceph cluster and forces Ceph to migrate the PGs
to other OSDs in the cluster.

From the upstream documentation:
"Do not let your cluster reach its full ratio when removing an OSD.
Removing OSDs could cause the cluster to reach or exceed its full ratio."

Also note that for small clusters you may encounter the corner case where
some PGs remain stuck in the active+remapped state. Refer to the upstream
documentation on how to resolve this.

The pause-health action (on a ceph-mon unit) can be used before pausing a
ceph-osd unit to stop the cluster rebalancing the data off this ceph-osd unit.
pause-health sets 'noout' on the cluster so that it will not try to
rebalance the data across the remaining units.

It is up to the user of the charm to determine whether pause-health should
be used, as it depends on whether the OSD is being paused for maintenance or
to remove it from the cluster completely.

NOTE: the pause action does NOT stop the ceph-osd processes.

resume

Sets the local OSD units in the charm to 'in'.
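
For illustration, a typical maintenance flow on a single unit might look like
the following (a sketch assuming Juju 2.x `run-action` syntax and the ceph-mon
charm's pause-health/resume-health actions)::

juju run-action ceph-mon/0 pause-health --wait
juju run-action ceph-osd/1 pause --wait
# ... perform maintenance on the unit ...
juju run-action ceph-osd/1 resume --wait
juju run-action ceph-mon/0 resume-health --wait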

list-disks

List disks

The 'disks' key is populated with block devices that are known by udev,
are not mounted, and are not mentioned in the 'osd-journal' configuration option.

The 'blacklist' key is populated with osd-devices in the blacklist stored
in the local kv store of this specific unit.

The 'non-pristine' key is populated with block devices that are known by
udev, are not mounted, are not mentioned in the 'osd-journal' configuration
option, and are currently not eligible for use because of the presence of
foreign data.
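
For example (assuming Juju 2.x `run-action` syntax)::

juju run-action ceph-osd/0 list-disks --wait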

add-disk

Add disk(s) to Ceph

Parameters

  • osd-devices (required): The devices to format and set up as OSD volumes.
  • bucket: The name of the bucket in Ceph to add these devices into.
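
For example (a sketch assuming Juju 2.x `run-action` syntax and an attached but
unused device at /dev/vde)::

juju run-action ceph-osd/0 add-disk osd-devices=/dev/vde --wait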

blacklist-add-disk

Add disk(s) to the blacklist. Blacklisted disks will not be
initialized for use with Ceph even if listed in the application-level
osd-devices configuration option.

The current blacklist can be viewed with the list-disks action.

NOTE: This action and the blacklist will not have any effect on
already initialized disks.

Parameters

  • osd-devices (required): A space-separated list of devices to add to the
    blacklist.

    Each element should be an absolute path to a device node or filesystem
    directory (the latter is supported for ceph >= 0.56.6).

    Example: '/dev/vdb /var/tmp/test-osd'
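
For example (assuming Juju 2.x `run-action` syntax)::

juju run-action ceph-osd/0 blacklist-add-disk osd-devices='/dev/vdb /dev/vdc' --wait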

blacklist-remove-disk

Remove disk(s) from blacklist.

Parameters

  • osd-devices (required): A space-separated list of devices to remove from
    the blacklist.

    Each element should be an existing entry in the unit's blacklist.
    Use the list-disks action to list the current blacklist entries.

    Example: '/dev/vdb /var/tmp/test-osd'
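
For example (assuming Juju 2.x `run-action` syntax)::

juju run-action ceph-osd/0 blacklist-remove-disk osd-devices=/dev/vdb --wait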

zap-disk

Purge disk of all data and signatures for use by Ceph

This action can be necessary in cases where a Ceph cluster is being
redeployed as the charm defaults to skipping disks that look like Ceph
devices in order to preserve data. In order to forcibly redeploy, the
admin is required to perform this action for each disk to be re-consumed.

In addition to triggering this action, it is required to pass the additional
parameter i-really-mean-it to ensure that the administrator is aware that
this will cause data loss on the specified device(s).

Parameters

  • devices (required): A space-separated list of devices to remove the
    partition table from.
  • i-really-mean-it (required): This must be toggled to enable actually
    performing this action.
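
For example (a sketch assuming Juju 2.x `run-action` syntax; this destroys all
data on the named device)::

juju run-action ceph-osd/0 zap-disk devices=/dev/vdb i-really-mean-it=true --wait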

Contact Information

Author: James Page james.page@ubuntu.com
Report bugs at: http://bugs.launchpad.net/charm-ceph-osd/+filebug
Location: http://jujucharms.com/ceph-osd

Configuration

aa-profile-mode
(string) Enable an AppArmor profile. Valid settings: 'complain', 'enforce' or 'disable'. NOTE: changing the value of this option is disruptive to a running Ceph cluster as all ceph-osd processes must be restarted as part of changing the AppArmor profile enforcement mode. Always test in pre-production before enabling AppArmor on a live cluster.
disable
bluestore-db
(string) Path to a BlueStore WAL db block device or file
sysctl
(string) YAML-formatted associative array of sysctl key/value pairs to be set persistently. By default we set pid_max, max_map_count and threads-max to a high value to avoid problems with large numbers (>20) of OSDs recovering. Very large clusters should set those values even higher (e.g. the max for kernel.pid_max is 4194303).
{ kernel.pid_max : 2097152, vm.max_map_count : 524288, kernel.threads-max: 2097152 }
bluestore
(boolean) Enable bluestore storage format for OSD devices; Only applies for Ceph Luminous or later.
True
availability_zone
(string) Custom availability zone to provide to Ceph for the OSD placement
osd-format
(string) Format of filesystem to use for OSD devices. Supported formats: xfs (default, ceph >= 0.48.3), ext4 (only option for ceph < 0.48.3), btrfs (experimental and not recommended). Only supported with ceph >= 0.48.3.
xfs
crush-initial-weight
(float) The initial crush weight for newly added osds into crushmap. Use this option only if you wish to set the weight for newly added OSDs in order to gradually increase the weight over time. Be very aware that setting this overrides the default setting, which can lead to imbalance in the cluster, especially if there are OSDs of different sizes in use. By default, the initial crush weight for the newly added osd is set to its volume size in TB. Leave this option unset to use the default provided by Ceph itself. This option only affects NEW OSDs, not existing ones.
ceph-cluster-network
(string) The IP address and netmask of the cluster (back-side) network (e.g., 192.168.0.0/24). If multiple networks are to be used, a space-delimited list of a.b.c.d/x can be provided.
osd-max-backfills
(int) The maximum number of backfills allowed to or from a single OSD. Setting this option on a running Ceph OSD node will not affect running OSD devices, but will add the setting to ceph.conf for the next restart.
bluestore-block-wal-size
(int) Size of a partition or file to use for BlueStore WAL (RocksDB WAL). A default value is not set as it is calculated by ceph-disk if not specified.
use-syslog
(boolean) If set to True, supporting services will log to syslog.
max-sectors-kb
(int) This parameter will adjust every block device in your server to allow greater IO operation sizes. If you have a RAID card with cache on it consider tuning this much higher than the 1MB default. 1MB is a safe default for spinning HDDs that don't have much cache.
1048576
osd-journal-size
(int) Ceph OSD journal size. The journal size should be at least twice the product of the expected drive speed and the filestore max sync interval. However, the most common practice is to partition the journal drive (often an SSD), and mount it such that Ceph uses the entire partition for the journal. Only supported with ceph >= 0.48.3.
1024
use-direct-io
(boolean) Configure use of direct IO for OSD journals.
True
source
(string) Optional configuration to support use of additional sources such as ppa:myteam/ppa, cloud:xenial-proposed/ocata, or http://my.archive.com/ubuntu main. The last option should be used in conjunction with the key configuration option.
osd-devices
(string) The devices to format and set up as OSD volumes. These devices are the range of devices that will be checked for and used across all service units, in addition to any volumes attached via the --storage flag during deployment. For ceph >= 0.56.6 these can also be directories instead of devices - the charm assumes anything not starting with /dev is a directory instead.
/dev/vdb
osd-encrypt
(boolean) By default, the charm will not encrypt Ceph OSD devices; however, by setting osd-encrypt to True, Ceph's dmcrypt support will be used to encrypt OSD devices. Specifying this option on a running Ceph OSD node will have no effect until new disks are added, at which point new disks will be encrypted.
prefer-ipv6
(boolean) If True, enables IPv6 support. The charm will expect network interfaces to be configured with an IPv6 address. If set to False (the default), IPv4 is expected. NOTE: these charms do not currently support the IPv6 privacy extension. In order for this charm to function correctly, the privacy extension must be disabled and a non-temporary address must be configured/available on your network interface.
nagios_servicegroups
(string) A comma-separated list of nagios servicegroups. If left empty, the nagios_context will be used as the servicegroup
osd-recovery-max-active
(int) The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but the requests place an increased load on the cluster. Setting this option on a running Ceph OSD node will not affect running OSD devices, but will add the setting to ceph.conf for the next restart.
customize-failure-domain
(boolean) Setting this to true will tell Ceph to replicate across Juju's Availability Zones instead of specifically by host.
ceph-public-network
(string) The IP address and netmask of the public (front-side) network (e.g., 192.168.0.0/24). If multiple networks are to be used, a space-delimited list of a.b.c.d/x can be provided.
ignore-device-errors
(boolean) By default, the charm will raise errors if a whitelisted device is found but the charm is, for some reason, unable to initialize the device for use by Ceph. Setting this option to 'True' will result in the charm classifying such problems as warnings only and will not result in a hook error.
key
(string) Key ID to import to the apt keyring to support use with arbitrary source configuration from outside of Launchpad archives or PPAs.
bluestore-block-db-size
(int) Size of a partition or file to use for BlueStore metadata or RocksDB SSTs. A default value is not set as it is calculated by ceph-disk if not specified.
osd-encrypt-keymanager
(string) Keymanager to use for storage of dm-crypt keys used for OSD devices; by default 'ceph' itself will be used for storage of keys, making use of the key/value storage provided by the ceph-mon cluster. Alternatively, 'vault' may be used for storage of dm-crypt keys. Both approaches ensure that keys are never written to the local filesystem. The 'vault' option also requires a relation to the vault charm.
ceph
config-flags
(string) User provided Ceph configuration. Supports a string representation of a python dictionary where each top-level key represents a section in the ceph.conf template. You may only use sections supported in the template. WARNING: this is not the recommended way to configure the underlying services that this charm installs and is used at the user's own risk. This option is mainly provided as a stop-gap for users that either want to test the effect of modifying some config or who have found a critical bug in the way the charm has configured their services and need it fixed immediately. We ask that whenever this is used, the user consider opening a bug on this charm at http://bugs.launchpad.net/charms providing an explanation of why the config was needed so that we may consider it for inclusion as a natively supported config in the charm.
osd-journal
(string) The device to use as a shared journal drive for all OSDs. By default a journal partition will be created on each OSD volume device for use by that OSD. Only supported with ceph >= 0.48.3.
loglevel
(int) OSD debug level. Max is 20.
1
nagios_context
(string) Used by the nrpe-external-master subordinate charm. A string that will be prepended to the instance name to set the hostname in nagios, so for instance the hostname would be something like juju-myservice-0. If you're running multiple environments with the same services in them this allows you to differentiate between them.
juju
bluestore-wal
(string) Path to a BlueStore WAL block device or file.
harden
(string) Apply system hardening. Supports a space-delimited list of modules to run. Supported modules currently include os, ssh, apache and mysql.
autotune
(boolean) Enabling this option will attempt to tune your network card sysctls and hard drive settings. This changes hard drive read ahead settings and max_sectors_kb. For the network card this will detect the link speed and make appropriate sysctl changes. Enabling this option should generally be safe.
ephemeral-unmount
(string) Cloud instances provide ephermeral storage which is normally mounted on /mnt. . Setting this option to the path of the ephemeral mountpoint will force an unmount of the corresponding device so that it can be used as a OSD storage device. This is useful for testing purposes (cloud deployment is not a typical use case).