nrpe #2

  • By paulgear
  • Latest version (#2)
  • xenial, trusty, zesty, yakkety
  • Stable

Description

Nagios is a host/service/network monitoring and management system. The
purpose of this addon is to allow you to execute Nagios plugins on a
remote host in as transparent a manner as possible. This program runs
as a background process on the remote host and processes command
execution requests from the check_nrpe plugin on the Nagios host.


Introduction

This subordinate charm is used to configure nrpe (Nagios Remote Plugin
Executor). It can be related to the nagios charm via the monitors relation and
will pass a monitors yaml to nagios informing it of what checks to monitor.

Principal Relations

This charm can be attached to any principal charm (via the juju-info relation)
regardless of whether it has implemented the local-monitors or
nrpe-external-master relations. For example,

juju deploy ubuntu
juju deploy nrpe
juju deploy nagios
juju add-relation ubuntu nrpe
juju add-relation nrpe:monitors nagios:monitors

If joined via the juju-info relation the default checks are configured and
additional checks can be added via the monitors config option (see below).

The local-monitors relation allows the principal to request checks by passing
a monitors yaml and listing them in the 'local' section. The principal can also
list checks that it has already configured itself in the remote nrpe section,
and it can request external monitors by using one of the other remote types.
See "Monitors yaml" below.

Other Subordinate Charms

If another subordinate charm deployed to the same principal has a
local-monitors or nrpe-external-master relation then it can also be related to
the local nrpe charm. For example,

echo -e "glance:\n vip: 10.5.106.1" > glance.yaml
juju deploy -n3 --config glance.yaml glance
juju deploy hacluster glance-hacluster
juju deploy nrpe glance-nrpe
juju deploy nagios
juju add-relation glance glance-hacluster
juju add-relation glance-nrpe:monitors nagios:monitors
juju add-relation glance glance-nrpe
juju add-relation glance-hacluster glance-nrpe

The glance-hacluster charm will pass monitoring information to glance-nrpe
which will amalgamate all monitor definitions before passing them to nagios.

Check sources

Check definitions can come from three places:

Default Checks

This charm creates a base set of checks in /etc/nagios/nrpe.d, including
check_load, check_users, check_disk_root. All of the options for these are
configurable but sensible defaults have been set in config.yaml.
For example, to raise the alert thresholds of the load check:

juju set nrpe load="-w 10,10,10 -c 25,25,25"

Principal Requested Checks

These are monitors passed to this charm by the principal charm via the local-monitors
or nrpe-external-master relation. The principal charm can write its own
check definition into /etc/nagios/nrpe.d and then inform this charm via the
monitors setting. It can also request a direct external check of a service
without using nrpe. See "Monitors yaml" below for examples.

User Requested Checks

This works in the same way as principal requested checks, except that the
monitors yaml is set by the user via the monitors config option. For example,
to add a monitor for the rsyslog process:

juju set nrpe monitors="
monitors:
    local:
        procrunning:
            rsyslogd:
                min: 1
                max: 1
                executable: rsyslogd
"

External Nagios

If the nagios server is not deployed in the juju environment, the charm can
be configured, via the export_nagios_definitions option, to write out nagios
config fragments to /var/lib/nagios/export. An rsync stanza (a target called
"external-nagios") is then created so that the Nagios server can pick up the
fragments from that directory. The stanza is configured to allow connections
from the hostname or IP address given in the nagios_master option.

It is up to you to configure the Nagios master to pull the configs needed, which
will then cause it to connect back to the instances in question to run the nrpe
checks you have defined.
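
As a rough illustration of the rsync target described above, the following Python sketch renders a stanza from the nagios_master value. The function name rsync_stanza and the exact set of rsyncd.conf parameters are assumptions; the source only states the target name, the exported directory and the allowed host, so the stanza the charm really writes may differ.

```python
def rsync_stanza(nagios_master, export_dir="/var/lib/nagios/export"):
    """Render an rsyncd.conf module for the exported nagios fragments.

    Assumption: 'path', 'hosts allow' and 'read only' are standard
    rsyncd.conf parameters, but the charm's real template is not shown
    in this document.
    """
    return (
        "[external-nagios]\n"
        "    path = {}\n"
        "    hosts allow = {}\n"
        "    read only = yes\n"
    ).format(export_dir, nagios_master)


stanza = rsync_stanza("10.0.0.5")  # 10.0.0.5 is a placeholder address
print(stanza)
```

The Nagios master would then pull from the "external-nagios" target on each monitored host; how often and into which directory is left to the operator.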

Monitors yaml

The list of monitors passed down the monitors relation is an amalgamation of the
lists provided via the principal, the user and the default checks.
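
A minimal sketch of that amalgamation, assuming each source has already been parsed into a nested Python dict. The helper names (merge_monitors, _merge_into), the sample check names and the rule that later sources win on a name clash are all assumptions made for illustration; the charm's own merge logic is not described here.

```python
import copy


def _merge_into(target, extra):
    """Recursively copy the keys of `extra` into `target`."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)
        else:
            # Assumption: on a clash the later source replaces the earlier one.
            target[key] = value


def merge_monitors(*sources):
    """Deep-merge several monitors dicts without modifying the inputs."""
    merged = {}
    for source in sources:
        _merge_into(merged, copy.deepcopy(source))
    return merged


# Illustrative inputs only; real definitions come from the charm and relations.
defaults = {"monitors": {"remote": {"nrpe": {
    "check_load": {"command": "check_load"}}}}}
principal = {"monitors": {"remote": {"nrpe": {
    "apache2": {"command": "check_apache2"}}}}}
user = {"monitors": {"local": {"procrunning": {
    "rsyslogd": {"min": 1, "max": 1, "executable": "rsyslogd"}}}}}

combined = merge_monitors(defaults, principal, user)
print(sorted(combined["monitors"]["remote"]["nrpe"]))
```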

The monitors yaml is of the following form:

# Version of the spec, mostly ignored but 0.3 is the current one
version: '0.3'
# Dict with just 'local' and 'remote' as parts
monitors:
    # local monitors need an agent to be handled. See nrpe charm for
    # some example implementations
    local:
        # procrunning checks for a running process named X (no path)
        procrunning:
            # Multiple procrunning can be defined, this is the "name" of it
            nagios3:
                min: 1
                max: 1
                executable: nagios3
    # Remote monitors can be polled directly by a remote system
    remote:
        # do a request on the HTTP protocol
        http:
            nagios:
                port: 80
                path: /nagios3/
                # expected status response (otherwise just look for 200)
                status: 'HTTP/1.1 401'
                # Use as the Host: header (the server address will still be used to connect() to)
                host: www.fewbar.com
        mysql:
            # Named basic check
            basic:
                username: monitors
                password: abcdefg123456
        nrpe:
            apache2:
                command: check_apache2

Before a monitor is added it is checked to see if it is in the 'local' section.
If it is, this charm needs to convert it into an nrpe check. Only a small
number of check types are currently supported (see below). These checks can
then be called by the nagios charm via the nrpe service. So for each check
listed in the local section:

  1. The definition is read and a check definition is written to /etc/nagios/nrpe.d
  2. The check is defined as a remote nrpe check in the yaml passed to nagios

In the example above a check_proc_nagios3_user.cfg file would be written
out which contains:

# Check process nagios3 is running (user)
command[check_proc_nagios3_user]=/usr/lib/nagios/plugins/check_procs -w 1 -c 1 -C nagios3

and the monitors yaml passed to nagios would include:

monitors:
    nrpe:
        check_proc_nagios3_user:
            command: check_proc_nagios3_user
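
The conversion of a local procrunning check can be sketched in Python as follows. This is an illustration, not the charm's code: the function name and the source argument (used for the "_user" suffix) are hypothetical, and the mapping of min to -w and max to -c is an assumption, since the example above uses 1 for both and does not show which is which.

```python
def local_procrunning_to_nrpe(name, check, source="user"):
    """Turn one local procrunning definition into nrpe artefacts.

    Returns (cfg_path, cfg_body, remote_entry).
    Assumption: min maps to -w and max maps to -c.
    """
    check_name = "check_proc_{}_{}".format(name, source)
    command_line = (
        "/usr/lib/nagios/plugins/check_procs "
        "-w {min} -c {max} -C {executable}".format(**check)
    )
    # Step 1: the check definition written under /etc/nagios/nrpe.d
    cfg_path = "/etc/nagios/nrpe.d/{}.cfg".format(check_name)
    cfg_body = (
        "# Check process {} is running ({})\n"
        "command[{}]={}\n".format(name, source, check_name, command_line)
    )
    # Step 2: the remote nrpe entry included in the yaml passed to nagios
    remote_entry = {check_name: {"command": check_name}}
    return cfg_path, cfg_body, remote_entry


path, body, entry = local_procrunning_to_nrpe(
    "nagios3", {"min": 1, "max": 1, "executable": "nagios3"}
)
print(path)
print(body)
```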

The principal charm, or the user via the monitors config option, can request an
external check by adding it to the remote section of the monitors yaml. In the
example above direct checks of a webserver and of mysql are being requested.
This charm passes those on to nagios unaltered.

Local check types

Supported nrpe checks are:

    procrunning
        min: Minimum number of 'executable' processes
        max: Maximum number of 'executable' processes
        executable: Name of executable to look for in process list
    processcount
        min: Minimum total number of processes
        max: Maximum total number of processes
        executable: Name of executable to look for in process list
    disk
        path: Directory to monitor space usage of
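
The list of local check types lends itself to a small lookup table. The Python sketch below flags a definition that uses an unsupported type or an unlisted field. The names LOCAL_CHECK_FIELDS and unknown_fields are hypothetical, and because this document does not say which fields are mandatory, the sketch only reports unknown ones.

```python
# Fields listed for each supported local check type.
LOCAL_CHECK_FIELDS = {
    "procrunning": {"min", "max", "executable"},
    "processcount": {"min", "max", "executable"},
    "disk": {"path"},
}


def unknown_fields(check_type, definition):
    """Return the keys of `definition` that are not listed for `check_type`."""
    if check_type not in LOCAL_CHECK_FIELDS:
        raise ValueError("unsupported local check type: {}".format(check_type))
    return sorted(set(definition) - LOCAL_CHECK_FIELDS[check_type])


print(unknown_fields("procrunning", {"min": 1, "max": 1, "executable": "rsyslogd"}))
print(unknown_fields("disk", {"path": "/srv", "warn": "20%"}))
```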

Remote check types

Supported remote types:
http, mysql, nrpe, tcp, rpc, pgsql
(See Nagios charm for up-to-date list and options)

Spaces

By defining a binding for the 'monitors' endpoint, you can influence which of the
nrpe unit's IP addresses is reported back to Nagios. This is useful when nrpe is
placed on machines with multiple IPs/networks.

Actions

The charm defines two actions. 'list-nrpe-checks' lists all the nrpe checks
defined for this unit and the commands they use. 'run-nrpe-check' runs a
specified nrpe check and returns its output, which is useful for confirming
whether an alert is actually resolved.

Configuration

load
    (string) Load check arguments (e.g. "-w 8,8,8 -c 15,15,15"). If set to 'auto', NUM_CPUS*0.7 is used for the warning threshold and NUM_CPUS for the critical threshold.
    Default: auto
sub_postfix
    (string) A string appended to the names of all nrpe checks created by this charm, to avoid potential clashes with existing checks.
nagios_hostname_type
    (string) Determines whether a server is identified by its unit name or host name. If you're in a virtual environment, "unit" is probably best. If you're using MaaS, you may prefer "host".
    Default: unit
dont_blame_nrpe
    (boolean) Setting dont_blame_nrpe to True sets dont_blame_nrpe=1 in nrpe.cfg, which allows arguments to be passed to nrpe scripts. This can be a security risk, so it is disabled by default. Nrpe is compiled with the --enable-command-args option by default; this setting turns that capability on.
mem
    (string) Memory check.
    Default: -C -u -w 85 -c 90
swap
    (string) Swap check.
    Default: -w 40% -c 25%
server_port
    (int) Port on which nagios-nrpe-server will listen.
    Default: 5666
hostgroups
    (string) Comma separated list of hostgroups to add for these hosts.
zombies
    (string) Zombie processes check; defaults to disabled. To enable, set the desired check_procs arguments pertaining to zombies, for example: "-w 3 -c 6 -s Z"
nagios_host_context
    (string) A string which will be prepended to the instance name to set the host name in nagios. For instance, the hostname would be something like juju-postgresql-0. If you're running multiple environments with the same services in them, this allows you to differentiate between them.
    Default: juju
nagios_address_type
    (string) Determines whether the nagios host check should use the private or public IP address of an instance. Can be "private" or "public".
    Default: private
hostcheck_inherit
    (string) Hostcheck to inherit.
    Default: server
procs
    (string) Set thresholds for number of running processes. Defaults to disabled; to enable, specify 'auto' for the charm to generate thresholds based on processor count, or manually provide arguments for check_procs, for example: "-k -w 250 -c 300" to set warning and critical levels manually and exclude kernel threads.
conntrack
    (string) conntrack table check.
    Default: -w 80 -c 90
swap_activity
    (string) Swap activity check.
    Default: -i 5 -w 100 -c 500
debug
    (boolean) Setting debug to True enables debug=1 in nrpe.cfg.
users
    (string) Set thresholds for number of logged-in users. Defaults to disabled; to enable, manually provide arguments for check_users, for example: "-w 20 -c 25"
nagios_master
    (string) IP address of the nagios master from which to allow rsync access.
    Default: None
monitors
    (string) Additional monitors defined in the monitors yaml format (see README).
export_nagios_definitions
    (boolean) If True, nagios check definitions are written to '/var/lib/nagios/export' and rsync is configured to allow nagios_master to collect them. Useful when Nagios is outside of the juju environment.
disk_root
    (string) Root disk check. This can be made to also check non-root disk systems as follows: "-u GB -w 20% -c 15% -r '/srv/juju/vol-' -C -u GB -w 25% -c 20%". The string '-p /' will be appended to this check, so you must finish the string taking that into account. See the nagios check_disk plugin help for further details.
    Default: -u GB -w 25% -c 20% -K 5%
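
To make the 'auto' value of the load option concrete, here is a small Python sketch of the arithmetic described there (NUM_CPUS*0.7 for warning, NUM_CPUS for critical). The function name auto_load_args is hypothetical, and applying the same threshold to all three load averages is an assumption based on the shape of the "-w 8,8,8 -c 15,15,15" example.

```python
import os


def auto_load_args(num_cpus=None):
    """Build check_load arguments from the CPU count.

    Assumption: the 1, 5 and 15 minute averages share one threshold each
    for warning and critical.
    """
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    warn = "{:.1f}".format(num_cpus * 0.7)  # warning at 70% of the CPU count
    crit = str(num_cpus)                    # critical at the CPU count
    return "-w {w},{w},{w} -c {c},{c},{c}".format(w=warn, c=crit)


print(auto_load_args(4))
```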