As of version 1.20, Juju supports high availability for its state server.
High Availability in general terms means that a Juju environment has 3 or more (up to 7) redundant state servers. One of these is the master with automatic failover occurring should something happen to the master.
This section describes how Juju's high availability features works.
Juju's High Availability (HA) mode is tightly integrated and dependent on MongoDB. Juju stores all its data about the environment in a MongoDB database. MongoDB has an HA implementation that connects multiple MongoDB databases in a cluster called a replicaset. The replicaset has a 1:1 relation to Juju state servers, and the master of the replicaset is the master of the Juju state servers.
Juju's HA mode is turned on using the
juju ensure-availability command. By
default this sets the desired number of state machines in the environment to 3.
There is an optional
-n parameter which can be used to set this number higher.
As stated above, there is a 1:1 relationship between the number of state machines and the number of MongoDB instances. This means that the number of state machines must be an odd number to prevent ties during voting for master, and the number of state servers cannot exceed 7, due to MongoDB limits. In practice, this means the possibilities are 3, 5 or 7 state machines.
Currently the number of state machines can be increased using the -n flag on
juju ensure-availability, but not decreased. The only way to decrease the
number of machines is to create a backup of your environment and then restore
the backup to a new environment, which starts with a single state server.
Whenever you run ensure-availability, the command will report the changes that it made to the system's desired model, which will shortly be reflected in reality.
Juju does not automatically re-spawn state machines if one or more fail.
However, if an environment is already in HA mode, you can recover from state
machine failure by manually re-running
juju ensure-availability. This can be
done as long as more than half the original number of machines are still
The process to recovering when state servers have failed is:
juju ensure-availability. New state servers will be created to replace the dead ones.
- After some time the new state servers will be ready and the dead state servers
will be removed from Juju's set of high availability state servers. This will
take on the order of 30 seconds to 20 minutes depending on variables like the
load on the machines and the amount of Juju configuration data to
replicate. In the output from
juju statusthe new state servers will have a
has-voteand the dead state servers will have
no-vote. At this point, the state servers in the environment are fully-redundant again.
- To have Juju not treat the dead state server machines as state servers any
juju ensure-availabilityagain. The
state-server-member-statusattribute will disappear from these machines in the
- The dead state server hosts can now be completely removed from Juju's
configuration by using
If fewer than half of the original state servers are still running, you cannot recover by using the ensure-availability command because the MongoDB replicaset does not have a quorum with which to elect a new master. In this case, you must restore from a previous backup.