Note
High Availability support is only available to those with Enterprise-level licenses.
Tower can be installed in a High Availability (HA) configuration. In this configuration, Tower runs with a single active node, called the Primary instance, and any number of inactive nodes, called Secondary instances. Secondary instances can become Primary at any time, with certain caveats. Running in a high-availability setup requires any database that Tower uses to be external: PostgreSQL and MongoDB must be installed on a machine that is not one of the primary or secondary Tower nodes.
Tower’s HA mode offers a standby Tower infrastructure that can become active in case of infrastructure failure, which avoids single points of failure. HA mode is not meant to run in an active/active or multi-master mode, and is not a mechanism for horizontally scaling the Tower service. Further, failover to a secondary instance is not automatic and must be triggered by a user.
For instructions on how to install into an HA configuration, refer to the Ansible Tower Installation and Reference Guide.
When creating an HA deployment of Tower, consider the following factors:
Tower is designed as a unit.
Only those parts explicitly mentioned as being supported as external services are swappable for external versions of those services. Just as Tower does not support swapping out Django for Flask, Apache for lighttpd, or PostgreSQL for Oracle or MSSQL, it does not support replacing MongoDB with a different component.
Tower servers need isolation.
If the primary and secondary Tower services share a physical host, a network, or potentially a datacenter, your infrastructure has a single point of failure. You should locate the Tower servers such that distribution occurs in a manner consistent with other services that you make available across your infrastructure. If your infrastructure is already using features such as Availability Zones in your cloud provider, having Tower distributed across Zones as well makes sense.
The database requires replication.
If Tower runs in an HA mode, but the database is not run in an HA or replicated mode, you still have a single point of failure for your Tower infrastructure. The Tower installer will not set up database replication; instead, it prompts for database connection details to an existing database (which needs replication).
Choose a database replication strategy that is appropriate for your deployment.
- For the general case of PostgreSQL, refer to the PostgreSQL documentation or the PostgreSQL Wiki.
- For deployments using Amazon’s RDS, refer to the Amazon documentation.
- For deployments using MongoDB, refer to the MongoDB Manual.
Tower instances must maintain reasonable connections to the database.
Tower both queries and writes to the database frequently; good locality between the Tower server and the database replicas is critical to ensure performance.
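As a rough way to compare candidate placements, the sketch below times TCP connections from a Tower node to a database host. It is only an illustration: the host name `db.example.com` is hypothetical, port 5432 is the PostgreSQL default and may differ in your deployment, and connection setup time is only a proxy for locality, not a measure of query performance.

```python
import socket
import statistics
import time


def tcp_connect_seconds(host, port, timeout=3.0):
    """Return the seconds taken to open a TCP connection to host:port."""
    start = time.monotonic()
    # create_connection raises OSError if the host is unreachable or times out
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start


def median_connect_seconds(host, port, samples=5, probe=tcp_connect_seconds):
    """Take several samples and return the median to smooth out one-off spikes.

    `probe` is injectable so the function can be exercised without a network.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    return statistics.median(probe(host, port) for _ in range(samples))
```

For example, `median_connect_seconds("db.example.com", 5432)` run from each Tower node gives one number per node that can be compared across datacenters.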
Source Control is necessary.
To use playbooks stored locally on the Tower server (rather than set to check out from source control), you must ensure synchronization between the primary and secondary Tower instances. Using playbooks in source control alleviates this problem.
When using SCM Projects, a best practices approach is setting the Update on Launch flag on the job template. This ensures that checkouts occur each time the playbook launches and that newly promoted secondary instances have up-to-date copies of the project content. When a secondary instance is promoted, a project_update for all SCM managed projects in the database is triggered. This provides Tower with copies of all project playbooks.
A consistent Tower hostname for clients and users.
Tower users’ habits, Tower provisioning callbacks, and Tower API integrations all depend on the Tower hostname, so keep the hostname that users and clients use constant. In an HA deployment, use a reverse proxy or a DNS CNAME. The CNAME is strongly preferred due to the websocket connection Tower uses for real-time output.
When in HA mode, the remote database version requirements are PostgreSQL 9.4.x and MongoDB 3.0.x.
PostgreSQL 9.4.x and MongoDB 3.0.x are also required if Tower is running locally. With local setups, Tower handles the installation of these services. In a remote setup, you should ensure that these services are set up correctly yourself.
For help allowing remote access to your PostgreSQL server, refer to: http://www.thegeekstuff.com/2014/02/enable-remote-postgresql-connection/
For example, an HA configuration for an infrastructure consisting of three datacenters places a Tower server and a replicated database in each datacenter. Clients accessing Tower use a DNS CNAME which points to the address of the current primary Tower instance.
For information on determining size requirements for a MongoDB setup, refer to Requirements in the Ansible Tower Installation and Reference Guide.
Note
Instances have been reported where reusing the external DB during subsequent HA installations causes installation failures.
For example, suppose you perform an HA installation and later perform a second HA installation that reuses the same external database. The second installation can fail.
When setting up an external HA database which has been used in a prior installation, the HA database must be manually cleared before any additional installations can succeed.
The Tower service runs on both primary and secondary instances. The primary instance accepts requests and runs jobs, while the secondary instances do not.
Connection attempts to the web interface or API of a secondary Tower server redirect to the primary Tower instance.
When changing the configuration of a primary instance after installation, apply these changes to the secondary instances as well.
Examples of these changes would be:
Updates to /etc/tower/conf.d/ha.py
If you have configured LDAP or customized logging in /etc/tower/conf.d/ldap.py, you will need to reflect these changes in /etc/tower/conf.d/ldap.py on your secondary instances as well.
Updating the Tower license
Any secondary instance of Tower requires a valid license to run properly when promoted to a primary instance. Copy the license from the primary node at any time or install it via the normal license installation mechanism after the instance promotes to primary status.
Note
Users of older versions of Tower should update /etc/tower/settings.py instead of files within /etc/tower/conf.d/.
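One way to notice configuration drift between instances is to compare file digests. The sketch below is an illustration rather than a Tower feature: it hashes every file under a directory such as /etc/tower/conf.d/ and reports names whose contents differ or that exist on only one side. How the digests are collected from each node (for example, by running the function on each node and copying the output) is left as an assumption.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(primary, secondary):
    """Return the sorted file names that differ or exist on only one side."""
    names = set(primary) | set(secondary)
    return sorted(n for n in names if primary.get(n) != secondary.get(n))
```

An empty list from `drift(...)` means the two directories hold the same files with the same contents.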
To see the HA configuration of Tower, you can query the ping endpoint of the Tower REST API. To do this via Tower’s built-in API browser, go to https://<Tower server name>/api/v1/ping. You can go to this specific URL on either the primary or secondary nodes.
An example return from this API call would be (in JSON format):
HTTP 200 OK
Content-Type: application/json
Vary: Accept
Allow: GET, HEAD, OPTIONS
X-API-Time: 0.008s
{
"instances": {
"primary": "192.168.122.158",
"secondaries": [
"192.168.122.109",
"192.168.122.26"
]
},
"ha": true,
"role": "primary",
"version": "2.1.4"
}
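It contains the following fields:
- instances: the addresses of the current primary instance (primary) and of the secondary instances (secondaries).
- ha: whether Tower is running in HA mode.
- role: the role (primary or secondary) of the instance that answered the request.
- version: the Tower version.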
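The same response can be read from a script. The following is a minimal sketch, assuming the endpoint is reachable without authentication and that the server presents a certificate the client trusts (otherwise `urlopen` raises an error). The function names are illustrative and not part of Tower.

```python
import json
import urllib.request


def fetch_ping(base_url, timeout=5.0):
    """GET <base_url>/api/v1/ping and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/api/v1/ping"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_ping(payload):
    """Flatten the ping payload into the values most HA checks need."""
    instances = payload.get("instances", {})
    return {
        "ha": bool(payload.get("ha", False)),
        "role": payload.get("role"),
        "primary": instances.get("primary"),
        "secondaries": list(instances.get("secondaries", [])),
        "version": payload.get("version"),
    }
```

Calling `summarize_ping(fetch_ping("https://<Tower server name>"))` returns a dictionary with the keys `ha`, `role`, `primary`, `secondaries`, and `version`.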
To promote a secondary instance to be the new primary instance (also known as initiating failover), use the tower-manage command.
To make a running secondary node the primary node, log on to the desired new primary node and run the update_instance command of tower-manage as follows:
root@localhost:~$ tower-manage update_instance --primary
Successfully updated instance role (uuid="ec2dc2ac-7c4b-9b7e-b01f-0b7c30d0b0ab",hostname="localhost",role="primary")
The current primary instance changes to be a secondary.
Note
Secondary nodes need a valid Tower license at /etc/tower/license to function as a proper primary instance. Copy the license from the primary node at any time or install it via the normal license installation mechanism after the instance promotes to primary status.
On failover, queued or running jobs in the database are marked as failed.
Tower does not perform any health checks between primary and secondary nodes, and does not fail over automatically if the primary node is lost. You should use an external monitoring or heartbeat tool combined with tower-manage for these system health checks. The /ping API endpoint can help with this.
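The decision logic of such an external monitor can be sketched as follows. This is an illustration, not a supported tool: the threshold of three consecutive failed checks is an arbitrary assumption, and the promotion helper simply wraps the `tower-manage update_instance --primary` command and must run on the secondary node being promoted.

```python
import subprocess


def should_fail_over(recent_checks, threshold=3):
    """Return True when the last `threshold` health checks all failed.

    recent_checks is a list of booleans, oldest first; True means the
    primary answered its health check (for example, a /ping request).
    """
    if threshold < 1 or len(recent_checks) < threshold:
        return False
    return not any(recent_checks[-threshold:])


def promote_local_instance():
    """Promote the node this script runs on and return the command output."""
    result = subprocess.run(
        ["tower-manage", "update_instance", "--primary"],
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Requiring several consecutive failures keeps a single dropped request from triggering a failover, which matters because queued or running jobs are marked as failed when failover happens.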
Caution
HA setup has been designed and tested only for LAN-connected configurations. If you are confident about the quality of your WAN, and have researched the downsides and possible failure scenarios of a replicated PostgreSQL environment across a WAN, there is nothing technically stopping you from doing it this way. That said, this is outside of our tested configuration, it is not recommended, and it is unsupported.
You cannot decommission a current primary Tower instance without first selecting a new primary.
If you need to decommission a secondary instance of Tower, log on to the secondary node and run the remove_instance command of tower-manage as follows:
root@localhost:~$ tower-manage remove_instance --hostname tower2.example.com
Instance removed (changed: True).
Replace tower2.example.com with the registered name (IP address or hostname) of the Tower instance you are removing from the list of secondaries.
You can then shut down the Tower service on the decommissioned secondary node.
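Before running remove_instance, a script can confirm that the target is a registered secondary and not the current primary. The sketch below is a hypothetical pre-check, not part of tower-manage. It assumes a dictionary shaped like the ping API response shown earlier, and that the name passed in matches the registered name exactly.

```python
def can_decommission(ping_payload, name):
    """Return (allowed, reason) for removing the instance called `name`."""
    instances = ping_payload.get("instances", {})
    if name == instances.get("primary"):
        # A primary cannot be removed until another instance is promoted
        return False, "instance is the current primary"
    if name not in instances.get("secondaries", []):
        return False, "instance is not a registered secondary"
    return True, "instance is a registered secondary"
```

Only when the first element of the result is `True` would the script go on to run `tower-manage remove_instance --hostname <name>`.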