Documentation

9. Execution Environments

Tower allows you to execute jobs via ansible playbook runs directly on a member of the cluster or on a pre-provisioned isolated node. In Ansible Tower 3.6, you can execute jobs in a container group only as-needed per playbook. For more information, see Container Groups towards the end of this section.

9.1. Instance Groups

Instances can be grouped into one or more Instance Groups. Instance groups can be assigned to one or more of the resources listed below.

  • Organizations
  • Inventories
  • Job Templates

When a job associated with one of the resources executes, it will be assigned to the instance group associated with the resource. During the execution process, instance groups associated with Job Templates are checked before those associated with Inventories. Similarly, instance groups associated with Inventories are checked before those associated with Organizations. Thus, Instance Group assignments for the three resources form a hierarchy: Job Template > Inventory > Organization.

Here are some of the things to consider when working with instance groups:

  • You may optionally define other groups and group instances in those groups. These groups should be prefixed with instance_group_. Instances are not required to be in the tower group alongside other instance_group_ groups, but one instance must be present in the tower group. Technically, tower is a group like any other instance_group_ group, but it must always be present, and if a specific group is not associated with a specific resource, then job execution will always fall back to the tower group. The tower instance group always exists (it cannot be deleted nor renamed).
  • Do not create a group named instance_group_tower.
  • Do not name any instance the same as a group name.

9.1.1. Configuring Instance Groups from the API

Instance groups can be created by POSTing to /api/v2/instance_groups as a system administrator.

Once created, instances can be associated with an instance group with:

HTTP POST /api/v2/instance_groups/x/instances/ {'id': y}`

An instance that is added to an instance group will automatically reconfigure itself to listen on the group’s work queue. See the following section, Instance group policies, for more details.

9.1.2. Instance group policies

You can configure Tower instances to automatically join Instance Groups when they come online by defining a policy. These policies are evaluated for every new instance that comes online.

Instance Group Policies are controlled by three optional fields on an Instance Group:

  • policy_instance_percentage: This is a number between 0 - 100. It guarantees that this percentage of active Tower instances will be added to this Instance Group. As new instances come online, if the number of Instances in this group relative to the total number of instances is less than the given percentage, then new ones will be added until the percentage condition is satisfied.
  • policy_instance_minimum: This policy attempts to keep at least this many instances in the Instance Group. If the number of available instances is lower than this minimum, then all instances will be placed in this Instance Group.
  • policy_instance_list: This is a fixed list of instance names to always include in this Instance Group.

The Instance Groups list view from the Ansible Tower User Interface provides a summary of the capacity levels for each instance group according to instance group policies:

Instance Group policy example

9.1.3. Notable policy considerations

  • policy_instance_percentage and policy_instance_minimum both set minimum allocations. The rule that results in more instances assigned to the group will take effect. For example, if you have a policy_instance_percentage of 50% and a policy_instance_minimum of 2 and you start 6 instances, 3 of them would be assigned to the Instance Group. If you reduce the number of total instances in the cluster to 2, then both of them would be assigned to the Instance Group to satisfy policy_instance_minimum. This way, you can set a lower bound on the amount of available resources.
  • Policies do not actively prevent instances from being associated with multiple Instance Groups, but this can effectively be achieved by making the percentages add up to 100. If you have 4 instance groups, assign each a percentage value of 25 and the instances will be distributed among them with no overlap.

9.1.4. Manually pinning instances to specific groups

If you have a special instance which needs to be exclusively assigned to a specific Instance Group but don’t want it to automatically join other groups via “percentage” or “minimum” policies:

  1. Add the instance to one or more Instance Groups’ policy_instance_list
  2. Update the instance’s managed_by_policy property to be False.

This will prevent the Instance from being automatically added to other groups based on percentage and minimum policy; it will only belong to the groups you’ve manually assigned it to:

HTTP PATCH /api/v2/instance_groups/N/
{
"policy_instance_list": ["special-instance"]
}

HTTP PATCH /api/v2/instances/X/
{
"managed_by_policy": False
}

9.1.5. Job Runtime Behavior

When you run a job associated with a instance group, some behaviors worth noting are:

  • If a cluster is divided into separate instance groups, then the behavior is similar to the cluster as a whole. If two instances are assigned to a group then either one is just as likely to receive a job as any other in the same group.
  • As Tower instances are brought online, it effectively expands the work capacity of the Tower system. If those instances are also placed into instance groups, then they also expand that group’s capacity. If an instance is performing work and it is a member of multiple groups, then capacity will be reduced from all groups for which it is a member. De-provisioning an instance will remove capacity from the cluster wherever that instance was assigned. See the Deprovision Instances section for more detail.

Note

Not all instances are required to be provisioned with an equal capacity.

9.1.6. Control Where a Job Runs

If any of the job template, inventory, or organization has instance groups associated with them, a job ran from that job template will not be eligible for the default behavior. That means that if all of the instances inside of the instance groups associated with these 3 resources are out of capacity, the job will remain in the pending state until capacity becomes available.

The order of preference in determining which instance group to submit the job to is as follows:

  1. job template
  2. inventory
  3. organization (by way of project)

If instance groups are associated with the job template, and all of these are at capacity, then the job will be submitted to instance groups specified on inventory, and then organization. Jobs should execute in those groups in preferential order as resources are available.

The global tower group can still be associated with a resource, just like any of the custom instance groups defined in the playbook. This can be used to specify a preferred instance group on the job template or inventory, but still allow the job to be submitted to any instance if those are out of capacity.

As an example, by associating group_a with a Job Template and also associating the tower group with its inventory, you allow the tower group to be used as a fallback in case group_a gets out of capacity.

In addition, it is possible to not associate an instance group with one resource but designate another resource as the fallback. For example, not associating an instance group with a job template and have it fall back to the inventory and/or the organization’s instance group.

This presents two other great use cases:

  1. Associating instance groups with an inventory (omitting assigning the job template to an instance group) will allow the user to ensure that any playbook run against a specific inventory will run only on the group associated with it. This can be super useful in the situation where only those instances have a direct link to the managed nodes.
  2. An administrator can assign instance groups to organizations. This effectively allows the administrator to segment out the entire infrastructure and guarantee that each organization has capacity to run jobs without interfering with any other organization’s ability to run jobs.

Likewise, an administrator could assign multiple groups to each organization as desired, as in the following scenario:

  • There are three instance groups: A, B, and C. There are two organizations: Org1 and Org2.
  • The administrator assigns group A to Org1, group B to Org2 and then assign group C to both Org1 and Org2 as an overflow for any extra capacity that may be needed.
  • The organization administrators are then free to assign inventory or job templates to whichever group they want (or just let them inherit the default order from the organization).

Instance Group example

Arranging resources in this way offers a lot of flexibility. Also, you can create instance groups with only one instance, thus allowing you to direct work towards a very specific Host in the Tower cluster.

9.1.7. Deprovision Instance Groups

Re-running the setup playbook does not automatically deprovision instances since clusters do not currently distinguish between an instance that was taken offline intentionally or due to failure. Instead, shut down all services on the Tower instance and then run the deprovisioning tool from any other instance:

  1. Shut down the instance or stop the service with the command, ansible-tower-service stop.

  2. Run the deprovision command $ awx-manage deprovision_instance —-hostname=<name used in inventory file> from another instance to remove it from the Tower cluster registry AND the RabbitMQ cluster registry.

    Example: awx-manage deprovision_instance -—hostname=hostB

Similarly, deprovisioning instance groups in Tower does not automatically deprovision or remove instance groups, even though re-provisioning will often cause these to be unused. They may still show up in API endpoints and stats monitoring. These groups can be removed with the following command:

Example: awx-manage unregister_queue --queuename=<name>

Removing an instance’s membership from an instance group in the inventory file and re-running the setup playbook does not ensure the instance won’t be added back to a group. To be sure that an instance will not be added back to a group, remove via the API and also remove it in your inventory file, or you can stop defining instance groups in the inventory file altogether. You can also manage instance group topology through the Ansible Tower User Interface. For more information on managing instance groups in the UI, refer to Instance Groups in the Ansible Tower User Guide.

9.1.8. Isolated Instance Groups

Tower has the ability to optionally define isolated groups inside security-restricted networking zones from which to run jobs and ad hoc commands. Instances in these groups will not have a full installation of Tower, but will have a minimal set of utilities used to run jobs. Isolated groups must be specified in the inventory file prefixed with isolated_group_. Below is an example of an inventory file for an isolated instance group.

[tower]
towerA
towerB
towerC

[instance_group_security]
towerB
towerC

[isolated_group_govcloud]
isolatedA
isolatedB

[isolated_group_govcloud:vars]
controller=security

In the isolated instance group model, “controller” instances interact with “isolated” instances via a series of Ansible playbooks over SSH. At installation time, by default, a randomized RSA key is generated and distributed as an authorized key to all “isolated” instances. The private half of the key is encrypted and stored within the Tower database, and is used to authenticate from “controller” instances to “isolated” instances when jobs are run.

When a job is scheduled to run on an “isolated” instance:

  • The “controller” instance compiles metadata required to run the job and copies it to the “isolated” instance.
  • Once the metadata has been synchronized to the isolated host, the “controller” instance starts a process on the “isolated” instance, which consumes the metadata and starts running ansible/ansible-playbook. As the playbook runs, job artifacts (such as stdout and job events) are written to disk on the “isolated” instance.
  • While the job runs on the “isolated” instance, the “controller” instance periodically copies job artifacts (stdout and job events) from the “isolated” instance. It consumes these until the job finishes running on the “isolated” instance.

Note

Controller nodes fail if they go offline in the middle of an isolated run. If a Tower node restarts, or the dispatcher stops during playbook runs, jobs running on that node fails and won’t start again when the dispatcher comes online.

Isolated groups (nodes) may be created in a way that allow them to exist inside of a VPC with security rules that only permit the instances in its controller group to access them; only ingress SSH traffic from “controller” instances to “isolated” instances is required. When provisioning isolated nodes, your install machine needs to be able to have connectivity to the isolated nodes. In cases where an isolated node is not directly accessible but can be reached indirectly through other hosts, you can designate a “jump host” by using ProxyCommand in your SSH configuration to specify the jump host and then run the installer.

Isolated nodes daisy chain example

Jump Host example

The recommended system configurations with isolated groups are as follows:

  • Do not create a group named isolated_group_tower.

  • Do not put any isolated instances inside the tower group or other ordinary instance groups.

  • Define the controller variable as either a group variable or as a host variable on all the instances in the isolated group. Do not allow isolated instances in the same group to have a different value for this variable - the behavior in this case cannot be predicted.

  • Do not put an isolated instance in more than one isolated group.

  • Do not put an instance in both ordinary groups and isolated groups.

  • Isolated instances can be installed on RHEL 7 and later.

  • The following durations associated with isolated groups can be configured in the Jobs tab of the Settings (settings) menu:

    • Isolated Status Check Interval: 30 seconds is the default amount of time set to sleep between status checks for jobs running on isolated instances.
    • Isolated Launch Timeout: 600 seconds (10 mins) is the default timeout for launching jobs on isolated instances. This includes the time needed to copy source control files (playbooks) to the isolated instance.
    • Isolated Connection Timeout: 10 seconds is the default Ansible SSH connection timeout when communicating with isolated instances. This value should be substantially greater than the expected network latency.

Isolated groups are labeled accordingly in the Instance Groups list view of the Tower User Interface.

_images/instance-group-list-with-isolated-group.png

9.2. Container Groups

Ansible Tower 3.6 introduces the concept of Container Groups that allow you to execute jobs in Tower regardless of whether Tower is installed as a standalone, in a virtual environment, or in a container. Container groups act as a pool of resources within a virtual environment. You can create instance groups to point to an OpenShift or Kubernetes container, which are job environments that are provisioned on-demand as a Pod that exists only for the duration of the playbook run. This is known as the ephemeral execution model and ensures a clean environment for every job run.

In some cases, it is desirable to have the execution environment be “always-on”, which is configured through the creation of an instance.

9.2.1. Create a container group

A ContainerGroup is simply an InstanceGroup that has an associated Credential that allows for connecting to an OpenShift or Kubernetes cluster. To set up a container group on Kubernetes or OpenShift, you must first have the following:

  • A namespace you can launch into (there is a “default” namespace but most likely varied by customer)
  • A service account that has the roles that allow it to launch and manage Pods in this namespace
  • A token associated with that service account (Kubernetes or OpenShift Bearer Token)
  • A CA certificate associated with the cluster

To create a container group:

  1. Use the Tower User Interface to create an OpenShift or Kubernetes API bearer token credential that will be used with your container group, see Add a New Credential in the Ansible Tower User Guide for detail.
  2. Create a new container group by navigating to the Instance Groups configuration window instance group icon.
  1. Click the add button and select Create Container Group.

IG - create new CG

  1. Enter a name for your new container group and select the credential previously created to associate it to the container group.

9.2.2. Customize the Pod spec

Tower provides a simple default Pod specification, however, you can provide a custom YAML (or JSON) document that overrides the default Pod spec. This field uses any custom fields (i.e., image or namespace) that can be “serialized” as valid Pod JSON or YAML. A full list of options can be found in the Kubernetes documentation.

To customize the Pod spec, specify the namespace in the Pod Spec Override field by using the toggle to enable and expand the Pod Spec Override field and click Save when done.

IG - CG customize pod

You may provide additional customizations, if needed. Click Expand to view the entire customization window.

_images/instance-group-customize-cg-pod-expanded.png

Once the container group is successfully created, the Details tab of the newly created container group remains, which allows you to review and edit your container group information. This is the same menu that is opened if the Edit (edit-button) button is clicked from the Instance Group link. You can also edit Instances and review Jobs associated with this instance group.

IG - example CG successfully created

Container groups and instance groups are labeled accordingly.

Note

Despite the fact that customers have custom Pod specs, upgrades may be difficult if the default pod_spec changes. Most any manifest can be applied to any namespace, with the namespace specified separately, most likely you will only need to override the namespace. Similarly, pinning a default image for different releases of Tower to different versions of the default job runner container is tricky. If the default image is specified in the Pod spec, then upgrades do not pick up the new default changes are made to the default Pod spec.

9.2.3. Verify container group functions

To verify the deployment and termination of your container:

  1. Create a mock inventory and associate the container group to it by populating the name of the container group in the Instance Group field. See Add a new inventory in the Ansible Tower User Guide for detail.

Dummy inventory

  1. Create “localhost” host in inventory with variables:
{'ansible_host': '127.0.0.1', 'ansible_connection': 'local'}

Inventory with localhost

  1. Launch an ad hoc job against the localhost using the ping or setup module. Even though the Machine Credential field is required, it does not matter which one is selected for this simple test.

Launch inventory with localhost

You can see in the jobs detail view the container was reached successfully using one of ad hoc jobs.

Inventory with localhost ping success

If you have an OpenShift or Kubernetes UI, you can see Pods appear and disappear as they deploy and terminate. Alternatively, you can use the CLI to perform a get pod operation on your namespace to watch these same events occurring in real-time.

9.2.4. View container group jobs

When you run a job associated with a container group, you can see the details of that job in the Jobs tab of the Instance Group window and then click Expanded to expand the view to show details about each job.

NEED TO REPLACE IMAGE with EXECUTION NODE IN SCREEN SHOT

IG - instances jobs

Each job displays the job status, ID, and name; type of job, time started and completed, who started the job; and which template, inventory, project, and credential were used.

9.2.5. Kubernetes failure conditions

When running a container group and Kubernetes responds that the resource quota has been exceeded, Tower keeps the job in pending state. Other failures result in the traceback of the Error Details field showing the failure reason, similar to the example here:

_images/instance-group-cg-job-details-error.png

9.2.6. Container group forks and capacity limits

TBD