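8. Container and Instance Groups

The controller lets you execute jobs through Ansible playbook runs either directly on a member of the cluster or in a namespace of an OpenShift cluster that has the necessary service account provisioned. The latter is called a container group. Jobs in a container group run only as needed, per playbook. For more information, see Container Groups toward the end of this section.

For execution environments, see Execution Environments in the Automation Controller User Guide.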

8.1. Instance Groups

Instances can be grouped into one or more instance groups. Instance groups can be assigned to one or more of the resources listed below.

Organizations
Inventories
Job Templates

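When a job associated with one of these resources executes, it is assigned to the instance group associated with that resource. During execution, instance groups associated with the job template are checked before those associated with the inventory, and instance groups associated with the inventory are checked before those associated with the organization. Instance group assignments for the three resources therefore form a hierarchy: Job Template > Inventory > Organization.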
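The hierarchy above can be sketched as a small lookup. This is an illustrative model only, not the controller's scheduler: the `Resource` class, the `preferred_instance_groups` function, and the resource and group names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A job template, inventory, or organization and its instance groups."""
    name: str
    instance_groups: list[str] = field(default_factory=list)


def preferred_instance_groups(job_template, inventory, organization):
    """Return candidate groups in hierarchy order, without duplicates."""
    ordered = []
    # Job Template > Inventory > Organization
    for resource in (job_template, inventory, organization):
        for group in resource.instance_groups:
            if group not in ordered:
                ordered.append(group)
    # With no association on any resource, execution falls back to "default".
    return ordered or ["default"]


jt = Resource("deploy-app", ["group_a"])
inv = Resource("prod-hosts", ["group_b", "group_a"])
org = Resource("Org1", ["group_c"])
print(preferred_instance_groups(jt, inv, org))
```

A group that appears on more than one resource keeps the position of its highest-priority association.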

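Consider the following when working with instance groups:

You can optionally define other groups and group instances in those groups. These groups must be prefixed with instance_group_. Instances must be in the automationcontroller or execution_nodes group in addition to any instance_group_ groups. In a clustered setup, at least one instance **must** be present in the automationcontroller group, which appears as controlplane in the API instance groups. See automationcontroller group policies for example scenarios.
A default API instance group is created automatically and contains all nodes capable of running jobs. Technically, it is like any other instance group, but if no specific instance group is associated with a specific resource, job execution always falls back to the default instance group. The default instance group always exists (it cannot be deleted or renamed).
Do not create a group named instance_group_default.
Do not give any instance the same name as a group.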

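8.1.1. `automationcontroller` group policies

Use the following criteria when defining nodes:

Nodes in the automationcontroller group can define the node_type hostvar as hybrid (default) or control.
Nodes in the execution_nodes group can define the node_type hostvar as execution (default) or hop.

You can define custom groups in the inventory file by naming them instance_group_*, where * becomes the name of the group in the API. Alternatively, you can create custom instance groups in the API after the installation has finished.

The current behavior expects every member of an instance_group_* group to also be a member of the automationcontroller or execution_nodes group. Consider this example scenario: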

[automationcontroller]
126-addr.tatu.home ansible_host=192.168.111.126  node_type=control

[automationcontroller:vars]
peers=execution_nodes

[execution_nodes]

[instance_group_test]
110-addr.tatu.home ansible_host=192.168.111.110 receptor_listener_port=8928

Running the installer with this inventory produces the following error:

TASK [ansible.automation_platform_installer.check_config_static : Validate mesh topology] ***
fatal: [126-addr.tatu.home -> localhost]: FAILED! => {"msg": "The host '110-addr.tatu.home' is not present in either [automationcontroller] or [execution_nodes]"}

To fix this, move 110-addr.tatu.home into the execution_nodes group:

[automationcontroller]
126-addr.tatu.home ansible_host=192.168.111.126  node_type=control

[automationcontroller:vars]
peers=execution_nodes

[execution_nodes]
110-addr.tatu.home ansible_host=192.168.111.110 receptor_listener_port=8928

[instance_group_test]
110-addr.tatu.home

This results in:

TASK [ansible.automation_platform_installer.check_config_static : Validate mesh topology] ***
ok: [126-addr.tatu.home -> localhost] => {"changed": false, "mesh": {"110-addr.tatu.home": {"node_type": "execution", "peers": [], "receptor_control_filename": "receptor.sock", "receptor_control_service_name": "control", "receptor_listener": true, "receptor_listener_port": 8928, "receptor_listener_protocol": "tcp", "receptor_log_level": "info"}, "126-addr.tatu.home": {"node_type": "control", "peers": ["110-addr.tatu.home"], "receptor_control_filename": "receptor.sock", "receptor_control_service_name": "control", "receptor_listener": false, "receptor_listener_port": 27199, "receptor_listener_protocol": "tcp", "receptor_log_level": "info"}}}
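The membership rule illustrated by the two inventories can be checked before running the installer. The sketch below is not the installer's own validation; `find_misplaced_hosts` is a hypothetical helper, and the inventories are represented as plain dictionaries that map group names to host lists.

```python
def find_misplaced_hosts(groups):
    """Return {host: [instance_group_* names]} for hosts that are in an
    instance_group_* group but in neither node group."""
    node_hosts = set(groups.get("automationcontroller", []))
    node_hosts |= set(groups.get("execution_nodes", []))
    misplaced = {}
    for group_name, hosts in groups.items():
        if not group_name.startswith("instance_group_"):
            continue
        for host in hosts:
            if host not in node_hosts:
                misplaced.setdefault(host, []).append(group_name)
    return misplaced


# The failing inventory from the scenario above.
broken = {
    "automationcontroller": ["126-addr.tatu.home"],
    "execution_nodes": [],
    "instance_group_test": ["110-addr.tatu.home"],
}
# The corrected inventory.
fixed = {
    "automationcontroller": ["126-addr.tatu.home"],
    "execution_nodes": ["110-addr.tatu.home"],
    "instance_group_test": ["110-addr.tatu.home"],
}
print(find_misplaced_hosts(broken))
print(find_misplaced_hosts(fixed))
```

The first call reports 110-addr.tatu.home as misplaced; the second returns an empty dictionary.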

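When upgrading from controller 4.0 or earlier, a legacy instance_group_ member will most likely have the awx code installed, which causes that node to be placed in the automationcontroller group.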

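8.1.2. Configure Instance Groups from the API

As a system administrator, you can create instance groups by sending a POST request to /api/v2/instance_groups.

Once a group is created, instances can be associated with it: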

HTTP POST /api/v2/instance_groups/x/instances/ {"id": y}
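As a rough illustration, the two requests can be prepared with Python's standard library. Several details here are assumptions rather than facts from this guide: the controller address, the bearer-token header, the `name` field in the creation payload, and the IDs 5 and 12 are all placeholders. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://controller.example.com"  # hypothetical controller address


def build_request(method, path, payload, token):
    """Build (but do not send) a JSON API request with bearer-token auth."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


# Create an instance group (payload field is an assumption).
create_group = build_request("POST", "/api/v2/instance_groups/", {"name": "group_a"}, "TOKEN")
# Associate instance 12 with instance group 5 (placeholder IDs for y and x).
associate = build_request("POST", "/api/v2/instance_groups/5/instances/", {"id": 12}, "TOKEN")

for req in (create_group, associate):
    print(req.get_method(), req.full_url, req.data.decode())
```

Passing either object to `urllib.request.urlopen` would send it to a reachable controller.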

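An instance that is added to an instance group automatically reconfigures itself to listen on the group's work queue. See the following section, Instance group policies, for more details.

8.1.3. Instance group policies

You can configure controller instances to join instance groups automatically when they come online by defining a policy. These policies are evaluated for every new instance that comes online.

Instance group policies are controlled by three optional fields on an Instance Group:

policy_instance_percentage: A number between 0 and 100. It guarantees that this percentage of active controller instances is added to this instance group. As new instances come online, if the number of instances in this group relative to the total number of instances is below the given percentage, new instances are added until the percentage condition is satisfied.
policy_instance_minimum: This policy attempts to keep at least this many instances in the instance group. If the number of available instances is lower than this minimum, all instances are placed in this instance group.
policy_instance_list: A fixed list of instance names to always include in this instance group.

The Instance Groups list view in the automation controller user interface summarizes the capacity levels for each instance group according to the instance group policies: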

Instance Group policy example

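8.1.4. Notable policy considerations

policy_instance_percentage and policy_instance_minimum both set minimum allocations. The rule that assigns more instances to the group takes effect. For example, with a policy_instance_percentage of 50% and a policy_instance_minimum of 2, starting 6 instances assigns 3 of them to the instance group. If you reduce the total number of instances in the cluster to 2, both are assigned to the instance group to satisfy policy_instance_minimum. This lets you set a lower bound on the amount of available resources.
Policies do not actively prevent instances from being associated with multiple instance groups, but you can effectively achieve this by making the percentages add up to 100. For example, with 4 instance groups, assign each a percentage value of 25 and the instances are distributed among them without overlap.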
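The arithmetic in the first consideration can be written out as follows. This is a sketch of the rule as described above, not the controller's implementation; the function name is hypothetical, and rounding the percentage up to a whole instance is an assumption.

```python
import math


def policy_target_size(total_instances, percentage, minimum):
    """Number of instances the two policies would place in one group."""
    # Assumption: a fractional share is rounded up to a whole instance.
    by_percentage = math.ceil(total_instances * percentage / 100)
    # Whichever rule asks for more instances wins, capped by what exists.
    return min(total_instances, max(by_percentage, minimum))


print(policy_target_size(6, 50, 2))  # 6 instances, 50%, minimum 2
print(policy_target_size(2, 50, 2))  # cluster reduced to 2 instances
```

With the numbers from the example, the first call yields 3 and the second yields 2.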

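8.1.5. Manually pinning instances to specific groups

If you have a special instance that must be assigned exclusively to a specific instance group, and you do not want it to join other groups automatically through the "percentage" or "minimum" policies:

Add the instance to the policy_instance_list of one or more instance groups.
Update the instance's managed_by_policy property to False.

This prevents the instance from being added automatically to other groups based on the percentage and minimum policies. It belongs only to the groups you assigned it to manually: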

HTTP PATCH /api/v2/instance_groups/N/
{
"policy_instance_list": ["special-instance"]
}

HTTP PATCH /api/v2/instances/X/
{
"managed_by_policy": false
}
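When these request bodies are generated from a script, the boolean must be serialized as JSON. A minimal sketch, with a hypothetical helper name, showing that Python's `False` becomes the JSON literal `false`:

```python
import json


def pin_instance_payloads(instance_name):
    """Return the JSON bodies for the two PATCH requests as strings."""
    group_patch = {"policy_instance_list": [instance_name]}
    instance_patch = {"managed_by_policy": False}  # serialized as JSON false
    return json.dumps(group_patch), json.dumps(instance_patch)


group_body, instance_body = pin_instance_payloads("special-instance")
print(group_body)
print(instance_body)
```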

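8.1.6. Job Runtime Behavior

When you run a job associated with an instance group, the following behaviors are worth noting:

If a cluster is divided into separate instance groups, each group behaves like the cluster as a whole. If two instances are assigned to a group, either one is just as likely to receive a job as any other instance in the same group.
As controller instances are brought online, they effectively expand the work capacity of the system. If those instances are also placed into instance groups, they expand the capacity of those groups as well. If an instance is performing work and is a member of multiple groups, capacity is reduced in all groups of which it is a member. Deprovisioning an instance removes capacity from the cluster wherever that instance was assigned. See Deprovision Instance Groups for more details.

Note

Not all instances are required to be provisioned with equal capacity.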

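8.1.7. Control Where a Job Runs

If a job template, inventory, or organization has instance groups associated with it, a job run from that job template is not eligible for the default behavior. This means that if all instances in the instance groups associated with these three resources are out of capacity, the job remains in the pending state until capacity becomes available.

The order of preference for determining which instance group the job is submitted to is as follows:

Job template
Inventory
Organization (by way of the project)

If instance groups are associated with the job template and all of them are at capacity, the job is submitted to the instance groups specified on the inventory, and then to those on the organization. Jobs should execute in those groups in order of preference as resources become available.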

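The global default group can still be associated with a resource, just like any of the custom instance groups defined in the playbook. You can use this to specify a preferred instance group on the job template or inventory while still allowing the job to be submitted to any instance when the preferred group is out of capacity.

For example, by associating group_a with a job template and also associating the default group with its inventory, you allow the default group to be used as a fallback when group_a runs out of capacity.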
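The fallback in this example can be modeled as picking the first group, in preference order, that still has room. This is a simplified sketch: the function name is hypothetical, and treating capacity as a single number per group is an assumption made for illustration.

```python
def pick_instance_group(ordered_groups, remaining_capacity, required):
    """Return the first group in preference order with enough remaining
    capacity, or None if the job would stay pending."""
    for group in ordered_groups:
        if remaining_capacity.get(group, 0) >= required:
            return group
    return None


# group_a comes from the job template, default from the inventory.
order = ["group_a", "default"]
print(pick_instance_group(order, {"group_a": 10, "default": 40}, 5))
print(pick_instance_group(order, {"group_a": 0, "default": 40}, 5))
print(pick_instance_group(order, {"group_a": 0, "default": 0}, 5))
```

The three calls correspond to the preferred group having room, the fallback to default, and a job that stays pending.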

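It is also possible to leave one resource without an instance group and designate another resource as the fallback. For example, you can leave a job template without an instance group and let it fall back to the instance groups of the inventory or the organization.

This enables two other common use cases:

Associating instance groups with an inventory (without assigning the job template to an instance group) ensures that any playbook run against a specific inventory runs only on the group associated with it. This is useful when only the instances in that group have a direct link to the managed nodes.
An administrator can assign instance groups to organizations. This effectively lets the administrator segment the entire infrastructure and guarantee that each organization has capacity to run jobs without interfering with any other organization's ability to run jobs.

Likewise, an administrator can assign multiple groups to each organization as needed, as in the following scenario:

There are three instance groups: A, B, and C. There are two organizations: Org1 and Org2.

The administrator assigns group A to Org1, group B to Org2, and then assigns group C to both Org1 and Org2 as overflow for any extra capacity that may be needed.

The organization administrators are then free to assign inventories or job templates to whichever group they want (or simply let them inherit the default order from the organization).

Arranging resources this way offers a lot of flexibility. You can also create instance groups with only one instance, which lets you direct work to a specific host in the controller cluster.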

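8.1.8. Deprovision Instance Groups

Re-running the setup playbook does not automatically deprovision instances, because the cluster does not currently distinguish between an instance that was taken offline intentionally and one that went offline due to failure. Instead, shut down all services on the controller instance and then run the deprovisioning tool from any other instance:

Shut down the instance or stop the service with the command automation-controller-service stop.
Run the deprovision command $ awx-manage deprovision_instance --hostname=<name used in inventory file> from another instance to remove it from the controller cluster registry.

Example: awx-manage deprovision_instance --hostname=hostB

Similarly, deprovisioning instance groups in the controller does not automatically deprovision or remove the instance groups themselves, even though re-provisioning often leaves them unused. They may still show up in API endpoints and stats monitoring. These groups can be removed with the following command:

Example: awx-manage unregister_queue --queuename=<name>

Removing an instance's membership from an instance group in the inventory file does not ensure that the instance is not added back to the group when the setup playbook is re-run. To make sure it is not added back, remove the instance through the API and also remove it from the inventory file. Alternatively, you can stop defining instance groups in the inventory file altogether. You can also manage instance group topology through the automation controller user interface. For more information on managing instance groups in the UI, see Instance Groups in the Automation Controller User Guide.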

Note

If you have isolated instance groups created in older versions of the controller (3.8.x and earlier) and want to migrate them to execution nodes to make them compatible for use with the automation mesh architecture, see Migrate isolated instances to execution nodes in the Ansible Automation Platform Upgrade and Migration Guide.

8.2. Container Groups

Ansible Automation Platform supports Container Groups, which allow you to execute jobs in the controller regardless of whether the controller is installed as a standalone, in a virtual environment, or in a container. Container groups act as a pool of resources within a virtual environment. You can create instance groups to point to an OpenShift container, which are job environments that are provisioned on-demand as a Pod that exists only for the duration of the playbook run. This is known as the ephemeral execution model and ensures a clean environment for every job run.

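In some cases, it is desirable to have container groups be "always-on", which is configured through the creation of an instance.

Note

Container groups upgraded from versions prior to automation controller 4.0 revert to the default and completely remove the old pod definition, clearing out all custom pod definitions in the migration.

Container groups are different from execution environments in that execution environments are container images and do not use a virtual environment. For more details, see Execution Environments in the Automation Controller User Guide.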

8.2.1. Create a container group

A ContainerGroup is a type of InstanceGroup that has an associated Credential that allows for connecting to an OpenShift cluster. To set up a container group, you must first have the following:

A namespace you can launch into (every cluster has a “default” namespace, but you may want to use a specific namespace)
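A service account that has the roles that allow it to launch and manage Pods in this namespace
If you will be using execution environments in a private registry and have a Container Registry credential associated with them in the automation controller, the service account also needs the roles to get, create, and delete secrets in the namespace. If you do not want to give these roles to the service account, you can pre-create the ImagePullSecrets and specify them in the pod spec for the ContainerGroup. In this case, the execution environment must not have a Container Registry credential associated with it, or the controller will attempt to create the secret for you in the namespace.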
A token associated with that service account (OpenShift or Kubernetes Bearer Token)
A CA certificate associated with the cluster

This section describes how to create a service account in an OpenShift cluster (or Kubernetes) that automation controller can use to run jobs in a container group. After the service account is created, its credentials are provided to the controller in the form of an OpenShift or Kubernetes API Bearer Token credential. The steps below describe how to create the service account and collect the information needed to configure automation controller.

To configure the controller:

To create a service account, you may download and use this sample service account, containergroup sa and modify it as needed to obtain the above credentials.
Apply the configuration from containergroup-sa.yml:
```
oc apply -f containergroup-sa.yml
```

Get the secret name associated with the service account:

export SA_SECRET=$(oc get sa containergroup-service-account -o json | jq '.secrets[0].name' | tr -d '"')

Get the token from the secret:

oc get secret $(echo ${SA_SECRET}) -o json | jq '.data.token' | xargs | base64 --decode > containergroup-sa.token

Get the CA cert:

oc get secret $SA_SECRET -o json | jq '.data["ca.crt"]' | xargs | base64 --decode > containergroup-ca.crt

Use the contents of containergroup-sa.token and containergroup-ca.crt to provide the information for the OpenShift or Kubernetes API Bearer Token credential required for the container group.
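The decoding performed by the `jq` and `base64` pipelines above can also be sketched in Python. The helper name is hypothetical, and the sample secret is a fabricated stand-in for the JSON that `oc get secret $SA_SECRET -o json` returns, reduced to the two fields the commands read.

```python
import base64
import json


def extract_sa_credentials(secret_json):
    """Decode the token and CA certificate from a secret given as JSON text."""
    data = json.loads(secret_json)["data"]
    token = base64.b64decode(data["token"]).decode("utf-8")
    ca_cert = base64.b64decode(data["ca.crt"]).decode("utf-8")
    return token, ca_cert


# Fabricated sample; real values come from the cluster.
sample = json.dumps({
    "data": {
        "token": base64.b64encode(b"example-token").decode("ascii"),
        "ca.crt": base64.b64encode(b"-----BEGIN CERTIFICATE-----\n...").decode("ascii"),
    }
})

token, ca_cert = extract_sa_credentials(sample)
print(token)
print(ca_cert.splitlines()[0])
```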

To create a container group:

Use the controller user interface to create an OpenShift or Kubernetes API Bearer Token credential to use with your container group. See Add a New Credential in the Automation Controller User Guide for details.
Click Instance Groups in the left navigation bar to open the Instance Groups configuration window, where you create the new container group.
Click the Add button and select Create Container Group.

IG - create new CG

Enter a name for the new container group and select the credential you created earlier to associate it with the container group.

8.2.2. Customize the Pod spec

Ansible Automation Platform provides a simple default Pod specification, however, you can provide a custom YAML (or JSON) document that overrides the default Pod spec. This field uses any custom fields (i.e. ImagePullSecrets) that can be "serialized" as valid Pod JSON or YAML. A full list of options can be found in the OpenShift documentation.

To customize the Pod spec, use the toggle to enable and expand the Pod Spec Override field, specify the namespace in that field, and click Save when done.

IG - CG customize pod

You can provide additional customizations if needed. Click Expand to view the entire customization window.

_images/instance-group-customize-cg-pod-expanded.png

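Note

The image used at job launch time is determined by the execution environment associated with the job. If a Container Registry credential is associated with the execution environment, the controller attempts to create an ImagePullSecret to pull the image. If you prefer not to give the service account permission to manage secrets, you must pre-create the ImagePullSecret, specify it in the pod spec, and omit any credential from the execution environment used.

For more information on how to create image pull secrets, see the Allowing Pods to Reference Images from Other Secured Registries section of the Red Hat Container Registry Authentication article.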
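As an illustration of such an override, the sketch below takes a base Pod spec and sets the namespace and a pre-created pull secret, then prints JSON that could be pasted into the Pod Spec Override field. The helper name, the namespace, the secret name, and the base spec itself (including the container name and image) are hypothetical stand-ins. In practice, start from the default spec shown in the user interface.

```python
import copy
import json


def customize_pod_spec(base_spec, namespace, pull_secret_name):
    """Return a copy of base_spec with the namespace and pull secret set."""
    spec = copy.deepcopy(base_spec)
    spec.setdefault("metadata", {})["namespace"] = namespace
    # imagePullSecrets is the standard Pod spec field for registry secrets.
    spec.setdefault("spec", {})["imagePullSecrets"] = [{"name": pull_secret_name}]
    return spec


# Hypothetical base; replace with the default spec from the UI.
base = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"namespace": "default"},
    "spec": {"containers": [{"name": "worker", "image": "registry.example.com/ee:latest"}]},
}

custom = customize_pod_spec(base, "my-namespace", "my-registry-secret")
print(json.dumps(custom, indent=2))
```

Copying the base rather than editing it in place keeps the default available for comparison after an upgrade.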

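Once the container group is successfully created, the Details tab of the new container group remains open so that you can review and edit the container group information. This is the same menu that opens when you click the Edit button from the Instance Group link. You can also edit Instances and review the Jobs associated with this instance group.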

IG - example CG successfully created

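Container groups and instance groups are labeled accordingly.

Note

Custom Pod specs are supported, but upgrades may be difficult if the default pod_spec changes. Most manifests can be applied to any namespace, with the namespace specified separately, so most likely you only need to override the namespace. Similarly, pinning a default image for different releases of the platform to different versions of the default job runner container is tricky. If the default image is specified in the Pod spec, upgrades do not pick up new changes made to the default Pod spec.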

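8.2.3. Verify container group functions

To verify the deployment and termination of your container:

Create a mock inventory and associate the container group with it by entering the name of the container group in the Instance Group field. See Add a new inventory in the Automation Controller User Guide for details.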

Dummy inventory

Create a "localhost" host in the inventory with the following variables:

{'ansible_host': '127.0.0.1', 'ansible_connection': 'local'}

Inventory with localhost

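Launch an ad hoc job against the localhost using the ping or setup module. Although the Machine Credential field is required, it does not matter which credential you select for this simple test.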

Launch inventory with localhost

_images/inventories-launch-adhoc-cg-test-localhost2.png

The job details view shows that the container was reached successfully by the ad hoc job.

Inventory with localhost ping success

If you have an OpenShift UI, you can see Pods appear and disappear as they deploy and terminate. Alternatively, you can use the CLI to perform a get pod operation on your namespace to watch these same events occurring in real-time.

8.2.4. View container group jobs

When you run a job associated with a container group, you can see the details of that job in the Details view and its associated container group and the execution environment that spun up.

IG - instances jobs

8.2.5. Kubernetes API failure conditions

When running a container group and the Kubernetes API responds that the resource quota has been exceeded, the controller keeps the job in pending state. Other failures result in the traceback of the Error Details field showing the failure reason, similar to the example here:

_images/instance-group-cg-job-details-error.png

8.2.6. Container capacity limits

Capacity limits and quotas for containers are defined through objects in the Kubernetes API:

To set limits on all pods within a given namespace, use the LimitRange object. Refer to the OpenShift documentation for Quotas and Limit Ranges.
To set limits directly on the pod definition launched by the controller, see Customize the Pod spec and refer to the OpenShift documentation to set the options to compute resources.
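For the first option, a LimitRange manifest can be generated as JSON. This is a minimal sketch under stated assumptions: the helper name, the object name container-group-limits, the namespace, and the limit values are placeholders. Consult the OpenShift documentation for the full set of LimitRange fields.

```python
import json


def limit_range_manifest(namespace, cpu_limit, memory_limit):
    """Build a LimitRange that sets default container limits in a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-group-limits", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Applied to containers that do not declare their own limits.
                    "default": {"cpu": cpu_limit, "memory": memory_limit},
                }
            ]
        },
    }


manifest = limit_range_manifest("my-namespace", "500m", "512Mi")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied in the same way as containergroup-sa.yml.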

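Note

Container groups do not use the capacity algorithm that normal nodes use. You need to set the number of forks explicitly, for instance at the job template level. If forks are configured in the controller, that setting is passed along to the container.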