community.aws.glue_crawler module – Manage an AWS Glue crawler

Note

This module is part of the community.aws collection (version 7.2.0).

You might already have this collection installed if you are using the ansible package. It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install community.aws. You need further requirements to be able to use this module, see Requirements for details.

To use it in a playbook, specify: community.aws.glue_crawler.

New in community.aws 4.1.0

Synopsis 

Manage an AWS Glue crawler. See https://aws.amazon.com/glue/ for details.
Prior to release 5.0.0 this module was called community.aws.aws_glue_crawler. The usage did not change.

Aliases: aws_glue_crawler

Requirements 

The below requirements are needed on the host that executes this module.

python >= 3.6
boto3 >= 1.26.0
botocore >= 1.29.0

Parameters 

Parameter	Comments
access_key aliases: aws_access_key_id, aws_access_key, ec2_access_key string	AWS access key ID. See the AWS documentation for more information about access tokens https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys. The `AWS_ACCESS_KEY_ID`, `AWS_ACCESS_KEY` or `EC2_ACCESS_KEY` environment variables may also be used in decreasing order of preference. The aws_access_key and profile options are mutually exclusive. The aws_access_key_id alias was added in release 5.1.0 for consistency with the AWS botocore SDK. The ec2_access_key alias has been deprecated and will be removed in a release after 2024-12-01. Support for the `EC2_ACCESS_KEY` environment variable has been deprecated and will be removed in a release after 2024-12-01.
aws_ca_bundle path	The location of a CA Bundle to use when validating SSL certificates. The `AWS_CA_BUNDLE` environment variable may also be used.
aws_config dictionary	A dictionary to modify the botocore configuration. Parameters can be found in the AWS documentation https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html#botocore.config.Config.
database_name string	The name of the database where results are written.
debug_botocore_endpoint_logs boolean	Use a `botocore.endpoint` logger to parse the unique (rather than total) `"resource:action"` API calls made during a task, outputing the set to the resource_actions key in the task results. Use the `aws_resource_action` callback to output to total list made during a playbook. The `ANSIBLE_DEBUG_BOTOCORE_LOGS` environment variable may also be used. Choices: `false` ← (default) `true`
description string	Description of the crawler being defined.
endpoint_url aliases: ec2_url, aws_endpoint_url, s3_url string	URL to connect to instead of the default AWS endpoints. While this can be used to connection to other AWS-compatible services the amazon.aws and community.aws collections are only tested against AWS. The `AWS_URL` or `EC2_URL` environment variables may also be used, in decreasing order of preference. The ec2_url and s3_url aliases have been deprecated and will be removed in a release after 2024-12-01. Support for the `EC2_URL` environment variable has been deprecated and will be removed in a release after 2024-12-01.
name string / required	The name you assign to this crawler definition. It must be unique in your account.
profile aliases: aws_profile string	A named AWS profile to use for authentication. See the AWS documentation for more information about named profiles https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html. The `AWS_PROFILE` environment variable may also be used. The profile option is mutually exclusive with the aws_access_key, aws_secret_key and security_token options.
purge_tags boolean	If purge_tags=true and tags is set, existing tags will be purged from the resource to match exactly what is defined by tags parameter. If the tags parameter is not set then tags will not be modified, even if purge_tags=True. Tag keys beginning with `aws:` are reserved by Amazon and can not be modified. As such they will be ignored for the purposes of the purge_tags parameter. See the Amazon documentation for more information https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html#tag-conventions. Choices: `false` `true` ← (default)
recrawl_policy dictionary	A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
recrawl_behavior string	Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. Supported options are `CRAWL_EVERYTHING` and `CRAWL_NEW_FOLDERS_ONLY`.
region aliases: aws_region, ec2_region string	The AWS region to use. For global services such as IAM, Route53 and CloudFront, region is ignored. The `AWS_REGION` or `EC2_REGION` environment variables may also be used. See the Amazon AWS documentation for more information http://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region. The `ec2_region` alias has been deprecated and will be removed in a release after 2024-12-01 Support for the `EC2_REGION` environment variable has been deprecated and will be removed in a release after 2024-12-01.
role string	The name or ARN of the IAM role associated with this crawler. Required when state=present.
schema_change_policy dictionary	The policy for the crawler’s update and deletion behavior.
delete_behavior string	Defines the deletion behavior when the crawler finds a deleted object. Supported options are `LOG`, `DELETE_FROM_DATABASE`, and `DEPRECATE_IN_DATABASE`.
update_behavior string	Defines the update behavior when the crawler finds a changed schema.. Supported options are `LOG` and `UPDATE_IN_DATABASE`.
secret_key aliases: aws_secret_access_key, aws_secret_key, ec2_secret_key string	AWS secret access key. See the AWS documentation for more information about access tokens https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys. The `AWS_SECRET_ACCESS_KEY`, `AWS_SECRET_KEY`, or `EC2_SECRET_KEY` environment variables may also be used in decreasing order of preference. The secret_key and profile options are mutually exclusive. The aws_secret_access_key alias was added in release 5.1.0 for consistency with the AWS botocore SDK. The ec2_secret_key alias has been deprecated and will be removed in a release after 2024-12-01. Support for the `EC2_SECRET_KEY` environment variable has been deprecated and will be removed in a release after 2024-12-01.
session_token aliases: aws_session_token, security_token, aws_security_token, access_token string	AWS STS session token for use with temporary credentials. See the AWS documentation for more information about access tokens https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys. The `AWS_SESSION_TOKEN`, `AWS_SECURITY_TOKEN` or `EC2_SECURITY_TOKEN` environment variables may also be used in decreasing order of preference. The security_token and profile options are mutually exclusive. Aliases aws_session_token and session_token were added in release 3.2.0, with the parameter being renamed from security_token to session_token in release 6.0.0. The security_token, aws_security_token, and access_token aliases have been deprecated and will be removed in a release after 2024-12-01. Support for the `EC2_SECRET_KEY` and `AWS_SECURITY_TOKEN` environment variables has been deprecated and will be removed in a release after 2024-12-01.
state string / required	Create or delete the AWS Glue crawler. Choices: `"present"` `"absent"`
table_prefix string	The table prefix used for catalog tables that are created.
tags aliases: resource_tags dictionary	A dictionary representing the tags to be applied to the resource. If the tags parameter is not set then tags will not be modified.
targets dictionary	A list of targets to crawl. See example below. Required when state=present.
validate_certs boolean	When set to `false`, SSL certificates will not be validated for communication with the AWS APIs. Setting validate_certs=false is strongly discouraged, as an alternative, consider setting aws_ca_bundle instead. Choices: `false` `true` ← (default)

Notes 

Note

Caution: For modules, environment variables and configuration files are read from the Ansible ‘host’ context and not the ‘controller’ context. As such, files may need to be explicitly copied to the ‘host’. For lookup and connection plugins, environment variables and configuration files are read from the Ansible ‘controller’ context and not the ‘host’ context.
The AWS SDK (boto3) that Ansible uses may also read defaults for credentials and other settings, such as the region, from its configuration files in the Ansible ‘host’ context (typically ~/.aws/credentials). See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html for more information.

Examples 

# Note: These examples do not set authentication details, see the AWS Guide for details.

# Create an AWS Glue crawler
- community.aws.glue_crawler:
    name: my-glue-crawler
    database_name: my_database
    role: my-iam-role
    schema_change_policy:
      delete_behavior: DELETE_FROM_DATABASE
      update_behavior: UPDATE_IN_DATABASE
    recrawl_policy:
      recrawl_ehavior: CRAWL_EVERYTHING
    targets:
      S3Targets:
        - Path: "s3://my-bucket/prefix/folder/"
          ConnectionName: my-connection
          Exclusions:
            - "**.json"
            - "**.yml"
    state: present

# Delete an AWS Glue crawler
- community.aws.glue_crawler:
    name: my-glue-crawler
    state: absent

Return Values 

Common return values are documented here, the following are the fields unique to this module:

Key	Description
creation_time string	The time and date that this crawler definition was created. Returned: when state is present Sample: `"2021-04-01T05:19:58.326000+00:00"`
database_name string	The name of the database where results are written. Returned: when state is present Sample: `"my_table"`
description string	Description of the crawler. Returned: when state is present Sample: `"My crawler"`
last_updated string	The time and date that this crawler definition was last updated. Returned: when state is present Sample: `"2021-04-01T05:19:58.326000+00:00"`
name string	The name of the AWS Glue crawler. Returned: always Sample: `"my-glue-crawler"`
recrawl_policy complex	A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run. Returned: when state is present
RecrawlBehavior string	Whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. Returned: when state is present Sample: `"CRAWL_EVERYTHING"`
role string	The name or ARN of the IAM role associated with this crawler. Returned: when state is present Sample: `"my-iam-role"`
schema_change_policy complex	The policy for the crawler’s update and deletion behavior. Returned: when state is present
DeleteBehavior string	The deletion behavior when the crawler finds a deleted object. Returned: when state is present Sample: `"DELETE_FROM_DATABASE"`
UpdateBehavior string	The update behavior when the crawler finds a changed schema. Returned: when state is present Sample: `"UPDATE_IN_DATABASE"`
table_prefix string	The table prefix used for catalog tables that are created. Returned: when state is present Sample: `"my_prefix"`
targets complex	A list of targets to crawl. Returned: when state is present
CatalogTargets list / elements=string	List of catalog targets. Returned: when state is present
DynamoDBTargets list / elements=string	List of DynamoDB targets. Returned: when state is present
JdbcTargets list / elements=string	List of JDBC targets. Returned: when state is present
MongoDBTargets list / elements=string	List of Mongo DB targets. Returned: when state is present
S3Targets list / elements=string	List of S3 targets. Returned: when state is present

Authors

Ivan Chekaldin (@ichekaldin)

community.aws.glue_crawler module – Manage an AWS Glue crawler

Synopsis

Requirements

Parameters

Notes

Examples

Return Values