community.aws.glue_crawler module – Manage an AWS Glue crawler

Note

This module is part of the community.aws collection (version 9.0.0).

You might already have this collection installed if you are using the ansible package. It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install community.aws. You need further requirements to be able to use this module, see Requirements for details.

To use it in a playbook, specify: community.aws.glue_crawler.

New in community.aws 4.1.0

Synopsis

  • Manage an AWS Glue crawler. See https://aws.amazon.com/glue/ for details.

  • Prior to release 5.0.0 this module was called community.aws.aws_glue_crawler. The usage did not change.

Aliases: aws_glue_crawler

Requirements

The below requirements are needed on the host that executes this module.

  • python >= 3.6

  • boto3 >= 1.28.0

  • botocore >= 1.31.0
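As a minimal sketch, the Python library requirements above could be installed with a task like the following. It assumes pip is available on the host that executes the module; the task name is illustrative only.

```yaml
# Sketch: install the SDK versions listed under Requirements.
# Assumes pip is present on the host that runs the module.
- name: Install boto3 and botocore for community.aws.glue_crawler
  ansible.builtin.pip:
    name:
      - "boto3>=1.28.0"
      - "botocore>=1.31.0"
```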

Parameters

Parameter

Comments

access_key

aliases: aws_access_key_id, aws_access_key, ec2_access_key

string

AWS access key ID.

See the AWS documentation for more information about access tokens https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.

The AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY or EC2_ACCESS_KEY environment variables may also be used in decreasing order of preference.

The aws_access_key and profile options are mutually exclusive.

The aws_access_key_id alias was added in release 5.1.0 for consistency with the AWS botocore SDK.

The ec2_access_key alias has been deprecated and will be removed in a release after 2024-12-01.

Support for the EC2_ACCESS_KEY environment variable has been deprecated and will be removed in a release after 2024-12-01.

aws_ca_bundle

path

The location of a CA Bundle to use when validating SSL certificates.

The AWS_CA_BUNDLE environment variable may also be used.

aws_config

dictionary

A dictionary to modify the botocore configuration.

Parameters can be found in the AWS documentation https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html#botocore.config.Config.

database_name

string

The name of the database where results are written.

debug_botocore_endpoint_logs

boolean

Use a botocore.endpoint logger to parse the unique (rather than total) "resource:action" API calls made during a task, outputting the set to the resource_actions key in the task results. Use the aws_resource_action callback to output the total list made during a playbook.

The ANSIBLE_DEBUG_BOTOCORE_LOGS environment variable may also be used.

Choices:

  • false ← (default)

  • true

description

string

Description of the crawler being defined.

endpoint_url

aliases: ec2_url, aws_endpoint_url, s3_url

string

URL to connect to instead of the default AWS endpoints. While this can be used to connect to other AWS-compatible services, the amazon.aws and community.aws collections are only tested against AWS.

The AWS_URL or EC2_URL environment variables may also be used, in decreasing order of preference.

The ec2_url and s3_url aliases have been deprecated and will be removed in a release after 2024-12-01.

Support for the EC2_URL environment variable has been deprecated and will be removed in a release after 2024-12-01.

name

string / required

The name you assign to this crawler definition. It must be unique in your account.

profile

aliases: aws_profile

string

A named AWS profile to use for authentication.

See the AWS documentation for more information about named profiles https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html.

The AWS_PROFILE environment variable may also be used.

The profile option is mutually exclusive with the aws_access_key, aws_secret_key and security_token options.

purge_tags

boolean

If purge_tags=true and tags is set, existing tags will be purged from the resource to match exactly what is defined by the tags parameter.

If the tags parameter is not set then tags will not be modified, even if purge_tags=true.

Tag keys beginning with aws: are reserved by Amazon and can not be modified. As such they will be ignored for the purposes of the purge_tags parameter. See the Amazon documentation for more information https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html#tag-conventions.

Choices:

  • false

  • true ← (default)

recrawl_policy

dictionary

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

recrawl_behavior

string

Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.

Supported options are CRAWL_EVERYTHING and CRAWL_NEW_FOLDERS_ONLY.

region

aliases: aws_region, ec2_region

string

The AWS region to use.

For global services such as IAM, Route53 and CloudFront, region is ignored.

The AWS_REGION or EC2_REGION environment variables may also be used.

See the Amazon AWS documentation for more information http://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region.

The ec2_region alias has been deprecated and will be removed in a release after 2024-12-01.

Support for the EC2_REGION environment variable has been deprecated and will be removed in a release after 2024-12-01.

role

string

The name or ARN of the IAM role associated with this crawler.

Required when state=present.

schema_change_policy

dictionary

The policy for the crawler’s update and deletion behavior.

delete_behavior

string

Defines the deletion behavior when the crawler finds a deleted object.

Supported options are LOG, DELETE_FROM_DATABASE, and DEPRECATE_IN_DATABASE.

update_behavior

string

Defines the update behavior when the crawler finds a changed schema.

Supported options are LOG and UPDATE_IN_DATABASE.

secret_key

aliases: aws_secret_access_key, aws_secret_key, ec2_secret_key

string

AWS secret access key.

See the AWS documentation for more information about access tokens https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.

The AWS_SECRET_ACCESS_KEY, AWS_SECRET_KEY, or EC2_SECRET_KEY environment variables may also be used in decreasing order of preference.

The secret_key and profile options are mutually exclusive.

The aws_secret_access_key alias was added in release 5.1.0 for consistency with the AWS botocore SDK.

The ec2_secret_key alias has been deprecated and will be removed in a release after 2024-12-01.

Support for the EC2_SECRET_KEY environment variable has been deprecated and will be removed in a release after 2024-12-01.

session_token

aliases: aws_session_token, security_token, aws_security_token, access_token

string

AWS STS session token for use with temporary credentials.

See the AWS documentation for more information about access tokens https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.

The AWS_SESSION_TOKEN, AWS_SECURITY_TOKEN or EC2_SECURITY_TOKEN environment variables may also be used in decreasing order of preference.

The security_token and profile options are mutually exclusive.

Aliases aws_session_token and session_token were added in release 3.2.0, with the parameter being renamed from security_token to session_token in release 6.0.0.

The security_token, aws_security_token, and access_token aliases have been deprecated and will be removed in a release after 2024-12-01.

Support for the EC2_SECURITY_TOKEN and AWS_SECURITY_TOKEN environment variables has been deprecated and will be removed in a release after 2024-12-01.

state

string / required

Create or delete the AWS Glue crawler.

Choices:

  • "present"

  • "absent"

table_prefix

string

The table prefix used for catalog tables that are created.

tags

aliases: resource_tags

dictionary

A dictionary representing the tags to be applied to the resource.

If the tags parameter is not set then tags will not be modified.

targets

dictionary

A list of targets to crawl. See example below.

Required when state=present.

validate_certs

boolean

When set to false, SSL certificates will not be validated for communication with the AWS APIs.

Setting validate_certs=false is strongly discouraged. As an alternative, consider setting aws_ca_bundle.

Choices:

  • false

  • true ← (default)

Notes

Note

  • Caution: For modules, environment variables and configuration files are read from the Ansible ‘host’ context and not the ‘controller’ context. As such, files may need to be explicitly copied to the ‘host’. For lookup and connection plugins, environment variables and configuration files are read from the Ansible ‘controller’ context and not the ‘host’ context.

  • The AWS SDK (boto3) that Ansible uses may also read defaults for credentials and other settings, such as the region, from its configuration files in the Ansible ‘host’ context (typically ~/.aws/credentials). See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html for more information.
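To avoid depending on whatever defaults boto3 finds in the 'host' context, the profile and region can be passed explicitly. The sketch below uses only parameters documented above; the profile name, region, and resource names are hypothetical placeholders, and the named profile is assumed to exist on the host that executes the module.

```yaml
# Sketch: pin credentials source and region instead of relying on
# environment variables or ~/.aws/ defaults on the executing host.
# "my-aws-profile" and "us-east-1" are placeholder values.
- name: Create a Glue crawler with an explicit profile and region
  community.aws.glue_crawler:
    name: my-glue-crawler
    state: present
    role: my-iam-role
    database_name: my_database
    targets:
      S3Targets:
        - Path: "s3://my-bucket/prefix/folder/"
    profile: my-aws-profile
    region: us-east-1
```

Because profile is mutually exclusive with the access key, secret key, and session token options, none of those are set here.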

Examples

# Note: These examples do not set authentication details, see the AWS Guide for details.

# Create an AWS Glue crawler
- community.aws.glue_crawler:
    name: my-glue-crawler
    database_name: my_database
    role: my-iam-role
    schema_change_policy:
      delete_behavior: DELETE_FROM_DATABASE
      update_behavior: UPDATE_IN_DATABASE
    recrawl_policy:
      recrawl_behavior: CRAWL_EVERYTHING
    targets:
      S3Targets:
        - Path: "s3://my-bucket/prefix/folder/"
          ConnectionName: my-connection
          Exclusions:
            - "**.json"
            - "**.yml"
    state: present

# Delete an AWS Glue crawler
- community.aws.glue_crawler:
    name: my-glue-crawler
    state: absent
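
The tags and purge_tags parameters described above can be combined to add tags without removing ones that already exist. This is a sketch with placeholder names and tag values, not an example taken from the module's own documentation.

```yaml
# Sketch: add a tag while keeping any tags already on the crawler.
# With purge_tags left at its default (true), tags not listed here
# would be removed (except reserved aws: keys).
- name: Tag an AWS Glue crawler without purging existing tags
  community.aws.glue_crawler:
    name: my-glue-crawler
    state: present
    role: my-iam-role
    database_name: my_database
    description: Crawls the prefix folder
    table_prefix: my_prefix_
    targets:
      S3Targets:
        - Path: "s3://my-bucket/prefix/folder/"
    tags:
      environment: dev
    purge_tags: false
```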

Return Values

Common return values are documented here. The following fields are unique to this module:

Key

Description

creation_time

string

The time and date that this crawler definition was created.

Returned: when state is present

Sample: "2021-04-01T05:19:58.326000+00:00"

database_name

string

The name of the database where results are written.

Returned: when state is present

Sample: "my_table"

description

string

Description of the crawler.

Returned: when state is present

Sample: "My crawler"

last_updated

string

The time and date that this crawler definition was last updated.

Returned: when state is present

Sample: "2021-04-01T05:19:58.326000+00:00"

name

string

The name of the AWS Glue crawler.

Returned: always

Sample: "my-glue-crawler"

recrawl_policy

complex

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

Returned: when state is present

RecrawlBehavior

string

Whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.

Returned: when state is present

Sample: "CRAWL_EVERYTHING"

role

string

The name or ARN of the IAM role associated with this crawler.

Returned: when state is present

Sample: "my-iam-role"

schema_change_policy

complex

The policy for the crawler’s update and deletion behavior.

Returned: when state is present

DeleteBehavior

string

The deletion behavior when the crawler finds a deleted object.

Returned: when state is present

Sample: "DELETE_FROM_DATABASE"

UpdateBehavior

string

The update behavior when the crawler finds a changed schema.

Returned: when state is present

Sample: "UPDATE_IN_DATABASE"

table_prefix

string

The table prefix used for catalog tables that are created.

Returned: when state is present

Sample: "my_prefix"

targets

complex

A list of targets to crawl.

Returned: when state is present

CatalogTargets

list / elements=string

List of catalog targets.

Returned: when state is present

DynamoDBTargets

list / elements=string

List of DynamoDB targets.

Returned: when state is present

JdbcTargets

list / elements=string

List of JDBC targets.

Returned: when state is present

MongoDBTargets

list / elements=string

List of MongoDB targets.

Returned: when state is present

S3Targets

list / elements=string

List of S3 targets.

Returned: when state is present
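
As an illustration of how these return values might be consumed, the sketch below registers the task result and prints two of the documented fields. The variable name crawler_result is hypothetical, and the resource names are placeholders.

```yaml
# Sketch: capture the module result and read documented return keys.
- name: Ensure the crawler exists
  community.aws.glue_crawler:
    name: my-glue-crawler
    state: present
    role: my-iam-role
    database_name: my_database
    targets:
      S3Targets:
        - Path: "s3://my-bucket/prefix/folder/"
  register: crawler_result

# creation_time is only returned when state is present.
- name: Show selected return values
  ansible.builtin.debug:
    msg: "Crawler {{ crawler_result.name }} was created at {{ crawler_result.creation_time }}"
```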

Authors

  • Ivan Chekaldin (@ichekaldin)