Working with Unicode

Unicode makes it possible to produce two strings that look identical but are composed of different characters or character sequences. To address this, Unicode defines normalization forms, which map such equivalent sequences to a single, unique representation so that they can be compared reliably.
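As a minimal sketch of the problem, the task below compares the two spellings of "Mayagüez" without any normalization. The task name is illustrative. The first string uses a plain u followed by U+0308 COMBINING DIAERESIS, while the second uses the precomposed ü (U+00FC), so their lengths and a direct comparison differ even though they render the same.

```yaml
# Illustrative task: no normalization is applied, so the strings differ.
- name: Show that visually equal strings can differ
  debug:
    msg:
      - "{{ with_combining_character | length }}"     # 9 code points: u + U+0308
      - "{{ without_combining_character | length }}"  # 8 code points: precomposed ü
      - "{{ with_combining_character == without_combining_character }}"  # evaluates to false
  vars:
    with_combining_character: "{{ 'Mayagu\u0308ez' }}"
    without_combining_character: "{{ 'Mayag\u00FCez' }}"
```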

You can use the community.general.unicode_normalize filter to normalize Unicode strings within your playbooks.

- name: Compare Unicode representations
  debug:
    msg: "{{ with_combining_character | community.general.unicode_normalize == without_combining_character }}"
  vars:
    with_combining_character: "{{ 'Mayagu\u0308ez' }}"
    without_combining_character: Mayagüez

This produces:

TASK [Compare Unicode representations] ********************************************************
ok: [localhost] => {
    "msg": true
}

The community.general.unicode_normalize filter accepts a keyword argument to select the Unicode form used to normalize the input string.

form

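One of 'NFC' (default), 'NFD', 'NFKC', or 'NFKD'. NFC and NFD apply canonical composition and decomposition; NFKC and NFKD additionally apply compatibility mappings. See Unicode Standard Annex #15, Unicode Normalization Forms, for details.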

New in version 3.7.0.
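A short sketch of the form argument, passing it as a keyword argument as described above. The task names and input strings are illustrative; the expected values in the comments follow from the Unicode character mappings for U+00FC (ü) and U+FB01 (the "fi" ligature).

```yaml
# Illustrative tasks; the inputs are chosen only to show each form's effect.
- name: Decompose a precomposed character with NFD
  debug:
    # NFD splits ü (U+00FC) into u + U+0308, so the length is 9 rather than 8
    msg: "{{ 'Mayag\u00FCez' | community.general.unicode_normalize(form='NFD') | length }}"

- name: Apply a compatibility mapping with NFKC
  debug:
    # NFKC replaces the ligature U+FB01 with the two letters "fi"
    msg: "{{ '\uFB01' | community.general.unicode_normalize(form='NFKC') }}"
```

The default NFC is usually sufficient for comparing text that should look the same. The K forms also fold compatibility variants such as ligatures and superscript digits, which can change the meaning of a string, so use them deliberately.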