i18n: Silent failure when using ISO 639-3 language tags that have ISO 639-1 equivalents #9260

Open
jmooring opened this issue Dec 8, 2021 · 2 comments

Comments

@jmooring
Member

jmooring commented Dec 8, 2021

Reference: https://discourse.gohugo.io/t/35938

This behavior was introduced in v0.76.0, when we upgraded nicksnyder/go-i18n from v1 to v2.

| Language Tag | Description | i18n |
|--------------|-------------|------|
| eng          | ISO 639-3   |      |
| nld          | ISO 639-3   |      |
| es           | ISO 639-1   | ✔️   |
| zzz          | artificial  | ✔️   |
| mylang       | artificial  | ✔️   |

MRE

git clone --single-branch -b hugo-forum-topic-35938 https://github.com/jmooring/hugo-testing hugo-forum-topic-35938
cd hugo-forum-topic-35938
hugo server

Notes

  1. I suspect this is a limitation of nicksnyder/go-i18n, but do not know for certain. For example, the CLDR data has entries for en and nl, but not for eng and nld.
  2. Using ISO 639-3 also prevents localization of dates, currency, numbers, and percentages. That is a related but separate issue: "Use Language.LanguageCode as localization key, falling back to language.Lang" (#9109).
@github-actions

github-actions bot commented Dec 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

@github-actions github-actions bot added the Stale label Dec 9, 2022
@bep bep added this to the v0.108.0 milestone Dec 9, 2022
@github-actions github-actions bot removed the Stale label Dec 11, 2022
@bep bep modified the milestones: v0.108.0, v0.109.0 Dec 14, 2022
@bep bep modified the milestones: v0.109.0, v0.111.0, v0.110.0 Jan 26, 2023
@bep bep modified the milestones: v0.111.0, v0.112.0 Feb 15, 2023
@bep bep modified the milestones: v0.112.0, v0.113.0 Apr 15, 2023
@jmooring
Member Author

jmooring commented May 31, 2023

The earlier title of this issue, "i18n fails with ISO 639-3 language tags", is false. Translation fails only with some ISO 639-3 tags, and the failure is, in a sense, correct. The problem is that the failure is silent. We might handle this with documentation; not sure yet.

To borrow a phrase from Douglas Adams, BCP 47 (specifically RFC 5646) will be "first against the wall when the revolution comes."

RFC 5646

The ABNF syntax for the language tag defined in RFC 5646 is:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

language      = 2*3ALPHA            ; shortest ISO 639 code

The namespace of language tags and their subtags is administered by the Internet Assigned Numbers Authority (IANA).

When languages have both an ISO 639-1 two-character code and a three-character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only the ISO 639-1 two-character code is defined in the IANA registry.

So, even though "eng" for English is valid per ISO 639-3, RFC 5646 requires "en" because English is also defined in ISO 639-1.

Why is it important?

Hugo, either directly or indirectly, uses the language tag for:

Each of these features relies on the Unicode Common Locale Data Repository (CLDR), which conforms to BCP 47, which includes RFC 5646.

Parsing the language tag

Hugo, nicksnyder/go-i18n, and golang.org/x/text/collate use Go's language.Parse function. This function (a) validates the language tag, and (b) returns a language tag that conforms to RFC 5646 (see playground example).

| Language | Input  | Output | ISO Standard           |
|----------|--------|--------|------------------------|
| English  | eng-us | en-US  | ISO 639-1 per RFC 5646 |
| Hawaiian | haw    | haw    | ISO 639-3 per RFC 5646 |
| Dutch    | nld    | nl     | ISO 639-1 per RFC 5646 |

Translation failure

With this site configuration:

[languages.eng]

Neither of these files is read:

i18n/eng.toml
i18n/en.toml

Solution options

1) Improve existing documentation. Currently the relevant text (a) is buried deep in a long page and (b) lacks a simple explanation of what it means to conform to RFC 5646.

2) Validate the language tags in the site configuration. We could test equivalence between the language tag and the value returned by language.Parse, which seems like it would work as the language.Parse function respects artificial language tags.
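
A rough sketch of what option 2 might look like, not a proposal for the actual implementation. To keep it runnable with the standard library alone, the canonicalization step is injected as a function: `canonicalizer`, `tableCanon`, and `checkLanguageKey` are hypothetical names, and `tableCanon` is a stub that only reproduces the rows from the language.Parse table in this comment. In Hugo that role would presumably be filled by language.Parse and the returned tag's String method.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalizer returns the canonical form of a language tag.
// Hypothetical stand-in for language.Parse followed by Tag.String().
type canonicalizer func(tag string) (string, error)

// tableCanon is a stub: it knows only the rows from the table in this
// comment and passes every other tag through unchanged. That
// pass-through is an assumption made for the sketch.
func tableCanon(tag string) (string, error) {
	known := map[string]string{
		"eng":    "en",
		"nld":    "nl",
		"eng-us": "en-US",
		"haw":    "haw",
	}
	if c, ok := known[strings.ToLower(tag)]; ok {
		return c, nil
	}
	return tag, nil
}

// checkLanguageKey returns an error when a configured language key
// differs from its canonical form. The comparison ignores case so that
// a key such as "en-us" is not flagged merely for its casing.
func checkLanguageKey(key string, canon canonicalizer) error {
	got, err := canon(key)
	if err != nil {
		return fmt.Errorf("language key %q: %w", key, err)
	}
	if !strings.EqualFold(got, key) {
		return fmt.Errorf("language key %q is not canonical; use %q", key, got)
	}
	return nil
}

func main() {
	for _, key := range []string{"eng", "nld", "es", "haw", "zzz", "mylang"} {
		if err := checkLanguageKey(key, tableCanon); err != nil {
			fmt.Println("warning:", err)
			continue
		}
		fmt.Println("ok:", key)
	}
}
```

With a check along these lines, a key such as "eng" would produce a visible warning instead of a silent translation failure. Whether that should be a warning or a build error is a separate decision.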

@bep bep modified the milestones: v0.113.0, v0.115.0 Jun 13, 2023
@bep bep modified the milestones: v0.115.0, v0.116.0 Jun 30, 2023
@bep bep modified the milestones: v0.116.0, v0.117.0 Aug 1, 2023
@bep bep modified the milestones: v0.117.0, v0.118.0 Aug 30, 2023
@bep bep removed this from the v0.118.0 milestone Sep 15, 2023
@bep bep added this to the v0.119.0 milestone Sep 15, 2023
@bep bep modified the milestones: v0.119.0, v0.120.0 Oct 5, 2023
@bep bep modified the milestones: v0.120.0, v0.121.0 Oct 31, 2023
@bep bep modified the milestones: v0.121.0, v0.122.0 Dec 6, 2023
@bep bep modified the milestones: v0.122.0, v0.123.0, v0.124.0 Jan 27, 2024
@bep bep modified the milestones: v0.124.0, v0.125.0 Mar 4, 2024
@jmooring jmooring changed the title i18n fails with ISO 639-3 language tags i18n: Silent failure when using ISO 639-3 language tags that have ISO 639-1 equivalents Apr 4, 2024