Artificial language codes longer than 8 alphanumeric characters break i18n #9119

Closed
fnordfish opened this issue Nov 3, 2021 · 5 comments · Fixed by #9140

@fnordfish

When a language is configured with a custom key that includes a slash, the keys of the generated translate functions won't match the i18n translation files.

According to the documentation:

From Hugo 0.31 you no longer need to use a valid language code. It can be anything.
#3564

When using a language config like this:

languages:
  fragen:
    languageName: Deutsch
    # ....
  en/questions:
    languageName: English
    # ....

The generated translate functions are keyed by the i18n translation file names.
But when a translation is executed, the lookup uses the original language key ("fragen" and "en/questions" respectively).
Since I cannot put a slash in a translation file name, I tried replacing it with a colon, but had no success.

For me this patch works, but it is hardly a generic solution:

--- a/langs/i18n/i18n.go
+++ b/langs/i18n/i18n.go
@@ -49,6 +49,7 @@ func NewTranslator(b *i18n.Bundle, cfg config.Provider, logger loggers.Logger) T
 // Func gets the translate func for the given language, or for the default
 // configured language if not found.
 func (t Translator) Func(lang string) translateFunc {
+       lang = strings.SplitN(lang, "/", 2)[0]
        if f, ok := t.translateFuncs[lang]; ok {
                return f
        }
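
To make the effect of the patch concrete, here is a minimal standalone sketch of the lookup. It is not Hugo's code: `translateFunc` is simplified, `lookup` is a hypothetical stand-in for the patched `Translator.Func`, and the map keys assume translation files named `i18n/en.yaml` and `i18n/fragen.yaml`.

```go
package main

import (
	"fmt"
	"strings"
)

// translateFunc is a simplified stand-in for Hugo's translate function type.
type translateFunc func(id string) string

// lookup mimics the patched Translator.Func (hypothetical helper): it drops
// everything after the first slash before consulting the map, and falls back
// to def when no entry is found.
func lookup(funcs map[string]translateFunc, lang string, def translateFunc) translateFunc {
	lang = strings.SplitN(lang, "/", 2)[0]
	if f, ok := funcs[lang]; ok {
		return f
	}
	return def
}

func main() {
	// Keys derived from the translation file names (assumed for this sketch).
	funcs := map[string]translateFunc{
		"en":     func(id string) string { return "en:" + id },
		"fragen": func(id string) string { return "fragen:" + id },
	}
	def := func(id string) string { return "default:" + id }

	// Without the patch, the raw language key finds nothing.
	_, ok := funcs["en/questions"]
	fmt.Println(ok) // false

	// With the patch, "en/questions" resolves to the "en" entry.
	fmt.Println(lookup(funcs, "en/questions", def)("hello")) // en:hello
	fmt.Println(lookup(funcs, "fragen", def)("hello"))       // fragen:hello
}
```

The sketch also shows why this is not generic: it only works when the part before the slash happens to match a translation file name.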

What version of Hugo are you using (hugo version)?

$ hugo version
hugo v0.89.0-ADE966B8 linux/amd64 BuildDate=2021-11-02T10:00:18Z VendorInfo=gohugoio

Does this issue reproduce with the latest release?

Yes

@jmooring
Member

jmooring commented Nov 4, 2021

From Hugo 0.31 you no longer need to use a valid language code. It can be anything.

Or maybe we could just change the documentation, specifying a valid pattern.

@fnordfish
Author

@jmooring :)

Some background: I'm still trying to find a solution to #8279, where I need more fine-grained control over which path prefix is used in a multi-language setup.

Maybe we could come up with a more stable way of linking the language to the read-in i18n files - maybe the languageCode if present?

@jmooring
Member

jmooring commented Nov 4, 2021

Let's continue the discussion at #8279.

@jmooring changed the title from "Custom language codes with / break i18n" to "Artificial language codes longer than 8 alphanumeric characters break i18n" on Nov 7, 2021
@jmooring
Member

jmooring commented Nov 7, 2021

So, yes, the documentation here needs to be changed from:

From Hugo 0.31 you no longer need to use a valid language code. It can be anything.

To something like:

Language tags must conform to the IETF BCP 47 specification. Valid examples include en and en_US. You may also use custom language codes that do not exceed 8 alphanumeric characters. Valid examples of custom language codes include mylang and proj1234.

Localization features (dates, currencies, numbers, and percentages) are not available when you use a custom language code.

There's a related issue about language tag vs. locale. The outcome of that investigation may affect the examples above.

In addition to the required documentation change, there's a bug. The i18n file is ignored when the language tag is nine or more characters. Example:

git clone --single-branch -b hugo-github-issue-9119 https://github.com/jmooring/hugo-testing hugo-github-issue-9119
cd hugo-github-issue-9119
hugo server

With this configuration:

defaultContentLanguage = 'en'

[languages.en]
weight = 1
[languages.ninechars]
weight = 2

# i18n/en.toml
# i18n/ninechars.toml (or art-x-ninechars.toml -- both work with 8 or fewer characters in name)

Hugo parses ninechars using golang.org/x/text/language and determines it is not a valid language tag per BCP 47, so it prepends art-x- to the value. art is a code for artificial languages, and x indicates private use per BCP 47. Hugo then passes art-x-ninechars to nicksnyder/go-i18n. And that's where it fails.

It fails because nicksnyder/go-i18n parses the language tag too, also using golang.org/x/text/language. It determines that art-x-ninechars is not a valid language tag, but the failure is silenced somewhere. The tag isn't valid because the private use value may not exceed 8 alphanumeric characters.

We should re-validate the language tag after prepending art-x- and throw an error if it doesn't conform to the BCP 47 spec.
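
As a rough illustration of that check, the sketch below validates a custom code before building the `art-x-` tag and returns an error instead of failing silently. It is an assumption-laden sketch, not the eventual fix: `artificialTag` and `privateUse` are hypothetical names, a regular expression stands in for re-parsing with golang.org/x/text/language, and only a single private-use subtag is handled.

```go
package main

import (
	"fmt"
	"regexp"
)

// privateUse matches a single BCP 47 private-use subtag:
// 1 to 8 ASCII alphanumeric characters.
var privateUse = regexp.MustCompile(`^[A-Za-z0-9]{1,8}$`)

// artificialTag (hypothetical helper) builds "art-x-<code>" and returns an
// error when the custom code cannot form a valid private-use subtag.
func artificialTag(code string) (string, error) {
	if !privateUse.MatchString(code) {
		return "", fmt.Errorf("invalid custom language code %q: must be 1 to 8 alphanumeric characters", code)
	}
	return "art-x-" + code, nil
}

func main() {
	for _, code := range []string{"mylang", "proj1234", "ninechars", "en/questions"} {
		tag, err := artificialTag(code)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(tag)
	}
}
```

With these inputs, `mylang` and `proj1234` produce tags, while `ninechars` (nine characters) and `en/questions` (contains a slash) are rejected with an error.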

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 14, 2022