Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added plurals and context support to Translation #40443

Merged
merged 4 commits into from
Aug 25, 2020

Conversation

SkyLucilfer
Copy link
Contributor

@SkyLucilfer SkyLucilfer commented Jul 16, 2020

Context: Translation/i18n

Added plurals and context support for translation in project and Editor.

Added API

  • For project:
    tr_n(message, plural_message, n, context = "")

  • For Editor:
    TTRN(message, plural_message, n, context = "")
    DTRN(message, plural_message, n, context = "")
    RTRN(message, plural_message, n, context = "")

  • All existing translation functions can now add context with the last argument too.
    tr(message, context = "")
    TTR(message, context = "")
    etc.

Example usage

  • Specifying with context:
    tr("Connect", "Device")
    tr("Connect", "Signal")

  • Plurals translation:
    print(tr_n("%d user likes this.", "%d users like this.", n) % n);

  • If we want more precise control over the replacement of n in the string during plurals translation:
    print(tr_n("One user likes this.", "{num} users like this.", n).format([n], "{num}"));

  • In Editor
    Some examples from changes in scene_tree_editor.cpp

multiple_connections

one_connect_multi_groups_en

one_connect_multi_group_french

Test project

po_parsing_pot_gen_2.zip
PO files which I have used to test the updated PO parser are contained in the folder "PO files". Also did some tests using the new tr(), tr_n() interfaces in GDScript.
Also tested the POT generation to generate POT files containing msgctxt and msgid_plurals, both for projects and Editor.

Performance testing [07/08/2020]

performance_test.zip
I have abstracted PO to a specified class, TranslationPO. So translation using CSV will have the same performance as last time, as it uses the same Translation class before this PR.
When we import from a PO file, the imported locale will use the TranslationPO class. For 14000 translations in the dictionary:

  • tr() takes 54 ms to translate 10 000 strings ~ 0.005 ms to translate one string.

  • tr_n() takes

    • 280 ms to translate 10 000 strings ~ 0.028 ms for one string with short plural-rule, like French, German etc.
    • 1250ms to translate 10 000 strings ~ 0.25 ms for one string with very long plural-rule, Arabic

For reference, for a game running at 120 FPS, it takes 8.33 ms to render one frame.
So tr_n() is quite slow, and should be used with care. However, I do cache the last tr_n() result so if a translation query is the same (same key, n and context), it will return very fast. I imagine multiple same queries can happen if an UI element uses tr_n().
I have tried my best to optimize tr_n() - by caching variables, converting recursion to iteration, but it doesn't improve much. The slow runtime comes from the cost of evaluating the plural expression. Right now I use the Expression class. If runtime is really a concern, maybe we could make it faster by writing a dedicated function to evaluate the plural expression without using the Expression class, or hardcoding all possible plural rules and match locale with it.

Performance testing [27/08/2020]

With the PR #41519, the performance problem of tr_n() is fixed. This is done by having a dedicated file containing all the plural rules.
tr_n() now has roughly the same runtime as tr(), with extra 0.001 ms to translate one string compared to tr().

@SkyLucilfer SkyLucilfer force-pushed the PluralsSupport branch 2 times, most recently from d2a2275 to b14d04c Compare July 16, 2020 10:53
@YuriSizov
Copy link
Contributor

This approach for plurals cannot be reused across languages with rules different from English, can it?

@SkyLucilfer
Copy link
Contributor Author

SkyLucilfer commented Jul 16, 2020

This approach for plurals cannot be reused across languages with rules different from English, can it?

Can, it's based on the GNU ngettext idea for translating plurals strings. The "n" passed in by the users will be used to guide the system to fetch the correct translation, based on the plural rule of the selected language.

@Calinou Calinou added this to the 4.0 milestone Jul 16, 2020
@name-here
Copy link

Don't some languages have more than one plural, depending on quantity? Does this support that?

@SkyLucilfer
Copy link
Contributor Author

@name-here Yes it supports that. You might want to refer to the answer I posted above.

@dalexeev
Copy link
Member

It looks like the tr_n function is not very suitable if you are using identifier system:

tr_n("MY_ID", "", n) # Or `tr_n("MY_ID", "MY_ID", n)`?
From the docs

There are two approaches to generate multilingual language games and applications. Both are based on a key:value system. The first is to use one of the languages as the key (usually English), the second is to use a specific identifier. <...> In general, games use the second approach and a unique ID is used for each string.

@Calinou
Copy link
Member

Calinou commented Jul 23, 2020

@dalexeev In my experience, gettext PO files are heavily centered around using English text as identifiers. On the other hand, custom formats (like Godot's CSV format) and XLIFF tend to recommend using keys as identifiers.

@YuriSizov
Copy link
Contributor

In my experience, gettext PO files are heavily centered around using English text as identifiers.

This is definitely the intended way to use it by the creators for translating Linux, but the file format itself is not enforcing this as a rule in any way. If you use keys as identifiers, some tools may warn you that your translation language is English (POEdit does that, for one), but it's on the user to handle this. In this case the user being the engine.

Copy link
Member

@YeldhamDev YeldhamDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

I left a few suggestions but it's mostly good to merge already.

My main concern is about adding so much gettext-specific logic to the core Translation (also used by CSV, and the abstract API that users access), instead of a more PO-specific resource type derived from Translation. I'd welcome input from other contributors on this.

core/compressed_translation.cpp Outdated Show resolved Hide resolved
core/io/translation_loader_po.cpp Outdated Show resolved Hide resolved
core/io/translation_loader_po.cpp Show resolved Hide resolved
core/object.h Outdated Show resolved Hide resolved
core/translation.h Outdated Show resolved Hide resolved
editor/scene_tree_editor.cpp Outdated Show resolved Hide resolved
doc/classes/EditorTranslationParserPlugin.xml Outdated Show resolved Hide resolved
editor/editor_translation_parser.cpp Outdated Show resolved Hide resolved
editor/pot_generator.h Show resolved Hide resolved
editor/translations/extract.py Outdated Show resolved Hide resolved
@dalexeev
Copy link
Member

dalexeev commented Jul 31, 2020

It looks like the tr_n function is not very suitable if you are using identifier system

In my experience, gettext PO files are heavily centered around using English text as identifiers. On the other hand, custom formats (like Godot's CSV format) and XLIFF tend to recommend using keys as identifiers.

For CSV, we should also implement plurals support. For example like this:

KEY en ru
DAYS_AGO[0] %d day ago %d день назад
DAYS_AGO[1] %d days ago %d дня назад
DAYS_AGO[2] - %d дней назад

Usage:

var s = tr_n(n, "DAYS_AGO") % n

That is, we just have to make n the first argument, and it will be compatible with both systems.

Also, maybe we should remove the _ from the function name? Rename to trn or ntr?

@SkyLucilfer
Copy link
Contributor Author

Hello @dalexeev
I talked to core devs about adding plurals for CSV too. However, it seems that the advantage of CSV is its simplicity, and trying to handle plurals with it would cancel out that benefit. In addition, let's say we need to support Arabic language in the CSV file, we will need 6 rows for every key, so it can be inconvenient. Therefore, in case we want plurals, we might be better off just use PO files. You could still make a proposal and see how the community responds to it.
The reason I chose tr_n() is because I want to make it very explicit that we're translating plurals - tr stands for translate, and n indicates plurals. If there's no particular reason that the interface name is inconvenient or unusable, I would prefer not to change it.

@dalexeev
Copy link
Member

dalexeev commented Jul 31, 2020

@SkyLucilfer Thanks for the answer.

However, it seems that the advantage of CSV is its simplicity, and trying to handle plurals with it would cancel out that benefit.

Is it difficult to implement or use? In my opinion, both are simple.

var s = tr("DAYS_AGO[%d]" % f(n)) % n
# where f is "Plural-Forms" logic. For English:
# func f(n: int) -> int:
#     if n == 1:
#         return 0
#     else:
#         return 1

In addition, let's say we need to support Arabic language in the CSV file, we will need 6 rows for every key, so it can be inconvenient.

No, only for strings with number substitution. And there are usually few such strings:

KEY en ru
REGULAR_KEY Regular key Обычный ключ
... ... ...
SPECIAL_KEY[0] %d key %d ключ
SPECIAL_KEY[1] %d keys %d ключа
SPECIAL_KEY[2] %d ключей
... ... ...
ANOTHER_KEY Another key Другой ключ
... ... ...

Therefore, in case we want plurals, we might be better off just use PO files.

That is, I will not be able to use the identifier system? In addition, it will be inconvenient to use a non-English language as the main one.

You could still make a proposal and see how the community responds to it.

I'll add the proposal later.


There is another option:

KEY en[0] en[1] ru[0] ru[1] ru[2]
DAYS_AGO %d day ago %d days ago %d день назад %d дня назад %d дней назад

But I like the first option better, because strings usually don't have numeric substitutions.

@akien-mga
Copy link
Member

For CSV plurals, I would suggest opening a proposal indeed and doing research on how plurals are handled by other projects that support CSV translations.

From what I found, there are many different CSV translation workflows and the few that support plurals have it hacked in in a way as suggested e.g. here, but there's no common standard. It's a simple system so we can indeed design our own plurals logic, but if there was a somewhat "popular" way of doing plurals with CSV used e.g. in other game engines, it would be best for us to follow that.

@dalexeev
Copy link
Member

dalexeev commented Aug 3, 2020

If there's no particular reason that the interface name is inconvenient or unusable, I would prefer not to change it.

Maybe change at least the order of the arguments?

# tr_n(message, plural_message, n, context = "")
var s = tr_n("%d day ago", "%d days ago", n) % n
# ru.po
msgid "%d day ago"
msgid_plural "%d days ago"
msgstr[0] "%d день назад"
msgstr[1] "%d дня назад"
msgstr[2] "%d дней назад"

If using IDs:

# tr_n(message, plural_message, n, context = "")
var s = tr_n("DAYS_AGO", "", n) % n
# en.po
msgid "DAYS_AGO"
msgid_plural ""
msgstr[0] "%d day ago"
msgstr[1] "%d days ago"

I suggested changing the order of the arguments:

That is, we just have to make n the first argument, and it will be compatible with both systems.

# tr_n(n, message, plural_message = "", context = "")
var s = tr_n(n, "DAYS_AGO") % n

core/translation_po.cpp Outdated Show resolved Hide resolved
@akien-mga
Copy link
Member

I've only read the last two commits diagonally, but overall this is very solid work so I think it's best to merge as is, and see if any follow ups are needed once using it in the master branch.

@akien-mga akien-mga merged commit 9d8f349 into godotengine:master Aug 25, 2020
@akien-mga
Copy link
Member

Thanks, and congrats for your top notch work on these features! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants