feat: Implement gettext plurals for PO files #677

Merged
merged 16 commits into from
Dec 7, 2020
54 changes: 54 additions & 0 deletions docs/ref/catalog-formats.rst
@@ -36,6 +36,60 @@ The advantages of this format are:

.. _gettext: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html

.. _po-gettext:

PO File with gettext Plurals
============================

When using a localization backend that doesn't understand the ICU plural syntax exported by the default ``po`` formatter,
use **po-gettext** to read and write PO files with gettext-native plurals.

This is how the regular PO format exports plurals:

.. code-block:: po

   msgid "{count, plural, one {Message} other {Messages}}"
   msgstr "{count, plural, one {Message} other {Messages}}"

With ``po-gettext``, plural messages are exported as follows, depending on whether an explicit ID is set:

.. code-block:: po

   # Message with custom ID "my_message" that is pluralized on property "someCount".
   #
   # Notice that 'msgid_plural' was generated by appending a '_plural' suffix.
   msgctxt "pluralize_on=someCount"
   msgid "my_message"
   msgid_plural "my_message_plural"
   msgstr[0] "Singular case"
   msgstr[1] "Case number {someCount}"

   # Message without custom ID that is pluralized on property "anotherCount".
   #
   # Notice how 'msgid' and 'msgid_plural' were extracted from the original message.
   #
   # To allow matching this PO item to the appropriate catalog entry when deserializing,
   # the original ICU message is also stored in msgctxt.
   msgctxt "icu=%7BanotherCount%2C+plural%2C+one+%7BSingular+case%7D+other+%7BCase+number+%7BanotherCount%7D%7D%7D&pluralize_on=anotherCount"
   msgid "Singular case"
   msgid_plural "Case number {anotherCount}"
   msgstr[0] "Singular case"
   msgstr[1] "Case number {anotherCount}"

This format has several caveats and should only be used if ICU plurals in PO files are not an option:

- Nested/multiple plurals in one message, as shown in :jsmacro:`plural`, are not supported because they cannot be
  expressed with gettext plurals. Messages containing nested/multiple formats will not be output correctly.

- :jsmacro:`select` and :jsmacro:`selectOrdinal` cannot be expressed with gettext plurals, but the original ICU format
  is still saved to the ``msgid``/``msgstr`` properties. A warning is emitted for such messages because this may not be
  the expected behavior; to disable it, include :code:`{ disableSelectWarning: true }` in the :conf:`formatOptions`.

- Source/development languages with more than two plural forms may run into problems when no custom IDs are used,
  because gettext identifies an item by at most two plural strings (``msgid`` and ``msgid_plural``).


JSON
====

5 changes: 5 additions & 0 deletions docs/ref/conf.rst
@@ -295,6 +295,11 @@ Gettext PO file:
msgid "MessageID"
msgstr "Translated Message"

po-gettext
^^^^^^^^^^

Uses PO files with gettext-style plurals instead of ICU plurals; see :ref:`po-gettext`.

minimal
^^^^^^^

4 changes: 3 additions & 1 deletion packages/cli/package.json
@@ -57,14 +57,16 @@
     "normalize-path": "^3.0.0",
     "ora": "^5.1.0",
     "papaparse": "^5.3.0",
-    "pofile": "^1.0.11",
+    "plurals-cldr": "^1.0.4",
+    "pofile": "^1.1.0",
     "pseudolocale": "^1.1.0",
     "ramda": "^0.27.1"
   },
   "devDependencies": {
     "@types/micromatch": "^4.0.1",
     "@types/normalize-path": "^3.0.0",
     "@types/papaparse": "^5.2.3",
+    "@types/plurals-cldr": "^1.0.1",
     "mockdate": "^3.0.2",
     "typescript": "^4.0.3"
   },
251 changes: 251 additions & 0 deletions packages/cli/src/api/formats/__snapshots__/po-gettext.test.ts.snap
@@ -0,0 +1,251 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`po-gettext format should convert ICU plural messages to gettext plurals 1`] = `
msgid ""
msgstr ""
"POT-Creation-Date: 2018-08-27 10:00+0000\\n"
"Mime-Version: 1.0\\n"
"Content-Type: text/plain; charset=utf-8\\n"
"Content-Transfer-Encoding: 8bit\\n"
"X-Generator: @lingui/cli\\n"
"Language: en\\n"

msgctxt "pluralize_on=count"
msgid "message_with_id_and_octothorpe"
msgid_plural "message_with_id_and_octothorpe_plural"
msgstr[0] "Singular"
msgstr[1] "Number is #"

msgctxt "pluralize_on=someCount"
msgid "message_with_id"
msgid_plural "message_with_id_plural"
msgstr[0] "Singular case with id"
msgstr[1] "Case number {someCount} with id"

msgctxt "icu=%7BanotherCount%2C+plural%2C+one+%7BSingular+case%7D+other+%7BCase+number+%7BanotherCount%7D%7D%7D&pluralize_on=anotherCount"
msgid "Singular case"
msgid_plural "Case number {anotherCount}"
msgstr[0] "Singular case"
msgstr[1] "Case number {anotherCount}"

msgctxt "pluralize_on=count"
msgid "message_with_id_but_without_translation"
msgid_plural "message_with_id_but_without_translation_plural"
msgstr[0] ""
msgstr[1] ""

msgctxt "icu=%7Bcount%2C+plural%2C+one+%7BSingular+automatic+id+no+translation%7D+other+%7BPlural+%7Bcount%7D+automatic+id+no+translation%7D%7D&pluralize_on=count"
msgid "Singular automatic id no translation"
msgid_plural "Plural {count} automatic id no translation"
msgstr[0] ""
msgstr[1] ""

`;

exports[`po-gettext format should convert gettext plurals to ICU plural messages 1`] = `
Object {
message_with_id: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: {someCount, plural, one {Singular case} other {Case number {someCount}}},
},
message_with_id_but_without_translation: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: ,
},
{anotherCount, plural, one {Singular case} other {Case number {anotherCount}}}: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: {anotherCount, plural, one {Singular case} other {Case number {anotherCount}}},
},
{count, plural, one {Singular} other {Plural}}: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: ,
},
}
`;

exports[`po-gettext format should correct badly used comments 1`] = `
Object {
withDescriptionAndComments: Object {
comment: Single description only,
comments: Array [
Translator comment,
Second description?,
],
flags: Array [],
obsolete: false,
origin: Array [],
translation: Second description joins translator comments,
},
withMultipleDescriptions: Object {
comment: First description,
comments: Array [
Second comment,
Third comment,
],
flags: Array [],
obsolete: false,
origin: Array [],
translation: Extra comments are separated from the first description line,
},
}
`;

exports[`po-gettext format should read catalog in pofile format 1`] = `
Object {
obsolete: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: true,
origin: Array [],
translation: Is marked as obsolete,
},
static: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: Static message,
},
veryLongString: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of him, waved about helplessly as he looked. "What's happened to me?" he thought. It wasn't a dream. His room, a proper human,
},
withComments: Object {
comment: undefined,
comments: Array [
Translator comment,
This one might come from developer,
],
flags: Array [],
obsolete: false,
origin: Array [],
translation: Support translator comments separately,
},
withDescription: Object {
comment: Description is comment from developers to translators,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: Message with description,
},
withFlags: Object {
comment: undefined,
comments: Array [],
flags: Array [
fuzzy,
otherFlag,
],
obsolete: false,
origin: Array [],
translation: Keeps any flags that are defined,
},
withMultipleOrigins: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [
Array [
src/App.js,
4,
],
Array [
src/Component.js,
2,
],
],
translation: Message with multiple origin,
},
withOrigin: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [
Array [
src/App.js,
4,
],
],
translation: Message with origin,
},
}
`;

exports[`po-gettext format should throw away additional msgstr if present 1`] = `
Object {
withMultipleTranslation: Object {
comment: undefined,
comments: Array [],
flags: Array [],
obsolete: false,
origin: Array [],
translation: This is just fine,
},
}
`;

exports[`po-gettext format should write catalog in pofile format 1`] = `
msgid ""
msgstr ""
"POT-Creation-Date: 2018-08-27 10:00+0000\\n"
"Mime-Version: 1.0\\n"
"Content-Type: text/plain; charset=utf-8\\n"
"Content-Transfer-Encoding: 8bit\\n"
"X-Generator: @lingui/cli\\n"
"Language: en\\n"

msgid "static"
msgstr "Static message"

#: src/App.js:4
msgid "withOrigin"
msgstr "Message with origin"

#: src/App.js:4
#: src/Component.js:2
msgid "withMultipleOrigins"
msgstr "Message with multiple origin"

msgid "withDescription"
msgstr "Message with description"

# Translator comment
# This one might come from developer
msgid "withComments"
msgstr "Support translator comments separately"

#~ msgid "obsolete"
#~ msgstr "Obsolete message"

#, fuzzy,otherFlag
msgid "withFlags"
msgstr "Keeps any flags that are defined"

msgid "veryLongString"
msgstr "One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of him, waved about helplessly as he looked. \\"What's happened to me?\\" he thought. It wasn't a dream. His room, a proper human"

`;
32 changes: 32 additions & 0 deletions packages/cli/src/api/formats/fixtures/messages_plural.po
@@ -0,0 +1,32 @@
msgid ""
msgstr ""
"POT-Creation-Date: 2018-08-27 10:00+0000\n"
"Mime-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: @lingui/cli\n"
"Language: en\n"

msgctxt "pluralize_on=someCount"
msgid "message_with_id"
msgid_plural "message_with_id_plural"
msgstr[0] "Singular case"
msgstr[1] "Case number {someCount}"

msgctxt "icu=%7BanotherCount%2C+plural%2C+one+%7BSingular+case%7D+other+%7BCase+number+%7BanotherCount%7D%7D%7D&pluralize_on=anotherCount"
msgid "Singular case"
msgid_plural "Case number {anotherCount}"
msgstr[0] "Singular case"
msgstr[1] "Case number {anotherCount}"

msgctxt "pluralize_on=count"
msgid "message_with_id_but_without_translation"
msgid_plural "message_with_id_but_without_translation_plural"
msgstr[0] ""
msgstr[1] ""

msgctxt "icu=%7Bcount%2C+plural%2C+one+%7BSingular%7D+other+%7BPlural%7D%7D&pluralize_on=count"
Contributor:

Is it necessary for this feature to use msgctxt? There's another PR, #856, that implements gettext's original msgctxt behavior.

Contributor Author:

Storing the original ICU message in msgctxt is clearly a workaround and not how the field is meant to be used. In earlier versions, only the pluralize_on value was stored there, which I need to reconstruct the original ICU message.
The full ICU message is stored so that msgid can be restored for messages where the developer does not use custom IDs: the ICU cases in the development language are used for msgid and msgid_plural, so items look like this:

msgctxt "icu=%7Bcount%2C+plural%2C+one+%7BSingular%7D+other+%7BPlural%7D%7D&pluralize_on=count"
msgid "Singular"
msgid_plural "Plural"
msgstr[0] ""
msgstr[1] ""

I could also store the query-string-encoded data in a new type of comment, say '#?foo=bar', and, when converting from PO to ICU, iterate over the comments until I find one that matches the format. What do you think about that?
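The export behavior described in this thread and in the po-gettext docs can be sketched in TypeScript as below. This is not the PR's code: `parseSimplePlural`, `toGettextPlural` and the `GettextPluralItem` shape are hypothetical, and the sketch assumes msgctxt is form-encoded with `URLSearchParams`, which reproduces the `icu=%7B...&pluralize_on=...` strings shown in the docs and fixtures. It only handles a single top-level `one`/`other` plural and returns `null` for anything else (select, several plurals).

```typescript
import { URLSearchParams } from "url"

// Hypothetical shape of a gettext plural item (not pofile's PO.Item API).
interface GettextPluralItem {
  msgctxt: string
  msgid: string
  msgid_plural: string
  msgstr: [string, string]
}

interface SimplePlural {
  variable: string
  cases: Record<string, string>
}

// Parses a single top-level "{var, plural, key {text} ...}" message.
// Case bodies may contain placeholders such as "{count}"; anything else
// (select, several plurals, "=0" keys, trailing text) yields null.
function parseSimplePlural(icu: string): SimplePlural | null {
  const head = /^\{\s*(\w+)\s*,\s*plural\s*,/.exec(icu)
  if (!head || !icu.endsWith("}")) return null
  const body = icu.slice(head[0].length, -1)
  const cases: Record<string, string> = {}
  let i = 0
  while (i < body.length) {
    const key = /^\s*(\w+)\s*\{/.exec(body.slice(i))
    if (!key) break
    // Scan forward to the brace that closes this case body.
    const start = i + key[0].length
    let depth = 1
    let j = start
    while (j < body.length && depth > 0) {
      if (body[j] === "{") depth++
      else if (body[j] === "}") depth--
      j++
    }
    if (depth !== 0) return null
    cases[key[1]] = body.slice(start, j - 1)
    i = j
  }
  // Leftover text means the message was not one simple plural.
  return body.slice(i).trim() === "" ? { variable: head[1], cases } : null
}

// With a custom ID, msgid_plural is the ID plus "_plural"; without one,
// msgid/msgid_plural come from the source cases and the ICU source is
// kept in msgctxt so the entry can be matched again on import.
function toGettextPlural(
  source: string,
  translation: string,
  customId?: string
): GettextPluralItem | null {
  const src = parseSimplePlural(source)
  if (!src || !("one" in src.cases) || !("other" in src.cases)) return null
  const target = parseSimplePlural(translation)
  const ctx = new URLSearchParams()
  if (customId === undefined) ctx.set("icu", source)
  ctx.set("pluralize_on", src.variable)
  return {
    msgctxt: ctx.toString(),
    msgid: customId ?? src.cases.one,
    msgid_plural: customId !== undefined ? `${customId}_plural` : src.cases.other,
    msgstr: [target?.cases.one ?? "", target?.cases.other ?? ""],
  }
}
```

Deriving `msgid`/`msgid_plural` from the `one`/`other` source cases is also why source languages with more than two plural forms are a problem when no custom IDs are used.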

@tricoder42 (Contributor), Dec 1, 2020:

Yeah, I think we need to use comments to store any required metadata. You could even have one comment per line:

#. icu: { count, plural, one {Singular} other {Plural}}
#. pluralize_on: count
msgid "Singular"
msgid_plural "Plural"
msgstr[0] ""
msgstr[1] ""

but whatever works for you the best 👍

There are several types of comments available in gettext:

white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string
msgid untranslated-string
msgstr translated-string

Not sure which are supported by the PO library we use, but I guess this would be a good fit for extracted-comments.

I'm open to any suggestions :)
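A minimal sketch of the extracted-comments alternative, with one `key: value` comment per line as suggested above. It assumes the PO library hands back extracted comments as plain strings without the `#. ` prefix (pofile exposes such an `extractedComments` array on its items); `PluralMeta`, `metaToComments`, `commentsToMeta`, `toIcuTranslation` and `catalogKey` are hypothetical names, and rebuilding the ICU message from `msgstr[0]`/`msgstr[1]` only covers `one`/`other` languages.

```typescript
// Hypothetical metadata carried on a gettext plural item.
interface PluralMeta {
  icu?: string // original ICU source, only when there is no custom ID
  pluralizeOn: string
}

// Builds extracted comments, one "key: value" per line (no "#. " prefix).
function metaToComments(meta: PluralMeta): string[] {
  const lines: string[] = []
  if (meta.icu !== undefined) lines.push(`icu: ${meta.icu}`)
  lines.push(`pluralize_on: ${meta.pluralizeOn}`)
  return lines
}

// Scans extracted comments for the metadata keys; other comments are ignored.
function commentsToMeta(comments: string[]): PluralMeta | null {
  let icu: string | undefined
  let pluralizeOn: string | undefined
  for (const line of comments) {
    const m = /^(icu|pluralize_on):\s?(.*)$/.exec(line)
    if (!m) continue
    if (m[1] === "icu") icu = m[2]
    else pluralizeOn = m[2]
  }
  return pluralizeOn ? { icu, pluralizeOn } : null
}

// Rebuilds the ICU translation; an untranslated item stays "".
function toIcuTranslation(meta: PluralMeta, msgstr: string[]): string {
  const [one = "", other = ""] = msgstr
  if (!one && !other) return ""
  return `{${meta.pluralizeOn}, plural, one {${one}} other {${other}}}`
}

// Catalog key: the stored ICU source when there was no custom ID, else msgid.
function catalogKey(meta: PluralMeta, msgid: string): string {
  return meta.icu ?? msgid
}
```

One comment per line breaks down if an ICU message itself contains a newline, and a developer comment that happens to start with `icu:` would be misread; the query-string form avoids the first issue at some cost in readability.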

msgid "Singular"
msgid_plural "Plural"
msgstr[0] ""
msgstr[1] ""
7 changes: 4 additions & 3 deletions packages/cli/src/api/formats/index.ts
@@ -1,17 +1,18 @@
import { CatalogFormatOptions, CatalogFormat } from "@lingui/conf"
import { CatalogFormat, CatalogFormatOptions } from "@lingui/conf"

import { CatalogType } from "../catalog"

import csv from "./csv"
import lingui from "./lingui"
import minimal from "./minimal"
import po from "./po"
import csv from "./csv"
import poGettext from "./po-gettext"

const formats: Record<CatalogFormat, CatalogFormatter> = {
lingui,
minimal,
po,
csv,
"po-gettext": poGettext,
}

type CatalogFormatOptionsInternal = {