Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple comments get ignored for same ID #845

Closed
Bertg opened this issue Nov 10, 2020 · 13 comments · Fixed by #854
Closed

Multiple comments get ignored for same ID #845

Bertg opened this issue Nov 10, 2020 · 13 comments · Fixed by #854

Comments

@Bertg
Copy link
Contributor

Bertg commented Nov 10, 2020

Describe the bug

A given translation ID may occur multiple times in the source code.
It is reasonable that some of these invocations have comments and other not, some comments may even be different.
Lingui only uses the comment of the first instance it find. If that instance does not have a comment, no comment is added to the PO file.

To Reproduce

Given the following file

<Trans id="long_value" comment="First comment" />
<Trans id="long_value" comment="Second comment, does not work yet" />

when running lingui extract

We get

#. First comment
#: src/pages/Lingui.tsx:62
#: src/pages/Lingui.tsx:69
#: src/pages/Lingui.tsx:74
msgid "long_value"
msgstr ""

Expected behavior

The PO file should contain all comments:

#. First comment
#: src/pages/Lingui.tsx:62
# Second comment, does not work yet
#: src/pages/Lingui.tsx:69
#: src/pages/Lingui.tsx:74
msgid "long_value"
msgstr ""

Additional context

  • jsLingui version 3.1.0
@nakkamarra
Copy link
Contributor

nakkamarra commented Nov 10, 2020

@Bertg @tricoder42 I noticed this as well. However, if the different comments were to cause your translator to return separate translations for the same word given the comment (or context, as I'm thinking about it), how would we possibly encode it in the .po file? I believe the msgctxt field would be necessary to solving this though, which as per my understanding of gettext, acts as an additional keying mechanism.

i.e Imagine the content for "long_value" is something like the English word "Lead"

#: src/pages/Lingui.tsx:62
msgctxt "As in the element on the periodic table: Pb"
msgid "long_value"
msgstr "Lead"

#: src/pages/Lingui.tsx:69
#: src/pages/Lingui.tsx:74
msgctxt "The verb, meaning to control or guide"
msgid "long_value"
msgstr "Lead"

And then the same translation key can be referenced contextually by some function or component like i18n._(id, context) in order to pull out the correct one for a given context.

I think another part of the issue here would be conflating the idea of a comment with context. We use comments to provide our translators with context about the string and it's meaning, but gettext/po's more "programmatic" notion of context is slightly different, right?

@tricoder42
Copy link
Contributor

@Bertg Yeah, this will require a bit of refactoring in extraction script...

@nakkamarra Unfortunately, msgctx isn't supported at the moment. Your best bet is include the msgctx string inside the ID:

<Trans id="long_value / As in the element on the periodic table: Pb">Lead</Trans>
<Trans id="long_value / The verb, meaning to control or guide">Lead</Trans>

We should review Fluent before adding this feature to Lingui. I'm curious how they solve it. In the future I would like Lingui library to handle ICU Message Format as well as Fluent, so we need to find a common denominator for both.

@Bertg
Copy link
Contributor Author

Bertg commented Nov 11, 2020

@nakkamarra I think context & comments should remain separate concepts. It can be that a comment is basically similar "CTA on a button", or "Button CTA". This does not warrant a separate translation or context.

However, the comments may give a hint to a translator that something might be up in their specific language. Eg: "CTA to remove a user" and "CTA to delete item" on a string like this "Remove {item}". In English this is fine. However, in a language like french or other latin languages this might raise a concern as the value of {item} might change the translation entirely.
In this situation the translator can give feedback that the base information in the translation is not sufficient. This can then be rectified by adding context (if supported), adding gender information or splitting up the translation entirely.

In conclusion. I think we should have our cake and eat it too. We should have the multiple comments, and we should have msgctxt as well.

@Bertg
Copy link
Contributor Author

Bertg commented Nov 11, 2020

@tricoder42

I've been looking into this, and the initial capturing phase (capture comment per message) is a very easy change.

diff --git a/packages/babel-plugin-extract-messages/src/index.ts b/packages/babel-plugin-extract-messages/src/index.ts
index 038ec9c1..a24d2614 100644
--- a/packages/babel-plugin-extract-messages/src/index.ts
+++ b/packages/babel-plugin-extract-messages/src/index.ts
@@ -56,7 +56,7 @@ export default function ({ types: t }) {
     t.isIdentifier(node.object, { name: "i18n" }) &&
     t.isIdentifier(node.property, { name: "_" })
 
-  function collectMessage(path, file, props) {
+  function collectMessage(path, file, {comment, ...props}) {
     const messages = file.get(MESSAGES)
 
     const rootDir = file.get(CONFIG).rootDir
@@ -64,7 +64,7 @@ export default function ({ types: t }) {
       .relative(rootDir, file.opts.filename)
       .replace(/\\/g, "/")
     const line = path.node.loc ? path.node.loc.start.line : null
-    props.origin = [[filename, line]]
+    props.origin = [[filename, line, comment]]
 
     addMessage(path, messages, props)
   }

This is a super simple change, but does change the entire internal representation quite a bit.

Is it OK for you if I continue in this direction?

@tricoder42
Copy link
Contributor

It should be fine. Javascript doesn't have tuples, only arrays, so there's probably const [filename, line] = origin somewhere down the line and this change won't break it.

Feel free to go for it! 👍 You'll also need to change the cli package.

@nakkamarra
Copy link
Contributor

@Bertg Yeah, this will require a bit of refactoring in extraction script...

@nakkamarra Unfortunately, msgctx isn't supported at the moment. Your best bet is include the msgctx string inside the ID:

<Trans id="long_value / As in the element on the periodic table: Pb">Lead</Trans>
<Trans id="long_value / The verb, meaning to control or guide">Lead</Trans>

We should review Fluent before adding this feature to Lingui. I'm curious how they solve it. In the future I would like Lingui library to handle ICU Message Format as well as Fluent, so we need to find a common denominator for both.

@tricoder42 Yeah, after some brief reading and grepping, nothing immediately jumps out at me from there docs about how it is handled. I was under the impression that msgctxt is part of the gettext spec and thus doesn't have anything to do with the message format but is a function of the .po file format.

From the Fluent Syntax section of this article, I'm lead to believe that the way in which fluent handles this is just with comments, which I think is fundamentally different from gettext and Lingui users who are using their content as their message IDs.

Maybe worth a separate issue for discussion of this?

@tricoder42
Copy link
Contributor

Yeah, definitely. Personally I don't see a reason why you'd need to separate msgid and msgctx, when you can use them both as an id. I guess it might be useful when you use natural language keys.

@nakkamarra
Copy link
Contributor

@tricoder42 Well, I guess my issue with that is that it feels more like a hack than a solution, and it is unclear to me whether or not the context encoded in the IDs like that will be picked up a Translator / TMS the same way that a comment or a msgctxt would (meaning it might get glossed over based on the software/UI on their end).

@Bertg
Copy link
Contributor Author

Bertg commented Nov 11, 2020

I have to follow @nakkamarra here. It is a hack.

Say we follow the hack approach and write t({id: "Go go gadget machine / A CTA button"}). We can't at this time use the id as the default. Which I think is a major use-case in Lingui.

Writing t({id: "Go go gadget machine", ctx: "A CTA button"}) is not that much more complex, but you get the id == default behaviour back; and you get the context based translations as well.

If users consider msgctx to complex or to much of a hassle, they can still use the hack-y way of managing their translations, regardless of Lingui sporting the feature.

If there is interest in this, I could have a look at bringing this into Lingui as well.

@Bertg
Copy link
Contributor Author

Bertg commented Nov 11, 2020

FYI I @tricoder42 The comments fix is in.

Had to do it differently still, but got it to work eventually ;)

@nakkamarra
Copy link
Contributor

I have to follow @nakkamarra here. It is a hack.

Say we follow the hack approach and write t({id: "Go go gadget machine / A CTA button"}). We can't at this time use the id as the default. Which I think is a major use-case in Lingui.

Writing t({id: "Go go gadget machine", ctx: "A CTA button"}) is not that much more complex, but you get the id == default behaviour back; and you get the context based translations as well.

If users consider msgctx to complex or to much of a hassle, they can still use the hack-y way of managing their translations, regardless of Lingui sporting the feature.

If there is interest in this, I could have a look at bringing this into Lingui as well.

Yeah, that definitely highlights my concern with it, since that is my exact use-case. As for the support in Fluent, I'm beginning to think that it isn't supported, at least as per the use case I envision (the traditional gettext usage of msgctxt), since Fluent doesn't seem to support using a message itself as the ID, so the problem that the msgctxt is trying to solve won't exist.

So I envision it something like this, if in the future the library supports multiple message + file format combinations:

Config defines PO as the file format: comment extracted to PO comment, context extracted to msgctxt
Config defines Fluent as the file format: comment extracted to Fluent comment, context throws error

And as far as I know, the solution for .po should work for .json as well.

I'm definitely interested in helping to solve this as well @Bertg @tricoder42, so please let me know if either of you would like to coordinate for getting a solution for context thought out and added to the library.

@Bertg
Copy link
Contributor Author

Bertg commented Nov 12, 2020

@nakkamarra Well, first steps first would be to create a separate ticket for adding context support, instead of keeping the conversation here :)

Config defines PO as the file format: comment extracted to PO comment, context extracted to msgctxt
Config defines Fluent as the file format: comment extracted to Fluent comment, context throws error

This seems very reasonable as a suggestion.

@tricoder42 tricoder42 mentioned this issue Nov 12, 2020
5 tasks
@tricoder42
Copy link
Contributor

@nakkamarra I like your suggestion. I've just created #856 where I outlined what needs to be done. Feel free to ask me anything you might need if you want to work on this feature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants