Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metadata length computation logic #383

Merged
merged 3 commits into from
Jul 18, 2022
Merged

Fix metadata length computation logic #383

merged 3 commits into from
Jul 18, 2022

Conversation

AliSoftware
Copy link
Contributor

@AliSoftware AliSoftware commented Jul 12, 2022

Fixes #382

Why

The logic to compute the translation's lengths in metadata was incorrect.

How things work

When exporting Store metadata translations from GlotPress, the code from gp_downloadmetadata_action downloads the export as JSON format, which looks something like this (extract modified for simplicity, readability, and showing all cases)

{
  "app_store_desc\u0004Pocket Casts is the world's most powerful podcast platform, an app by listeners, for listeners.\nHere's what the press has to say:\n\"Pocket Casts Is the Podcast App Every iPhone User Needs\"": [
    "Pocket Casts est la plateforme de podcast la plus puissante au monde, une application aliment\u00e9e par les auditeurs, pour les auditeurs.\nVoici ce qu\u2019en dit la presse\u00a0:\n\u00ab\u00a0Pocket Casts est l\u2019application de podcast dont tous les utilisateurs d\u2019iPhone ont besoin\u00a0\u00bb"
  ], "app_store_keywords\u0004podcasts,stream,audio,video": [
    "podcasts,diffuser,audio,vid\u00e9o"
  ],
  "app_store_subtitle\u0004Powerful podcast player": []
}

Notice how:

  • The JSON keys contain both the "GlotPress context" (aka the string "key") and the original copy, separated by \u0004.
  • The JSON values are arrays, with zero (missing translation) or one (translated) value.
  • The special characters are escaped in the raw JSON as expected per the JSON spec, so once we'll JSON.parse them with Ruby they will be unescaped by the parser.
  • Newlines are \n, not \r\n.

Note I think the JSON export format is intended to support the case for multiple translations in the value array, for the case where a key might have e.g. both a current and a waiting translation; so maybe there are cases where there can be more than one value. That being said, we always only export the current values, so in our case we should never have more than one value anyway.

The bug

The core of the bug was was that to evaluate the translation's length, we used msg.to_s.length - 4 on line 69.

But on that line, msg can either be the original copyextracted from the part after the \u0004 from the JSON key — or the actual Hash value — which is then an array with zero or one translation.

That means that

  • When we had a case of missing translation, aka msg = [], then message_len would equal -2 (as [].to_s is the string "[]" of length 2, minus 4 equals -2)
  • In the case we used the source locale (iteration where is_source == true), message_len would be 4 less than the actual length
  • When we had a translation, aka msg = ["some\ntext"], then the fact that message_len used .to_s on the Array meant that the string representation of that array not only contained the [" and "] characters at start and end respectively (hence the -4 we added, I'd guess), but also every character escape like \n would be re-escaped like \\n (because .to_s sees the Array as a whole, not individual strings). This was the root of the issue.

Solution

When extracting the Hash's value to determine the value of msg, we're now using d[1].first || '' instead of just d[1].

That means that msg will now always be a String (both when it's mapped to source and when it's extracted from the Hash's value to get the translation), as opposed to before where it was either a String (source) or an Array (Hash's value d[1])

As a consequence, we can drop the to_s that introduced the bug, as well as the -4 that was supposed to (incorrectly) offset the value to account for the Array delimiters when that array was to_s'd.

To Test

I have NOT tested this solution with real use case.

The only thing I did to test it was to download a couple of .json exports from GlotPress as examples, then launch pry, used JSON.parse to parse the json files, and play around with the parsed data to test my theories.

As such, it would be nice to test this change with a real client app — for example point PocketCast to this branch of the toolkit, and run the gp_downloadmetadata action again, and check that this time it doesn't consider that the translations for app_store_desc are too long anymore. \cc @mokagio

Copy link
Contributor

@mokagio mokagio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested it on Pocket Casts iOS via this PR

gp_downloadmetadata output using version 4.2.0 (what resulted in this issue being noticed in the first place)

[14:20:55]: Project URL: https://translate.wordpress.com/projects/pocket-casts/ios/release-notes/
[14:20:55]: Locales: {"de"=>"de-DE", "es"=>"es-ES", "fr"=>"fr-FR", "it"=>"it", "ja"=>"ja", "nl"=>"nl-NL", "pt-br"=>"pt-BR", "ru"=>"ru", "sv"=>"sv", "zh-cn"=>"zh-Hans", "zh-tw"=>"zh-Hant"}
[14:20:55]: Source locale: -
[14:20:55]: Path: /Users/gio/Developer/a8c/pcios/fastlane/metadata
Downloading language: de-DE
[14:20:56]: Rejecting de-DE traslation for app_store_subtitle: translation length: 31 - max allowed length: 30
Downloading language: es-ES
[14:20:56]: Rejecting es-ES traslation for app_store_desc: translation length: 4031 - max allowed length: 4000
Downloading language: fr-FR
[14:20:57]: Rejecting fr-FR traslation for app_store_desc: translation length: 4031 - max allowed length: 4000
Downloading language: it
Downloading language: ja
Downloading language: nl-NL
Downloading language: pt-BR
Downloading language: ru
Downloading language: sv
Downloading language: zh-Hans
Downloading language: zh-Hant

Output from this branch:

[14:20:08]: ---------------------------------
[14:20:08]: --- Step: gp_downloadmetadata ---
[14:20:08]: ---------------------------------
[14:20:08]: Project URL: https://translate.wordpress.com/projects/pocket-casts/ios/release-notes/
[14:20:08]: Locales: {"de"=>"de-DE", "es"=>"es-ES", "fr"=>"fr-FR", "it"=>"it", "ja"=>"ja", "nl"=>"nl-NL", "pt-br"=>"pt-BR", "ru"=>"ru", "sv"=>"sv", "zh-cn"=>"zh-Hans", "zh-tw"=>"zh-Hant"}
[14:20:08]: Source locale: -
[14:20:08]: Path: /Users/gio/Developer/a8c/pcios/fastlane/metadata
Downloading language: de-DE
[14:20:09]: Rejecting de-DE translation for app_store_subtitle: translation length: 31 - max allowed length: 30
Downloading language: es-ES
Downloading language: fr-FR
Downloading language: it
Downloading language: ja
Downloading language: nl-NL
Downloading language: pt-BR
Downloading language: ru
Downloading language: sv
Downloading language: zh-Hans
Downloading language: zh-Hant

The de-DE 31 characters violation is a genuine one:

image

I noticed there are not tests for this fix in the diff, then noticed there are no tests for metadata_download_helper in the first place.

@@ -58,15 +58,15 @@ def reparse_alternates(target_locale, loc_data, is_source)
if file[0].to_s == key
puts "Alternate: #{key}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but I wonder whether we should update this puts and the one above at line 57 to UI.message?

@AliSoftware
Copy link
Contributor Author

I noticed there are not tests for this fix in the diff, then noticed there are no tests for metadata_download_helper in the first place.

Yeah I thought about adding unit tests but wanted this workaround to be shipped ASAP (wanted to timebox my effort on this bug)… but also mostly because we intend to re-implement this whole action completely some time soon (remember the long RFC P2s we wrote about generating then parsing back metadata strings from po files when working on the L10n tooling project? 😛) so I figured we'd TDD and make that more testable when we get to that instead.

@mokagio
Copy link
Contributor

mokagio commented Jul 18, 2022

I'm going to merge this so it doesn't get stale while Olivier's AFK. Too bad I missed the 5.0.0 cut.

@mokagio mokagio enabled auto-merge July 18, 2022 06:03
@mokagio mokagio merged commit c72d0ed into trunk Jul 18, 2022
@mokagio mokagio deleted the fix/metadata-length branch July 18, 2022 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure logic that lints localizations' string length considers both LF and CRLF as one character
2 participants