Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Hash based id #1440

Merged
merged 14 commits into from
Feb 22, 2023
Merged

Conversation

timofei-iatsenko
Copy link
Collaborator

@timofei-iatsenko timofei-iatsenko commented Feb 15, 2023

Implementation for: #1360

For easier reviewing, i split this feature in the different commits. I recommend checking them one by one.

implementation details

  1. Js macro now always generate call with message descriptor, no matter what input was
     i18n._({id: "mjnlP0", message: "Ola!"})
  2. Id is generated as base64(sha256(value)).length(6). Unlike the formatjs where this placeholder is configurable, i explicitly didn't want to give flexibility here. Because it comes down to complexity of porting this to Rust.
  3. i found very exhausting to updating tests with new ids, since we don't use jest snapsohots for them. So i created a small babel plugin used in tests which is stripping ids from the test cases where checking id is not a purpose of test.
  4. i created a separate package message-id which helps with creating ids from messages. I need it in few places that's why it's extracted.
  5. context has effect only on message id generation and in PO formatter (all during compile time). All usages in compile and runtime was removed. Context property also stripped in production mode.
  6. po formatter works a bit tricky. We don't want to have our hash based id in the msgid field since it against the gettext ideology. But we also don't want to break compatibility with custom ids provided by users. For that purpose during serialization we're storing a flag generated-id in po file, which then used when deserializing to regenerate hash id.
    input:
     {
       id: <hash>,
       message: "Ola!",
       context: "My Contex",
     }
    output:
    #, generated-id
     msgctxt "my context"
     msgid "Ola!"
     msgstr ""
    We also can do vice-versa, store flag when explicit id provided, this is TBD. (and like this option more because in my mind there should more autogenerated id's than custom)

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Fixes # (issue)

Checklist

  • I have read the CONTRIBUTING and CODE_OF_CONDUCT docs
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

@vercel
Copy link

vercel bot commented Feb 15, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated
js-lingui ✅ Ready (Inspect) Visit Preview 💬 Add your feedback Feb 21, 2023 at 1:18PM (UTC)

@timofei-iatsenko timofei-iatsenko changed the base branch from main to next February 15, 2023 15:40
@github-actions
Copy link

github-actions bot commented Feb 15, 2023

size-limit report 📦

Path Size
./packages/core/build/esm/index.js 1.76 KB (0%)
./packages/detect-locale/build/esm/index.js 812 B (0%)
./packages/react/build/esm/index.js 1.79 KB (0%)
./packages/remote-loader/build/esm/index.js 7.29 KB (0%)

@codecov
Copy link

codecov bot commented Feb 15, 2023

Codecov Report

Base: 72.27% // Head: 73.89% // Increases project coverage by +1.62% 🎉

Coverage data is based on head (e3882c8) compared to base (6b6adc9).
Patch coverage: 89.22% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #1440      +/-   ##
==========================================
+ Coverage   72.27%   73.89%   +1.62%     
==========================================
  Files          74       75       +1     
  Lines        1897     1923      +26     
  Branches      510      513       +3     
==========================================
+ Hits         1371     1421      +50     
+ Misses        422      383      -39     
- Partials      104      119      +15     
Impacted Files Coverage Δ
packages/macro/src/utils.ts 100.00% <ø> (ø)
packages/macro/test/js-arg.ts 100.00% <ø> (ø)
packages/macro/test/js-defineMessage.ts 100.00% <ø> (ø)
packages/macro/test/js-plural.ts 100.00% <ø> (ø)
packages/macro/test/js-select.ts 100.00% <ø> (ø)
packages/macro/test/js-selectOrdinal.ts 100.00% <ø> (ø)
packages/macro/test/js-t.ts 100.00% <ø> (ø)
packages/macro/test/jsx-plural.ts 100.00% <ø> (ø)
packages/macro/test/jsx-select.ts 100.00% <ø> (ø)
packages/macro/test/jsx-selectOrdinal.ts 100.00% <ø> (ø)
... and 26 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@JSteunou
Copy link
Contributor

Thank you for leading this hard work @thekip this is a top notch long time awaiting feature :)

About the hash, would sha256 be a bit more time consuming than sha1 ?

@timofei-iatsenko
Copy link
Collaborator Author

About the hash, would sha256 be a bit more time consuming than sha1 ?

I have been thinking about what hash function to use for a while. The criteria were - short / with low collision rate / non cryptographical / implemented in Rust

I considered

Related SO topics:
https://stackoverflow.com/questions/4567089/hash-function-that-produces-short-hashes

But i couldn't choose and just took what formatjs is using as default, assuming they choose it by the reason.

I think for regular catalogs with few thousands of messages, the difference in speed between sha256 or sha1 would be negligible

@JSteunou
Copy link
Contributor

I think for regular catalogs with few thousands of messages, the difference in speed between sha256 or sha1 would be negligible

That what it seems indeed https://github.com/hex7c0/nodejs-hash-performance 👍 ~same speed, less collision

@timofei-iatsenko
Copy link
Collaborator Author

@Martin005 @andrii-bodnar guys i want a feedback on these changes before I start porting it to the Rust. Updating test suite is extremely exhausting, so i don't want to do it more than one time.

@andrii-bodnar
Copy link
Contributor

@thekip currently I have that one suggestion about the new package

also, I'm worrying about this PO flag - it looks like the PO Gettext specification has already a predefined list of allowed flags, isn't it? Or it allows any other flags also?

@timofei-iatsenko
Copy link
Collaborator Author

@andrii-bodnar new package is not related to Rust macro, so i can start porting.

Or it allows any other flags also?

Yes, there is a predefined list of flags which native gettext tools understand. But specification says more generously:

#, flag…

And also they say

The comment lines beginning with #, are special because they are not completely ignored by the programs as comments generally are. The comma separated list of flags is used by the msgfmt program to give the user some better diagnostic messages.

Means it's designed to affect message interpretation somehow, unlike to other comments which could be safely omitted.

@Martin005
Copy link
Contributor

@thekip We have agreed with @andrii-bodnar that it would be best to postpone the flattening in extraction (drafted in #1431) for now.

@timofei-iatsenko
Copy link
Collaborator Author

@Martin005 actually to fix issue with flattening we can make js-lingui-id required in po files. And don't calculate id when reading from (message + context) but only take this hash. Something like this is made for po-gettext extractor. Where comments used to store original icu components.

#. js-lingui-id: <hash> 
msgctxt "..."
msgid "..."
msgstr ""

website/docs/guides/plurals.md Show resolved Hide resolved
website/docs/guides/plurals.md Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Show resolved Hide resolved
website/docs/ref/macro.md Show resolved Hide resolved
website/docs/ref/macro.md Show resolved Hide resolved
website/docs/ref/macro.md Show resolved Hide resolved
@timofei-iatsenko
Copy link
Collaborator Author

Guys, everything related to docs, just do it yourself. I'm just not able to do it all by myself.

@andrii-bodnar
Copy link
Contributor

@thekip please just resolve the current suggestions and then we'll take care of its polishing

packages/cli/src/api/formats/po-gettext.test.ts Outdated Show resolved Hide resolved
packages/macro/test/jsx-trans.ts Outdated Show resolved Hide resolved
packages/macro/test/jsx-trans.ts Outdated Show resolved Hide resolved
Co-authored-by: Martin Chrástek <chrastek12@gmail.com>

Apply suggestions from code review

Co-authored-by: Andrii Bodnar <29282228+andrii-bodnar@users.noreply.github.com>

Update website/docs/ref/macro.md

Co-authored-by: Martin Chrástek <chrastek12@gmail.com>

Update website/docs/ref/macro.md

Co-authored-by: Martin Chrástek <chrastek12@gmail.com>

Update website/docs/ref/macro.md

Co-authored-by: Martin Chrástek <chrastek12@gmail.com>

Update website/docs/ref/macro.md

Co-authored-by: Martin Chrástek <chrastek12@gmail.com>

Update website/docs/ref/macro.md

Co-authored-by: Martin Chrástek <chrastek12@gmail.com>

Update website/docs/ref/macro.md

Co-authored-by: Martin Chrástek <chrastek12@gmail.com>

Update website/docs/ref/macro.md

Co-authored-by: Martin Chrástek <chrastek12@gmail.com>
Co-authored-by: Martin Chrástek <chrastek12@gmail.com>
packages/cli/src/api/formats/po-gettext.test.ts Outdated Show resolved Hide resolved
packages/cli/src/api/formats/po-gettext.test.ts Outdated Show resolved Hide resolved
packages/macro/test/jsx-trans.ts Outdated Show resolved Hide resolved
website/docs/tutorials/react-patterns.md Outdated Show resolved Hide resolved
website/docs/tutorials/react-patterns.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
website/docs/ref/macro.md Outdated Show resolved Hide resolved
@andrii-bodnar andrii-bodnar merged commit 93ab9f4 into lingui:next Feb 22, 2023
@timofei-iatsenko timofei-iatsenko deleted the feature/hash-based-id branch February 22, 2023 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC: Context + hash based message id Message context (msgctxt) not working
4 participants