-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for duplicate sources in messages #156
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #156 +/- ##
==========================================
+ Coverage 90.74% 90.80% +0.05%
==========================================
Files 11 11
Lines 2399 2436 +37
Branches 2399 2436 +37
==========================================
+ Hits 2177 2212 +35
Misses 159 159
- Partials 63 65 +2 ☔ View full report in Codecov by Sentry. |
i18n-helpers/src/xgettext.rs
Outdated
@@ -41,7 +41,13 @@ fn strip_link(text: &str) -> String { | |||
|
|||
fn add_message(catalog: &mut Catalog, msgid: &str, source: &str) { | |||
let sources = match catalog.find_message(None, msgid, None) { | |||
Some(msg) => wrap_sources(&format!("{}\n{}", msg.source(), source)), | |||
Some(msg) => { | |||
if msg.source().contains(source) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checking for existence should be done per line rather than regular string contains. An example case where the proposed logic would fail will be if source contains entry some-dir/README.md
and the new source is README.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'll update the logic and add a test case for that scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @kdarkhan! I think this could be handled nicely by
- Separating the sources with
\n
whenadd_message
is called. - Use
lines()
to split the source field back to lines and then deduplicate these lines. - Do a call to
wrap_sources
to nicely wrap things.
Both 2 and 3 could be done after adding all messages to the catalog — I think that would give us linear complexity here. Right now, we have some potential for O(n²) complexity here.
Putting a filename per line remove ambiguity about filenames that contain spaces. I'm not actually sure how such filenames ought to be stored in the sources field — feel free to experiment a bit with xgettext
if you are curious!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the outline @mgeisler! I pushed an update along those lines. What I have assumes that duplicate sources are consecutive. Are there cases where that doesn't hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I have assumes that duplicate sources are consecutive. Are there cases where that doesn't hold?
I think it should hold right now — we walk over the contents of the book in the order given by Book::iter
. It says that it's doing a depth-first traversal on the files of the book, so it ought to do the right thing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks good to me.
Here is the diff of the generated messages.pot
from comprehensive-rust
repo.
https://gist.github.com/kdarkhan/9e270ae1e5842abd10f826dcaa5940aa
Thanks for contribution @zachcmadsen! Feel free to pick up the followup task #171 if you want. |
With rounded line numbers, it's possible for messages to have duplicate sources. This checks for duplicate sources before updating messages. An alternative approach is deduplicating sources after all the messages are added/updated. I went with the "on-the-fly" approach since it's simpler.
The check scans all of the sources for a message. If performance is a concern, we could try other ways to check for duplicates.
Fixes #154