Implement fine-grained extraction of translatable text #25
Very nice! How will you handle migrating existing translations?
    let new_state = cmark_resume_with_options(
        events.clone(),
        String::new(),
        state.clone(),
        options.clone(),
    )
    .unwrap();

    // Block quotes and lists add padding to the state. This is
    // reflected in the rendered Markdown. We want to capture the
    // Markdown without the padding to remove the effect of these
    // structural elements.
    let state_without_padding = state.map(|state| State {
        padding: Vec::new(),
        ..state
    });
    cmark_resume_with_options(events, &mut markdown, state_without_padding, options).unwrap();
It's not clear to me why this calls `cmark_resume_with_options` twice.
Is the idea to return an accurate state (`new_state`) but return markdown rendered without the padding?
> Is the idea to return an accurate state (`new_state`) but return markdown rendered without the padding?

Yes, precisely! The padding is the `"> "` and list indents and I'm trying to avoid putting that into the `.po` file.
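
To illustrate the padding point, here is a minimal, self-contained sketch. It does not use the real `pulldown-cmark-to-cmark` types: the `State` struct, its `newlines_before_start` field, and the `render` helpers below are toy stand-ins invented for illustration. The only part taken from the snippet above is the idea of clearing `padding` with struct update syntax before rendering.

```rust
/// Toy rendering state (hypothetical; only loosely modeled on the `State`
/// with a `padding` field seen in the review snippet).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct State {
    /// Prefixes contributed by enclosing block quotes and list items.
    pub padding: Vec<String>,
    /// Stand-in for whatever else the state carries (an assumption).
    pub newlines_before_start: usize,
}

/// Render `text`, prefixing every line with the accumulated padding.
pub fn render(text: &str, state: &State) -> String {
    let prefix: String = state.padding.concat();
    text.lines()
        .map(|line| format!("{}{}", prefix, line))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Render `text` with the padding cleared but every other field kept,
/// mirroring the `State { padding: Vec::new(), ..state }` idea.
pub fn render_without_padding(text: &str, state: State) -> String {
    let state_without_padding = State {
        padding: Vec::new(),
        ..state
    };
    render(text, &state_without_padding)
}
```

With a `padding` of `["> ", "  "]`, `render` prefixes each line with the quote marker and indent, while `render_without_padding` returns the text as written, which is the form one would want in the `.po` file.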
Thanks!
My next step is to write a little normalization tool: it should be enough to go through a
a82ef13 to 107484c
Before, we would extract text based on the byte offsets in the original document. As a consequence of this, the extracted text would look precisely like the original: the Markdown was copied directly from the original. In particular, text from a block quote would contain the leading ‘>’ characters and paragraphs in list items would contain leading whitespace.
Now, we instead extract text by grouping the Markdown parse events into those which should be translated and those which should be skipped. We use this in two ways:
- When extracting messages in ‘mdbook-xgettext’, we turn the translatable events back into Markdown. The structure of the document (headings, lists, block quotes, …) is no longer present in the extracted messages: only the text content itself is extracted.
- When translating, we replace the sequence of translatable events with the events from the translation. We do this while leaving the structure of the document unchanged.
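
The grouping step can be sketched roughly as follows. This is a toy model, not the code in this PR: the `Event` enum, the `Group` struct, and the rule in `is_translatable` are assumptions made for illustration (the real implementation works on `pulldown-cmark` events and uses its own classification).

```rust
/// Toy stand-in for a Markdown parse event (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// Opening of a structural element, e.g. "blockquote" or "paragraph".
    Start(&'static str),
    /// Closing of a structural element.
    End(&'static str),
    /// Plain text content.
    Text(String),
    /// Inline code content.
    Code(String),
}

/// A run of consecutive events that are either all translated or all skipped.
#[derive(Debug, Clone, PartialEq)]
pub struct Group {
    pub translate: bool,
    pub events: Vec<Event>,
}

/// Assumed rule for this sketch: only text-like events are translatable.
fn is_translatable(event: &Event) -> bool {
    matches!(event, Event::Text(_) | Event::Code(_))
}

/// Split the event stream into alternating translate/skip groups.
pub fn group_events(events: &[Event]) -> Vec<Group> {
    let mut groups: Vec<Group> = Vec::new();
    for event in events {
        let translate = is_translatable(event);
        // Extend the last group when it has the same kind as this event.
        let extend_last = groups.last().map_or(false, |g| g.translate == translate);
        if extend_last {
            if let Some(last) = groups.last_mut() {
                last.events.push(event.clone());
            }
        } else {
            groups.push(Group {
                translate,
                events: vec![event.clone()],
            });
        }
    }
    groups
}
```

In this model, extraction would render only the `translate` groups back to Markdown, and translation would swap the events of a `translate` group for the events parsed from the translated message while copying the skipped groups through unchanged.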
The result of this is a much more robust system: editing one list item no longer impacts adjacent list items, and moving a paragraph into a block quote no longer changes the paragraph.
As a side effect of how we turn events into messages, links are now all expanded. This makes the messages larger, but it removes a common source of errors where ‘[foo][1]’ would end up pointing to the wrong location if the reference link was updated.
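
To show what "expanded" means here, the sketch below rewrites reference-style links such as `[foo][1]` into inline links. It is a simplified string-level illustration only: in the PR the expansion falls out of how events are turned into messages, whereas `expand_reference_links`, `parse_reference`, and the `definitions` map are hypothetical helpers written for this example.

```rust
use std::collections::HashMap;

/// Parse `[text][label]` at the start of `s`.
/// Returns the link text, the label, and the number of bytes consumed.
fn parse_reference(s: &str) -> Option<(&str, &str, usize)> {
    let after_open = s.strip_prefix('[')?;
    let (link_text, after_text) = after_open.split_once("][")?;
    let (label, _) = after_text.split_once(']')?;
    // Reject nested brackets; this sketch only handles the simple form.
    if link_text.contains('[') || link_text.contains(']') || label.contains('[') {
        return None;
    }
    // '[' + text + "][" + label + ']'
    let len = 1 + link_text.len() + 2 + label.len() + 1;
    Some((link_text, label, len))
}

/// Replace `[text][label]` with `[text](url)` for every known label.
/// Links with unknown labels are left untouched.
pub fn expand_reference_links(text: &str, definitions: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        let resolved = parse_reference(&rest[open..]).and_then(|(link_text, label, len)| {
            definitions.get(label).map(|url| (link_text, *url, len))
        });
        match resolved {
            Some((link_text, url, len)) => {
                out.push_str(&rest[..open]);
                out.push_str(&format!("[{}]({})", link_text, url));
                rest = &rest[open + len..];
            }
            None => {
                // Not a resolvable reference link: copy through the '[' and go on.
                out.push_str(&rest[..=open]);
                rest = &rest[open + 1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because the message then carries the URL itself, a later renumbering of the reference definitions in the source cannot silently redirect a translated link.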
Part of #19.