HTML tag abbr seems to confuse the parser #97
Comments
Thanks @HidenoriKobayashi! I think there might be two problems here. First, we have this test, which demonstrates that we chop up the text when we encounter an HTML tag (including an HTML comment): mdbook-i18n-helpers/i18n-helpers/src/lib.rs Lines 745 to 754 in 941a188
There is also this test, which demonstrates that we lose text immediately following a (block-level?) HTML tag: mdbook-i18n-helpers/i18n-helpers/src/lib.rs Lines 936 to 950 in 941a188
I think it's an open question if we could or should extract the HTML into the POT file. One concern would be if we should just extract This is only about what gets extracted and put into the POT file. The other half of this is how we weave things together when we translate the Markdown later. I just tried adding a little unit test for

```rust
#[test]
fn test_translate_html() {
    let catalog = create_catalog(&[("foo ", "FOO "), ("bar", "BAR"), (" baz", " BAZ")]);
    assert_eq!(
        translate("foo <b>bar</b> baz", &catalog),
        "FOO <b>BAR</b> BAZ"
    );
}
```

it fails with
which shows that we lose a |
Further investigation shows that |
This is actually fair: Markdown does not distinguish between a line of text starting with I'm not entirely sure where we should account for this, but perhaps |
So I think these two tests should work:

```diff
diff --git a/i18n-helpers/src/bin/mdbook-gettext.rs b/i18n-helpers/src/bin/mdbook-gettext.rs
index b686be4..291bb2c 100644
--- a/i18n-helpers/src/bin/mdbook-gettext.rs
+++ b/i18n-helpers/src/bin/mdbook-gettext.rs
@@ -233,6 +233,15 @@ mod tests {
         );
     }

+    #[test]
+    fn test_translate_html() {
+        let catalog = create_catalog(&[("foo ", "FOO "), ("bar", "BAR"), (" baz", " BAZ")]);
+        assert_eq!(
+            translate("foo <b>bar</b> baz", &catalog),
+            "FOO <b>BAR</b> BAZ"
+        );
+    }
+
     #[test]
     fn test_translate_table() {
         let catalog = create_catalog(&[
diff --git a/i18n-helpers/src/lib.rs b/i18n-helpers/src/lib.rs
index 3bf2a5f..484d348 100644
--- a/i18n-helpers/src/lib.rs
+++ b/i18n-helpers/src/lib.rs
@@ -590,6 +590,30 @@ mod tests {
         assert_eq!(extract_events("", None), vec![]);
     }

+    #[test]
+    fn extract_events_leading_whitespace() {
+        assert_eq!(
+            extract_events(" foo", None),
+            vec![
+                (1, Start(Paragraph)),
+                (1, Text(" foo".into())),
+                (1, End(Paragraph)),
+            ]
+        );
+    }
+
+    #[test]
+    fn extract_events_trailing_whitespace() {
+        assert_eq!(
+            extract_events("foo ", None),
+            vec![
+                (1, Start(Paragraph)),
+                (1, Text("foo ".into())),
+                (1, End(Paragraph)),
+            ]
+        );
+    }
+
     #[test]
     fn extract_events_paragraph() {
         assert_eq!(
```
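
To make the expectation behind `test_translate_html` concrete, here is a minimal, std-only sketch. It is not how mdbook-i18n-helpers is implemented (the real code works on Markdown parser events); `Segment`, `split_on_tags`, and `translate_line` are hypothetical names, and the naive split on `<...>` is only a stand-in for real parsing. It shows why each text fragment around an inline tag needs a catalog entry that includes its leading or trailing space.

```rust
use std::collections::HashMap;

/// A piece of an inline line: translatable text or a raw HTML tag.
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    Text(&'a str),
    Html(&'a str),
}

/// Hypothetical helper: split `line` at every `<...>` tag, keeping whitespace intact.
fn split_on_tags(line: &str) -> Vec<Segment<'_>> {
    let mut segments = Vec::new();
    let mut rest = line;
    while let Some(start) = rest.find('<') {
        // Treat an unterminated `<` as plain text.
        let Some(len) = rest[start..].find('>') else { break };
        let end = start + len + 1;
        if start > 0 {
            segments.push(Segment::Text(&rest[..start]));
        }
        segments.push(Segment::Html(&rest[start..end]));
        rest = &rest[end..];
    }
    if !rest.is_empty() {
        segments.push(Segment::Text(rest));
    }
    segments
}

/// Translate each text segment via an exact-match lookup; tags pass through unchanged.
fn translate_line(line: &str, catalog: &HashMap<&str, &str>) -> String {
    split_on_tags(line)
        .into_iter()
        .map(|segment| match segment {
            Segment::Text(text) => catalog.get(text).copied().unwrap_or(text),
            Segment::Html(tag) => tag,
        })
        .collect()
}

fn main() {
    // Same catalog as in the test: keys keep their surrounding spaces.
    let catalog = HashMap::from([("foo ", "FOO "), ("bar", "BAR"), (" baz", " BAZ")]);
    assert_eq!(translate_line("foo <b>bar</b> baz", &catalog), "FOO <b>BAR</b> BAZ");

    // With trimmed keys, the lookups for "foo " and " baz" miss and stay untranslated.
    let trimmed = HashMap::from([("foo", "FOO"), ("bar", "BAR"), ("baz", "BAZ")]);
    assert_eq!(translate_line("foo <b>bar</b> baz", &trimmed), "foo <b>BAR</b> baz");
}
```

In this sketch the lookup is an exact string match, so any step that drops or normalizes whitespace at a fragment boundary makes the lookup miss. The two `extract_events_*_whitespace` tests above check for that kind of loss.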
The mixing of HTML and Markdown breaks our translation pipeline: we see the HTML and fail to parse things correctly. This might be google/mdbook-i18n-helpers#97, but I'm not 100% sure. The fix is to put the HTML on its own line: then the Markdown is parsed again inside. Fixes #1527.
@kdarkhan, the problem here is about how we lose a |
I believe this should be resolved now with #195 merged. |
I think so too, let's close it! |
I was redirected here from google/comprehensive-rust#1284.
The source:
https://github.com/google/comprehensive-rust/blob/a38a33c8fba58678d0a9127d9644242bce41ed94/src/bare-metal/microcontrollers/probe-rs.md
The po file (a bit outdated, but updating it does not resolve the issue):
https://github.com/google/comprehensive-rust/blob/a38a33c8fba58678d0a9127d9644242bce41ed94/po/ja.po#L13420
Now, if I make this change (just for testing purposes)
I get a result like this:
The strange thing about this is that the string `cargo-embed` got merged into the last item in the list.