I have the impression that the `encoding` value of `ParserOptions` is not evaluated correctly by the crate (note: to reproduce the bug, you have to use `Parser::default_html()` and not `Parser::default()`).
To confirm this, I've tested the "equivalent" code in plain C with libxml 2.11.3.
When I debug the following code:

```rust
// Process encoding.
let encoding_cstring: Option<CString> =
  parser_options.encoding.map(|v| CString::new(v).unwrap());
let encoding_ptr = match encoding_cstring {
  Some(v) => v.as_ptr(),
  None => DEFAULT_ENCODING,
};
// Process url.
let url_ptr = DEFAULT_URL;
```

If the parser encoding is initialized with `Some("utf-8")`, `encoding_ptr` is not valid just before `// Process url` (it points to a null char).
So the call to the binding `htmlReadMemory` is made with no encoding... The unsafe part of the code is at the limit of my Rust understanding, so I'm unable to see if there is something wrong here. I hope my issue is clear and, I should have started with this, thank you for your work on this crate!
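A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the snippet above, `match encoding_cstring` takes the `CString` by value, so `v` is dropped at the end of the match arm and `encoding_ptr` is left dangling. Rust's `CString` also overwrites its first byte with 0 when dropped, which would be consistent with the pointer "pointing to a null char". The example below borrows the `CString` instead so that it outlives the call. The names `encoding_seen_by_callee` and `parse_with_encoding` are hypothetical, and `ptr::null()` stands in for `DEFAULT_ENCODING`, which is an assumption about the crate.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical stand-in for a C function such as `htmlReadMemory`:
/// it reads the encoding argument back so we can see what the callee receives.
fn encoding_seen_by_callee(encoding_ptr: *const c_char) -> Option<String> {
    if encoding_ptr.is_null() {
        return None;
    }
    // SAFETY: the caller guarantees the pointer refers to a live,
    // NUL-terminated string for the duration of this call.
    let c_str = unsafe { CStr::from_ptr(encoding_ptr) };
    Some(c_str.to_string_lossy().into_owned())
}

/// Hypothetical wrapper mirroring the "Process encoding" step.
fn parse_with_encoding(encoding: Option<&str>) -> Option<String> {
    // The CString is owned by this binding, which lives until the function returns.
    let encoding_cstring: Option<CString> =
        encoding.map(|v| CString::new(v).unwrap());

    // Borrow with `as_ref()` instead of moving the CString into the match arm,
    // so the pointer stays valid while `encoding_cstring` is in scope.
    // `ptr::null()` is an assumed stand-in for DEFAULT_ENCODING.
    let encoding_ptr: *const c_char = encoding_cstring
        .as_ref()
        .map_or(ptr::null(), |v| v.as_ptr());

    encoding_seen_by_callee(encoding_ptr)
}

fn main() {
    println!("{:?}", parse_with_encoding(Some("utf-8")));
    println!("{:?}", parse_with_encoding(None));
}
```

Writing `match &encoding_cstring { Some(v) => v.as_ptr(), None => ... }` would have the same effect; the point is only that the owner of the buffer must stay alive until the FFI call has returned.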
Regards,
Jc
I hit this one as well. I think it is caused by libxml2 changing the default encoding used when NULL is passed from utf-8 to ISO-8859-1, which apparently is more correct, but it's breaking a lot of real-world use cases.
So maybe the encoding override in this crate never worked and nobody noticed since the default was utf-8 anyway?
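To illustrate why such a default change would be visible, here is a small sketch, assuming the broken output is the usual result of UTF-8 bytes being read as ISO-8859-1. It does not call libxml2 at all; `decode_latin1` is a hypothetical helper that only mimics that misinterpretation.

```rust
/// Decode bytes as ISO-8859-1: every byte maps to the code point of the same value.
/// (Hypothetical helper for illustration; not part of libxml2 or the crate.)
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // In UTF-8, 'é' is encoded as the two bytes 0xC3 0xA9.
    let utf8_bytes = "café".as_bytes();

    // Read as ISO-8859-1, those two bytes become two separate characters.
    println!("{}", decode_latin1(utf8_bytes)); // prints "cafÃ©"
}
```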
Hi,
I have a strange encoding issue that started with libxml 2.11.1+ (released a week ago, https://gitlab.gnome.org/GNOME/libxml2/-/tags), using the libxml Rust crate 0.3.2.
My sample:
<data>café</data>
normalize-space(//data)
Sample code:
With libxml 2.11.0, the value printed is `café`; with libxml 2.11.1, the value printed is `café`:
I have the impression that the `encoding` value of `ParserOptions` is not evaluated correctly by the crate (note: to reproduce the bug, you have to use `Parser::default_html()` and not `Parser::default()`).
To confirm this, I've tested the "equivalent" code in plain C with libxml 2.11.3:
My suspicion is in `rust-libxml/src/parser.rs`, line 292 in a10a5a6.
When I debug the following code:
If the parser encoding is initialized with `Some("utf-8")`, `encoding_ptr` is not valid just before `// Process url` (it points to a null char).
So the call to the binding `htmlReadMemory` is made with no encoding... The unsafe part of the code is at the limit of my Rust understanding, so I'm unable to see if there is something wrong here. I hope my issue is clear and, I should have started with this, thank you for your work on this crate!
Regards,
Jc