Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sax parser encoding overrides #3282

Closed
wants to merge 5 commits into from
Closed

Conversation

flavorjones
Copy link
Member

@flavorjones flavorjones commented Jul 5, 2024

What problem is this PR intended to solve?

  • XML::SAX::Parser#parse_memory now accepts an optional encoding argument. When not provided, the parser will fall back to the encoding passed to the initializer, and then fall back to autodetection.
  • XML::SAX::ParserContext.memory now accepts an optional encoding_id argument. When not provided, the encoding will be autodetected.
  • XML::SAX::ParserContext.io's encoding_id argument is now optional, and when not provided will default to autodetecting the encoding.

Closes #918

Have you included adequate test coverage?

Yes, added through test coverage for both XML and HTML4 sax parsers.

Does this change affect the behavior of either the C or the Java implementations?

Updated both implementations.

- xmlGetActualEncoding is an internal-only libxml2 function which we
  can't use (yet?)
- the ctxt->input->encoding bit is no longer needed as of libxml 2.12
Previously, this functionality worked fine for `#parse_io` but didn't
work for `#parse_memory`.

This change introduces a new optional encoding parameter to
`SAX::Parser#parse_memory` and `SAX::ParserContext.memory`, and makes
sure to use that encoding or the one passed to the Parser's initializer.

This change also makes optional the encoding_id parameter to
`SAX::ParserContext.io`, which was previously required.

Closes #918
@flavorjones flavorjones changed the title 918 sax parser encoding fix: sax parser encoding overrides Jul 5, 2024
@flavorjones flavorjones force-pushed the 918-sax-parser-encoding branch 2 times, most recently from f561698 to 11f29a7 Compare July 5, 2024 18:36
@flavorjones
Copy link
Member Author

I don't like how this ended up, will open a different PR that unifies the two SAX::Parser classes and the two SAX::ParserContext classes, and fixes the encoding issues in a more fundamental way.

@flavorjones flavorjones closed this Jul 6, 2024
@flavorjones
Copy link
Member Author

See #3288

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant