EventWriter: escape '>' in characters #20

Merged Mar 31, 2024

@mautier mautier commented Mar 30, 2024

Fixes an issue where the emitter would produce XML which cannot be parsed back (the reader panics):

  • Writer emits the characters `]]>` inside text content.
  • Reader interprets this as the end of a CDATA section rather than as literal text, and panics.

This issue can occur in particular when processing dumps of MediaWiki markup, which uses `[[...]]` to denote links, for example: `Let [[x]]>0`.

A simple fix is to escape all `>` characters in text as `&gt;` when emitting XML. This avoids accidentally forming `]]>` substrings in the XML.
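
As a rough illustration of the idea (not the actual code in this PR), escaping could look like the sketch below. `escape_text` is a hypothetical standalone helper and is not part of the crate's API; it only shows that once every `>` is written as `&gt;`, the output can no longer contain `]]>`.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not the crate's API: escapes the characters that are
/// unsafe in XML text content, including every '>'.
fn escape_text(input: &str) -> Cow<'_, str> {
    // Fast path: nothing to escape, so borrow the input unchanged.
    if !input.contains(|c: char| matches!(c, '<' | '>' | '&')) {
        return Cow::Borrowed(input);
    }

    let mut out = String::with_capacity(input.len() + 8);
    for c in input.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            // Escaping every '>' means "]]>" can never appear in the output.
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            other => out.push(other),
        }
    }
    Cow::Owned(out)
}
```

With this sketch, the MediaWiki example `Let [[x]]>0` would be written as `Let [[x]]&gt;0`, which a reader can parse back as ordinary text.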

@kornelski kornelski merged commit d8216c9 into kornelski:main Mar 31, 2024
2 checks passed
@mautier mautier deleted the escape_gt_in_writer branch April 1, 2024 12:41