Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow override of default escape/unescape behavior in more situations #739

Merged
merged 3 commits into from
May 31, 2024

Conversation

phdavis1027
Copy link
Contributor

Addresses #734, adds tests to verify that no functionality was lost.

@phdavis1027 phdavis1027 changed the title Escape unescape Allow override of default escape/unescape behavior in more situations Apr 20, 2024
@codecov-commenter
Copy link

codecov-commenter commented Apr 21, 2024

Codecov Report

Attention: Patch coverage is 41.81818% with 32 lines in your changes are missing coverage. Please review.

Project coverage is 61.65%. Comparing base (5d76174) to head (b6d989a).
Report is 19 commits behind head on master.

Files Patch % Lines
src/escape.rs 40.47% 25 Missing ⚠️
examples/custom_entities.rs 0.00% 5 Missing ⚠️
src/de/mod.rs 66.66% 1 Missing ⚠️
src/events/attributes.rs 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #739      +/-   ##
==========================================
+ Coverage   61.24%   61.65%   +0.40%     
==========================================
  Files          39       39              
  Lines       16277    16644     +367     
==========================================
+ Hits         9969    10262     +293     
- Misses       6308     6382      +74     
Flag Coverage Δ
unittests 61.65% <41.81%> (+0.40%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Mingun
Copy link
Collaborator

Mingun commented May 26, 2024

I squashed all commits, refactor our code a bit and move tests to doctests.

I didn't include escape_with method, because it is not used in quick-xml and it is easy to implement it outside of quick-xml. I'm afraid these changes could negatively impact performance. That requires investigation, and if the impact would be small, we could include such method. For now I leave my implementation here (WARNING: it does not correctly process all characters!):

/// Escapes an `&str` with custom replacements. This method does not have any predefined escapes.
///
/// # Example
///
/// ```
/// # use quick_xml::escape::escape_with;
/// # use pretty_assertions::assert_eq;
/// let custom_resolver = |ch: u8| match ch {
///     b'\"' => Some("&amp;"),
///     b'}' => Some("&lcurl;"),
///     b'{' => Some("&rcurl;"),
///     _ => None,
/// };
///
/// let raw = "<&\"'>";
/// let unchanged = escape_with(raw, |_| None);
///
/// let custom = r##"f'A {weird} f-string that says "Hi!"'"##;
/// let changed = r##"f'A &rcurl;weird&lcurl; f-string that says &amp;Hi!&amp;'"##;
///
/// assert_eq!(escape_with(custom, custom_resolver), changed);
/// assert_eq!(raw, unchanged);
/// ```
pub fn escape_with<'input, 'entity, F>(raw: &'input str, mut escape_chars: F) -> Cow<'input, str>
where
    // the lifetime of the output comes from a capture or is `'static`
    F: FnMut(u8) -> Option<&'entity str>,
{
    let bytes = raw.as_bytes();
    let mut escaped = None;
    let mut pos = 0;
    for (i, b) in bytes.iter().enumerate() {
        if let Some(replacement) = escape_chars(*b) {
            if escaped.is_none() {
                escaped = Some(Vec::with_capacity(raw.len()));
            }
            let escaped = escaped.as_mut().expect("initialized");
            escaped.extend_from_slice(&bytes[pos..i]);
            escaped.extend_from_slice(replacement.as_bytes());

            pos = i + 1;
        }
    }

    if let Some(mut escaped) = escaped {
        if let Some(raw) = bytes.get(pos..) {
            escaped.extend_from_slice(raw);
        }
        // WARNING: Can be unsafe because we operate on bytes of UTF-8 input and
        // could break character
        Cow::Owned(String::from_utf8(escaped).unwrap())
    } else {
        Cow::Borrowed(raw)
    }
}

Edit: actually, the problem with this escape_with function with that it is incorrect. It could replace one byte inside of a character and produce invalid UTF-8.

src/escape.rs Outdated
Comment on lines 257 to 258
} else if let Some(value) = named_entity(pat) {
unescaped.push_str(value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also considering to completely remove this if then the unescape_with will never resolve predefined entities and instead caller will need to call functions for default processing which we make public. Is that something that we want to do?

Copy link
Collaborator

@dralley dralley May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be poor UX to force the user to declare all predefined entities. These are built into every other XML parser so I don't see a reason to diverge.

Actually, I think I'm fine with this suggestion, if it means that you pass in a "default resolver" function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mean making named_entity public (as resolve_predefined_entity + resolve_xml_entity and resolve_html5_entity) so the user can call it after processing it's own entities.

I have no idea why such a complex module structure was used
@Mingun Mingun force-pushed the escape-unescape branch 2 times, most recently from cf27e0d to ecb6edf Compare May 30, 2024 19:29
@Mingun Mingun linked an issue May 30, 2024 that may be closed by this pull request
Mingun and others added 2 commits May 31, 2024 01:07
- `quick_xml::escape::resolve_predefined_entity`
- `quick_xml::escape::resolve_xml_entity`
- `quick_xml::escape::resolve_html5_entity`
@Mingun Mingun merged commit 3edb78b into tafia:master May 31, 2024
6 checks passed
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 21, 2024
Because since tafia#739 custom entity resolution function have precedence over standard one,
so user can implement resolution of HTML entities by yourself.
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 26, 2024
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 26, 2024
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 26, 2024
@Mingun Mingun mentioned this pull request Jun 26, 2024
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 26, 2024
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 27, 2024
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 28, 2024
Mingun added a commit to Mingun/quick-xml that referenced this pull request Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Question about changing un/escape behavior
4 participants