Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix buffer_position() after resuming parsing after IllFormed errors #689

Merged
merged 11 commits into from
Dec 1, 2023

Conversation

Mingun
Copy link
Collaborator

@Mingun Mingun commented Nov 29, 2023

Because since #677 we can continue parsing after Error::IllFormed we no longer can change buffer_position() so that it points to useful position of error. The best solution would be to add position to the Error itself, in form of:

struct Error {
  kind: ErrorKind, // former Error enum
  offset: usize,
}

but that would require massive changes, also in the code that I plan to remove soon during rewrite. To keep changes relative small and do not touch that code I decided to add a new error_position() getter which would report position of the error. After rewrite I'll try to implement the best solution.

reader-error.rs tests was updated to check buffer_position() result after resuming parsing after error. Also a couple of new cases was added, which also allowed me to find a bug in one case.

Dead code analysis should be able to remove this implementation from the final binary
…ltaneously

I found that that way it is more easy to debug tests
@Mingun Mingun added the bug label Nov 29, 2023
@Mingun Mingun requested a review from dralley November 29, 2023 20:01
@codecov-commenter
Copy link

codecov-commenter commented Nov 29, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (c383849) 65.47% compared to head (f295b01) 65.06%.

Files Patch % Lines
src/reader/mod.rs 93.75% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #689      +/-   ##
==========================================
- Coverage   65.47%   65.06%   -0.42%     
==========================================
  Files          38       38              
  Lines       18025    17942      -83     
==========================================
- Hits        11802    11674     -128     
- Misses       6223     6268      +45     
Flag Coverage Δ
unittests 65.06% <96.07%> (-0.42%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dralley
Copy link
Collaborator

dralley commented Nov 30, 2023

Ignore that, I hit the button early by mistake.

@@ -14,23 +14,27 @@ macro_rules! ok {
fn borrowed() {
let mut reader = Reader::from_str($xml);
reader.config_mut().enable_all_checks(true);
assert_eq!(reader.read_event().unwrap(), $event);
assert_eq!(reader.read_event().unwrap(), $event, "Reader");
Copy link
Collaborator

@dralley dralley Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just split it up into individual tests under a mod reader and mod ns_reader?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should do that. Will rewrite

@dralley
Copy link
Collaborator

dralley commented Nov 30, 2023

I still have one more commit to review

…module

That tests would panic after next commit which will fix behavior for new tests
that will be added soon, because they calls internal method in a state in which
Reader will never be in normal circumstances
…cial meaning in a markup

failures (18):
    syntax::doctype::unclosed04::async_tokio
    syntax::doctype::unclosed04::borrowed
    syntax::doctype::unclosed04::buffered
    syntax::doctype::unclosed07::async_tokio
    syntax::doctype::unclosed07::borrowed
    syntax::doctype::unclosed07::buffered
    syntax::doctype::unclosed10::async_tokio
    syntax::doctype::unclosed10::borrowed
    syntax::doctype::unclosed10::buffered
    syntax::doctype::unclosed13::async_tokio
    syntax::doctype::unclosed13::borrowed
    syntax::doctype::unclosed13::buffered
    syntax::doctype::unclosed16::async_tokio
    syntax::doctype::unclosed16::borrowed
    syntax::doctype::unclosed16::buffered
    syntax::doctype::unclosed19::async_tokio
    syntax::doctype::unclosed19::borrowed
    syntax::doctype::unclosed19::buffered
failures (66):
    ill_formed::double_hyphen_in_comment2::ns_reader::async_tokio
    ill_formed::double_hyphen_in_comment2::ns_reader::borrowed
    ill_formed::double_hyphen_in_comment2::ns_reader::buffered
    ill_formed::double_hyphen_in_comment2::reader::async_tokio
    ill_formed::double_hyphen_in_comment2::reader::borrowed
    ill_formed::double_hyphen_in_comment2::reader::buffered
    ill_formed::double_hyphen_in_comment3::ns_reader::async_tokio
    ill_formed::double_hyphen_in_comment3::ns_reader::borrowed
    ill_formed::double_hyphen_in_comment3::ns_reader::buffered
    ill_formed::double_hyphen_in_comment3::reader::async_tokio
    ill_formed::double_hyphen_in_comment3::reader::borrowed
    ill_formed::double_hyphen_in_comment3::reader::buffered
    ill_formed::double_hyphen_in_comment4::ns_reader::async_tokio
    ill_formed::double_hyphen_in_comment4::ns_reader::borrowed
    ill_formed::double_hyphen_in_comment4::ns_reader::buffered
    ill_formed::double_hyphen_in_comment4::reader::async_tokio
    ill_formed::double_hyphen_in_comment4::reader::borrowed
    ill_formed::double_hyphen_in_comment4::reader::buffered
    ill_formed::mismatched_end_tag2::ns_reader::async_tokio
    ill_formed::mismatched_end_tag2::ns_reader::borrowed
    ill_formed::mismatched_end_tag2::ns_reader::buffered
    ill_formed::mismatched_end_tag2::reader::async_tokio
    ill_formed::mismatched_end_tag2::reader::borrowed
    ill_formed::mismatched_end_tag2::reader::buffered
    ill_formed::mismatched_end_tag3::ns_reader::async_tokio
    ill_formed::mismatched_end_tag3::ns_reader::borrowed
    ill_formed::mismatched_end_tag3::ns_reader::buffered
    ill_formed::mismatched_end_tag3::reader::async_tokio
    ill_formed::mismatched_end_tag3::reader::borrowed
    ill_formed::mismatched_end_tag3::reader::buffered
    ill_formed::mismatched_end_tag4::ns_reader::async_tokio
    ill_formed::mismatched_end_tag4::ns_reader::borrowed
    ill_formed::mismatched_end_tag4::ns_reader::buffered
    ill_formed::mismatched_end_tag4::reader::async_tokio
    ill_formed::mismatched_end_tag4::reader::borrowed
    ill_formed::mismatched_end_tag4::reader::buffered
    ill_formed::missing_doctype_name1::ns_reader::async_tokio
    ill_formed::missing_doctype_name1::ns_reader::borrowed
    ill_formed::missing_doctype_name1::ns_reader::buffered
    ill_formed::missing_doctype_name1::reader::async_tokio
    ill_formed::missing_doctype_name1::reader::borrowed
    ill_formed::missing_doctype_name1::reader::buffered
    ill_formed::missing_doctype_name2::ns_reader::async_tokio
    ill_formed::missing_doctype_name2::ns_reader::borrowed
    ill_formed::missing_doctype_name2::ns_reader::buffered
    ill_formed::missing_doctype_name2::reader::async_tokio
    ill_formed::missing_doctype_name2::reader::borrowed
    ill_formed::missing_doctype_name2::reader::buffered
    ill_formed::unmatched_end_tag1::ns_reader::async_tokio
    ill_formed::unmatched_end_tag1::ns_reader::borrowed
    ill_formed::unmatched_end_tag1::ns_reader::buffered
    ill_formed::unmatched_end_tag1::reader::async_tokio
    ill_formed::unmatched_end_tag1::reader::borrowed
    ill_formed::unmatched_end_tag1::reader::buffered
    ill_formed::unmatched_end_tag2::ns_reader::async_tokio
    ill_formed::unmatched_end_tag2::ns_reader::borrowed
    ill_formed::unmatched_end_tag2::ns_reader::buffered
    ill_formed::unmatched_end_tag2::reader::async_tokio
    ill_formed::unmatched_end_tag2::reader::borrowed
    ill_formed::unmatched_end_tag2::reader::buffered
    ill_formed::unmatched_end_tag3::ns_reader::async_tokio
    ill_formed::unmatched_end_tag3::ns_reader::borrowed
    ill_formed::unmatched_end_tag3::ns_reader::buffered
    ill_formed::unmatched_end_tag3::reader::async_tokio
    ill_formed::unmatched_end_tag3::reader::borrowed
    ill_formed::unmatched_end_tag3::reader::buffered
src/reader/mod.rs Outdated Show resolved Hide resolved
src/reader/mod.rs Outdated Show resolved Hide resolved
@@ -95,7 +100,7 @@ impl ReaderState {
off += p + 1;
// if next byte after `-` is also `-`, return an error
if buf[3 + off] == b'-' {
self.offset -= len - 2 - p;
self.last_error_offset = self.offset - len + 2 + p;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned that the constants are being cleaned up in the rewrite, yes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, with current approach some constants are always will be there because events don't include < and >, but the parser reports offsets that includes those characters. I should just add explanation to all such constants, but in new parser they will be more obvious. In that case, for example, self.offset just after >, buf contains !-- con--tent -- and p is counted from byte after <!--:

<!-- con--tent -->:
 ~~~~~~~~~~~~~~~~ : - buf
  : |---p         : - p is counted from | (| is 0)
  : :   :         ^ - self.offset
  ^ :   :           - self.offset - len
    ^   :           - self.offset - len + 2
        ^           - self.offset - len + 2 + p

Mingun and others added 2 commits December 1, 2023 21:10
In the perfect world we would add offset to the `Error` type, but right now
this is impractical, because this significantly changes some API that would be
removed soon

Co-authored-by: Daniel Alley <dalley@redhat.com>
@Mingun Mingun merged commit 222532c into tafia:master Dec 1, 2023
6 checks passed
@Mingun Mingun deleted the pos-in-error branch December 1, 2023 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants