Introduce `Config` struct that holds parser configuration and implement #513 #677

Mingun · 2023-11-04T18:44:17Z

This PR changes how the reader is configured. Instead of builder methods, `Reader` / `NsReader` now provide `config()` / `config_mut()` methods:

- the `config()` method introduces the ability to read the parser configuration, which was not possible until now;
- the `config_mut()` method allows setting parser options. You can even set all options at once or store them (the `Config` struct implements `Serialize` / `Deserialize` when the `serde-types` feature is active).
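
A minimal sketch of the accessor pattern described above, using simplified stand-in types rather than the crate's real `Reader` and `Config`. The field names come from the options discussed in this PR; the default values, the `Reader` layout, and the `apply_saved` helper are assumptions made for illustration.

```rust
/// Simplified stand-in for the parser configuration (not the real struct).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Config {
    pub check_comments: bool,
    pub check_end_names: bool,
    pub expand_empty_elements: bool,
    pub trim_markup_names_in_closing_tags: bool,
    pub trim_text_start: bool,
    pub trim_text_end: bool,
}

impl Default for Config {
    // Assumed defaults, chosen for this sketch.
    fn default() -> Self {
        Self {
            check_comments: false,
            check_end_names: true,
            expand_empty_elements: false,
            trim_markup_names_in_closing_tags: true,
            trim_text_start: false,
            trim_text_end: false,
        }
    }
}

impl Config {
    /// Convenience setter that changes both trim options together.
    pub fn trim_text(&mut self, trim: bool) {
        self.trim_text_start = trim;
        self.trim_text_end = trim;
    }
}

/// Stand-in reader that only stores its configuration.
#[derive(Debug, Default)]
pub struct Reader {
    config: Config,
}

impl Reader {
    /// Read-only access: lets callers inspect the current options.
    pub fn config(&self) -> &Config {
        &self.config
    }

    /// Mutable access: lets callers change one option or replace all of them.
    pub fn config_mut(&mut self) -> &mut Config {
        &mut self.config
    }
}

/// Hypothetical helper: copy a stored configuration into another reader.
pub fn apply_saved(reader: &mut Reader, saved: &Config) {
    *reader.config_mut() = saved.clone();
}
```

With these stand-ins, `reader.config_mut().trim_text(true)` changes a single aspect of the parser, while `apply_saved` replaces every option in one assignment.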

To ensure that the behavior has not changed, this PR also introduces a new integration test suite that tests all possible parser options. That allowed me to find a bug in the trim_text_end option, but it is hard to fix today. Because I plan to rewrite the parser (that task is actually mostly done), I just ignore that test for now.

As a nice bonus, after implementing the new tests it became obvious how to implement #513, so that was done as well.
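
The commit list in this PR describes #513 as allowing parsing to continue after `Error::IllFormed`. Below is a minimal sketch of that idea, assuming the input is already split into tags. `Tag`, `Event`, `IllFormed`, and `TagReader` are hypothetical names, and the recovery policy (drop the expected name from the stack and carry on) is an assumption of this sketch, not a statement about the crate's implementation.

```rust
/// Pre-tokenized markup, so the sketch can focus on error recovery.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tag {
    Start(String),
    End(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    Start(String),
    End(String),
    Eof,
}

/// Error for a closing tag whose name differs from the open element.
#[derive(Debug, PartialEq, Eq)]
pub struct IllFormed {
    pub expected: String,
    pub found: String,
}

pub struct TagReader {
    tags: std::vec::IntoIter<Tag>,
    opened: Vec<String>,
    pub check_end_names: bool,
}

impl TagReader {
    pub fn new(tags: Vec<Tag>) -> Self {
        Self {
            tags: tags.into_iter(),
            opened: Vec::new(),
            check_end_names: true,
        }
    }

    /// Returns the next event. An `Err` does not poison the reader:
    /// the offending tag is consumed and the next call continues after it.
    pub fn read_event(&mut self) -> Result<Event, IllFormed> {
        match self.tags.next() {
            None => Ok(Event::Eof),
            Some(Tag::Start(name)) => {
                self.opened.push(name.clone());
                Ok(Event::Start(name))
            }
            Some(Tag::End(found)) => {
                // Pop first, so the stack stays consistent even on error.
                // An end tag with no open element yields an empty `expected`.
                let expected = self.opened.pop().unwrap_or_default();
                if self.check_end_names && expected != found {
                    return Err(IllFormed { expected, found });
                }
                Ok(Event::End(found))
            }
        }
    }
}
```

In this sketch, a caller that receives `Err(IllFormed { .. })` can log it and simply call `read_event()` again to obtain the events that follow the mismatched tag.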

Closes #513

Old tests for that option, which come from xml-rs, are removed. Failures (ignored): trim_text_end::true_. The failure is ignored for now because it is hard to fix in the current implementation of the parser, but a new implementation is coming soon, where that will be easy.

codecov-commenter · 2023-11-04T18:51:59Z

Codecov Report

Merging #677 (3875bdb) into master (b95b503) will increase coverage by 0.11%.
The diff coverage is 46.37%.

@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   65.05%   65.16%   +0.11%     
==========================================
  Files          38       38              
  Lines       17837    17851      +14     
==========================================
+ Hits        11604    11633      +29     
+ Misses       6233     6218      -15

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `65.16% <46.37%> (+0.11%)` | ⬆️ |

| Files | Coverage Δ | |
|---|---|---|
| src/de/mod.rs | `68.88% <100.00%> (+0.03%)` | ⬆️ |
| src/errors.rs | `16.08% <ø> (-3.50%)` | ⬇️ |
| src/reader/async_tokio.rs | `56.94% <ø> (ø)` | |
| src/reader/buffered_reader.rs | `85.51% <ø> (ø)` | |
| src/reader/slice_reader.rs | `100.00% <ø> (ø)` | |
| src/reader/state.rs | `99.34% <100.00%> (+0.61%)` | ⬆️ |
| src/writer.rs | `91.44% <ø> (ø)` | |
| examples/custom_entities.rs | `0.00% <0.00%> (ø)` | |
| examples/read_buffered.rs | `0.00% <0.00%> (ø)` | |
| examples/read_texts.rs | `0.00% <0.00%> (ø)` | |
... and 4 more

... and 1 file with indirect coverage changes

dralley · 2023-11-04T23:45:28Z

tests/reader-config.rs

+
+    /// Self-closed elements should be reported as one `Empty` event
+    #[test]
+    fn false_() {


Perhaps "enabled" and "disabled" would be better names, to avoid keyword clashes?

Mingun: I chose those words because, if we later have options that are not booleans, the tests would be named after the config value.

dralley · 2023-11-04T23:52:31Z

tests/reader-config.rs

+        );
+        assert_eq!(
+            reader.read_event().unwrap(),
+            Event::Comment(BytesText::new(" comment \t\r\n"))


Do we have any reason to support trimming the text values of "comments"? I cannot immediately think of a reason to do that, but perhaps one exists.

Mingun: I think not; at least nobody has requested such a feature. If such a request appears, we can add a separate option.

Generally speaking, I would delete the current trim options, as they simply do not work correctly for text alternating with CDATA / comments / processing instructions, but I suppose that would break many users. I was thinking about renaming the current Event to RawEvent and DeEvent to Event, and giving users a stream of Events. RawEvent would then be a low-level event that most users usually do not need. Those are very raw thoughts currently, so I decided not to make revolutionary changes for now.
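
One way to see the problem with trimming text that alternates with other markup, sketched with hypothetical `Event`, `trim_each`, and `trim_merged` names. This illustrates the general issue under the assumption that the desired result is the whole run of character data with only its outer whitespace removed; it may not be the exact failure the comment above refers to.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    Text(String),
    Comment(String),
}

/// Trims every text event on its own and joins the results.
/// Whitespace next to a comment is lost, even though it separates words.
pub fn trim_each(events: &[Event]) -> String {
    events
        .iter()
        .filter_map(|e| match e {
            Event::Text(t) => Some(t.trim()),
            Event::Comment(_) => None,
        })
        .collect()
}

/// Joins the text around comments first, then trims once,
/// so only the outer whitespace of the whole run is removed.
pub fn trim_merged(events: &[Event]) -> String {
    let merged: String = events
        .iter()
        .filter_map(|e| match e {
            Event::Text(t) => Some(t.as_str()),
            Event::Comment(_) => None,
        })
        .collect();
    merged.trim().to_string()
}
```

For the event sequence `Text("  Hello ")`, `Comment(" note ")`, `Text("world  ")`, per-event trimming glues the two words together, whereas trimming the merged text keeps the space between them.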

dralley · 2023-11-05T00:21:32Z

tests/reader-config.rs

+        #[test]
+        fn check_end_names_false() {
+            let mut reader = Reader::from_str("<root></root \t\r\n>");
+            reader.trim_markup_names_in_closing_tags(false);


What is the reason that we only have this for closing tags, and not for attributes?

Mingun: I think this is an optimization option. Usually end tags do not contain spaces before `>`, so if we assume that the name ends immediately before the `>`, we can save some time. Such an optimization makes sense only for closing tags: for opening tags we must in any case check whether the tag has attributes and find the actual end of the tag name.
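
A rough sketch of the trade-off described here, using the closing tag from the test above (`</root \t\r\n>`). The `end_tag_name` function and its signature are hypothetical; it only models the choice between taking the bytes as-is and scanning backwards for the real end of the name.

```rust
/// Returns the element name from the bytes between `</` and `>`.
/// `trim` plays the role of the `trim_markup_names_in_closing_tags` option.
pub fn end_tag_name(content: &[u8], trim: bool) -> &[u8] {
    if !trim {
        // Fast path: assume the name runs right up to `>`.
        return content;
    }
    // Slow path: scan backwards for the last byte that is not XML whitespace.
    match content
        .iter()
        .rposition(|&b| !matches!(b, b' ' | b'\t' | b'\r' | b'\n'))
    {
        Some(last) => &content[..=last],
        // Only whitespace (or nothing) between `</` and `>`.
        None => &content[..0],
    }
}
```

With trimming enabled the name for the input above is `root`; with it disabled the trailing whitespace stays part of the reported name, which is the price of skipping the scan.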

dralley · 2023-11-05T01:31:49Z

src/reader/mod.rs

@@ -11,153 +11,189 @@ use crate::reader::state::ReaderState;

 use memchr;

-macro_rules! configure_methods {


I appreciate being able to ditch these macros.

dralley · 2023-11-05T01:38:17Z

fuzz/fuzz_targets/fuzz_target_1.rs

+    let config = reader.config_mut();
+    config.expand_empty_elements = true;
+    config.trim_text(true);
+    config.trim_text_end = true;


I slightly dislike the aesthetics here and inability to chain (because of converting methods to members), but ultimately it's an incredibly minor detail

Mingun added 11 commits November 4, 2023 22:45

Add explicit tests for expand_empty_elements reader option

47f3638

Old tests for that option that comes from xml-rs are removed

Add explicit tests for trim_text reader option

27cb615

Old tests for that option that comes from xml-rs are removed

Add explicit tests for trim_markup_names_in_closing_tags reader option

4176571

Add explicit tests for check_end_names reader option

86ccc0a

failures: check_end_names::true_::mismatched_tags

tafia#513: Allow to continue parsing after Error::IllFormed

7980448

Fixed: check_end_names::true_::mismatched_tags failures: dashes_in_comments

Add explicit tests for check_comments reader option

475a883

`dashes_in_comment` test repeat the `check_comments::true_` tests, so removed

Introduce .config() and .config_mut() and replace builder methods…

8101e29

… with access via `.config_mut()` accessor

Set the whole reader config in fuzzing tests

77af4b3

Sort options (mostly) alphabetically

75b4028

text_trim_start and text_trim_end have retained their logical order

Add explicit tests for trim_text_start reader option

3875bdb

Mingun added the enhancement label Nov 4, 2023

Mingun requested a review from dralley November 4, 2023 18:44

dralley reviewed Nov 4, 2023

dralley reviewed Nov 5, 2023

dralley approved these changes Nov 5, 2023

Mingun merged commit bbc7bda into tafia:master Nov 5, 2023
6 checks passed

Mingun deleted the config branch November 5, 2023 20:44

Mingun mentioned this pull request Nov 29, 2023

Fix buffer_position() after resuming parsing after IllFormed errors #689

Merged
