Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reserved namespace bindings. #545

Merged
merged 3 commits into from
Sep 6, 2023
Merged

Conversation

wt
Copy link
Contributor

@wt wt commented Jan 25, 2023

This adds xml and xmlns namespace bindings. These are defined at https://www.w3.org/TR/xml-names11/#xmlReserved.

This is a a start of the work needed for #464

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some minor notes for maintainability. Also, please add an entry to Changelog.md and probably one test to unit tests with a whole non-UTF-8 XML to ensure, that our expectations really works

src/name.rs Outdated Show resolved Hide resolved
src/name.rs Outdated Show resolved Hide resolved
src/name.rs Outdated Show resolved Hide resolved
src/name.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2023

Codecov Report

Merging #545 (a76d401) into master (1be35e1) will increase coverage by 0.21%.
The diff coverage is 85.34%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master     #545      +/-   ##
==========================================
+ Coverage   64.22%   64.44%   +0.21%     
==========================================
  Files          36       36              
  Lines       17083    17284     +201     
==========================================
+ Hits        10972    11138     +166     
- Misses       6111     6146      +35     
Flag Coverage Δ
unittests 64.44% <85.34%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/errors.rs 10.43% <0.00%> (-0.58%) ⬇️
src/name.rs 86.99% <87.33%> (-0.84%) ⬇️
src/reader/ns_reader.rs 63.39% <100.00%> (-0.33%) ⬇️

@wt wt force-pushed the bind_reserved_namespaces branch 2 times, most recently from 088f041 to 62c6215 Compare January 26, 2023 04:20
@wt
Copy link
Contributor Author

wt commented Jan 26, 2023

Okay, so I have declared the predefined namespaces in what I believe is a better way and added some of the docs you asked for. Is this heading in the right direction? By my calculation, the following still needs to be done:

  • tests for overriding xml namespace (were there other override tests you were looking for?)
  • ???

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For override tests: check the behavior of other mature XML libraries, for example, Java. Then write the tests in tests/namespaces.rs. For example, check this XML document:

<root xml:attribute="ok">
  <xml:element/>

  <override xmlns:xml="overrided non-overradable prefix" xml:attribute="oh!">
    <xml:element/>
  </override>

  <!-- remove standard prefix? -->
  <reset xmlns:xml="" xml:attribute="eh?">
    <xml:element/>
  </reset>
</root>

How other libraries will parse that document? If it will be parsed, what the QNames will be returned?

It also would be excellent if you add test to tests/encoding.rs with some non-UTF-8 document and check that we correctly process xml: and xmlns: attributes:

  • check that we detect and use non-UTF-8 encoding by querying .decoder()
  • check that we element and attribute with undeclared prefix resolved correctly (well, we can even made XML_NAMESPACE and XMLNS_NAMESPACE our public API -- as a Namespace constants)
  • at the same time check, that other undeclared prefix will resolved as unknown

src/name.rs Outdated Show resolved Hide resolved
src/name.rs Outdated Show resolved Hide resolved
src/name.rs Outdated Show resolved Hide resolved
@wt
Copy link
Contributor Author

wt commented Jun 7, 2023

Just so we know, the std versions of the oncecell stuff that I used in this PR is being stabilized as we speak. Part of the work landed in 1.70. The lazy initialized types have not yet landed in stable. However, that may happen in one of the next couple releases. My plan is to update this PR when that lands.

@dralley
Copy link
Collaborator

dralley commented Jul 10, 2023

@wt Even once it's merged, I don't know that we want to immediately bump the minimum supported Rust version up to the latest one. If that is the only thing holding this PR up, if it's ready and @Mingun likes it, then I would say just merge it now with once_cell, and we can drop the dependency later on.

You can leave a comment alongside, or perhaps file an issue, as a reminder to do so.

@dralley
Copy link
Collaborator

dralley commented Jul 13, 2023

@wt Do you intend to finish this work? I'm going to draft it for now.

@dralley dralley marked this pull request as draft July 13, 2023 15:28
@wt
Copy link
Contributor Author

wt commented Aug 14, 2023

Can we just use the once_cell crate via this then? This crate will provide backward compatibility until this is stabilized, and it has been around for while.

@dralley
Copy link
Collaborator

dralley commented Aug 14, 2023

Can we just use the once_cell crate via this then? This crate will provide backward compatibility until this is stabilized, and it has been around for while.

Yup that's exactly what I'm suggesting. Use the once_cell crate, so that we don't have to wait for it to hit the stdlib and/or immediately bump the MSRV when it does.

@wt wt force-pushed the bind_reserved_namespaces branch from 62c6215 to 2e32b59 Compare August 14, 2023 01:18
@dralley dralley marked this pull request as ready for review August 14, 2023 01:27
@dralley
Copy link
Collaborator

dralley commented Aug 14, 2023

Also this should have a changelog entry

@dralley
Copy link
Collaborator

dralley commented Aug 14, 2023

Looks like once_cell 1.18 has an MSRV of 1.60 whereas the current MSRV for this library is 1.52.

I'm fine with bumping the MSRV up to 1.60 (released April 2022). Most linux distros will have something newer than that but still a few versions old which is the main reason to not bump all the way to the current release.

The MSRV bump should get a changelog entry also.

@wt
Copy link
Contributor Author

wt commented Aug 14, 2023

Okay, so I hammered out a test xml parser with Xerces. It does in fact map to the well known namespaces to their appropriate URIs the same way this change does. See this for more.

This doc:

<document
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation='document.xsd' 
    xml:id='fjdlksfjds'>
</document>

produces the following parsed attrs for the document tag:

--> [document], Child Element Count = 0
attr: http://www.w3.org/XML/1998/namespace:id
attr: http://www.w3.org/2000/xmlns/:xsi
attr: http://www.w3.org/2001/XMLSchema-instance:noNamespaceSchemaLocation

Also, the failing tests appear to not be caused by my change. Do you agree with this?

I am also trying to figure out how to address your comment about checking the encoding. Can you help me figure out what it is that you want there?

@wt wt force-pushed the bind_reserved_namespaces branch from 45a9e57 to 65982bc Compare August 14, 2023 09:09
@wt
Copy link
Contributor Author

wt commented Aug 14, 2023

FWIW, Xerces also does not allow overriding the xml namespace to another URI or to no URI. I guess this is the behavior we would want in quick-xml?

And I added a changelog.

Here's a possible task list to finish this change:

  • disallow overriding the xml and xmlns namespace prefixes
  • add integration test that tests a full document
  • properly handle non-ascii compatible encodings

Does that cover everything?

@wt wt force-pushed the bind_reserved_namespaces branch 3 times, most recently from bc413c6 to c530f9d Compare August 14, 2023 09:55
Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use the once_cell crate via this then? This crate will provide backward compatibility until this is stabilized, and it has been around for while.

There is no need to use hashmap and, as a result, once_cell, or Lazy, at all. Just two entries can be just hardcoded in place of use via match, or you can use &[(&str, &str)] constant and use linear search -- this will be faster that any map on such sizes and will not require lazy construction and additional dependencies.

And I still prefer to test each well-known prefix individually (one test for prefix, you can write a small macro if you don't like to duplicate code).

@Mingun
Copy link
Collaborator

Mingun commented Aug 14, 2023

Those failures appeared since serde v1.0.181 due to serde-rs/serde#2505:

failures:
    enum_::adjacently_tagged::nested_struct::elements
    enum_::adjacently_tagged::newtype::elements
    enum_::adjacently_tagged::struct_::elements
    enum_::adjacently_tagged::tuple_struct::elements
    enum_::adjacently_tagged::unit::elements

I've create #630 for investigation.

@wt
Copy link
Contributor Author

wt commented Aug 14, 2023

The MSRV test is weird. Why is is failing for 1.52 when the min stated version is 1.60 in this change?

@dralley
Copy link
Collaborator

dralley commented Aug 14, 2023

The MSRV test is weird. Why is is failing for 1.52 when the min stated version is 1.60 in this change?

It's hardcoded https://github.com/tafia/quick-xml/blob/master/.github/workflows/rust.yml#L17

But I suppose it's not necessary to change if you go w/ the linear search approach. With only a small number of strings to check, @Mingun is correct that it's not really going to be worth the Hashmap

@wt
Copy link
Contributor Author

wt commented Aug 15, 2023

I'll rewrite without the HashMap stuff and change the tests to be now explicit.

Do y'all agree that the semantics around using the xml and xmlns prefixes should be enforced as they are in Java and Xerces?

@wt wt force-pushed the bind_reserved_namespaces branch 2 times, most recently from d0d94eb to 7855788 Compare August 15, 2023 09:36
@wt wt force-pushed the bind_reserved_namespaces branch 2 times, most recently from 5f170cd to 5d4244f Compare August 22, 2023 07:21
@wt
Copy link
Contributor Author

wt commented Aug 22, 2023

rebased

@dralley
Copy link
Collaborator

dralley commented Aug 22, 2023

@wt There's are a few small compilation errors to address.

@Mingun Does this implementation strategy look reasonable to you?

@wt wt force-pushed the bind_reserved_namespaces branch 2 times, most recently from 8ed0c73 to b03ef36 Compare August 23, 2023 19:36
@wt
Copy link
Contributor Author

wt commented Aug 23, 2023

Fixed the lint issues for the compilation. I ran the individual tests, but no cargo test after the previous rebase.

@wt wt force-pushed the bind_reserved_namespaces branch 2 times, most recently from e775af5 to b2f1be3 Compare August 23, 2023 21:52
src/errors.rs Outdated
#[derive(Clone, Debug)]
pub struct ReservedNamespacePrefixError {
pub prefix: String,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason to make separate structs instead of directly embedding the strings in the enum variants? I don't think we use this pattern in many, if any, places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've collapsed them into a single struct. I think improving the errors should probably be left for another change. I hope this addresses your concern.

src/name.rs Outdated Show resolved Hide resolved
src/name.rs Outdated Show resolved Hide resolved
@wt wt force-pushed the bind_reserved_namespaces branch 2 times, most recently from 6a7a543 to 1c7cf98 Compare August 25, 2023 09:25
@wt
Copy link
Contributor Author

wt commented Aug 25, 2023

Okay, I think the latest version is ready. I've added tests for adding namespace contexts to the resolver. I added these specific tests:

  • xml prefix explicit sets work when the uri is correct
  • xml prefix sets fail when setting to incorrect url
  • setting other prefixes to the xml uri fails
  • setting xmlns prefix explicitly fails
  • setting other prefixes to the xmlns uri fails
  • setting xml to the xmlns prefix fails (I added this as the logic is overlapping in a way that is easy to biff)

I believe that this is ready to a review. @Mingun would you mind taking a look? I would love to get this landed soon.

@wt wt force-pushed the bind_reserved_namespaces branch from 1c7cf98 to 9887b68 Compare August 25, 2023 09:47
@wt
Copy link
Contributor Author

wt commented Aug 28, 2023

What needs to be done to close out this change?

@Mingun
Copy link
Collaborator

Mingun commented Aug 28, 2023

Don't worry, I don't forgot about this. I was on vacation so tracked issues very lazily. I'll look during this week.

@Mingun
Copy link
Collaborator

Mingun commented Sep 3, 2023

Thanks for working on this PR and investigating all stuff about handling this in other libraries!

I'm not finishing my review yet. Instead of giving feedback in the form of comments, I decided that it would be easier to implement my vision myself. Changes I have broken down into several commits so you can see the reason for them.

I structurized tests and added missing parts (what became apparent after structuring). I also simplified checks and now, I think, it is more easily to understand the code.

I plan to finish review next week.

@wt
Copy link
Contributor Author

wt commented Sep 4, 2023

I'm glad you were able to find some improvement. I'm not picky as long as the functionality is there. When do you think this will land in the main branch?

@Mingun
Copy link
Collaborator

Mingun commented Sep 4, 2023

I think this week

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think, it is done. I'll merge it next few days if there will be no objections

Mingun and others added 3 commits September 6, 2023 19:48
No need to store it outside of resolver, because its lifetime is the same as resolver lifetime
This adds xml and xmlns namespace bindings. These are defined at
https://www.w3.org/TR/xml-names11/#xmlReserved.
@Mingun Mingun merged commit 349cb3f into tafia:master Sep 6, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants