-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for binary blobs inside xml tags #555
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not complete review.
It would be much easier for review if non-functional changes (like renames) was in its own commit, so I can review it separately and then focus only on functional changes. In that case, however, I prefer to not rename Text
to Content
. The "text" here should be treated as defined in the XML specification -- the piece of input that does not form a markup.
I'm still not sure how I feel about adding a bunch of non-standard XML extensions in general, but at minimum I think it would need to be under of a non-default-enabled feature flag. I agree with @Mingun, that the current API needs to reflect standard XML definitions and procedures, so you should make a new type rather than changing Text. However this may also be difficult to integrate with the planned work on encodings because it breaks the expectation that a file that declares its encoding as X can actually be decoded in its entirety as X. That makes me uncomfortable. I'm also not sure this is reliable if the binary blobs can contain, for example, un-escaped < characters. How does this format handle that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree that it is better to support non-standard extensions wherever possible, the way of doing that here will be incompatible with changes from #520.
Moreover, this PR contains two changes:
- the first one to support raw binary data. I would be happy has this feature, but it require more design efforts. I prefer to focus on valid ways of storing binary data first (by specifying rules for converting binary data to strings, for example, in base64 encoding). Unfortunately, I have to reject PR in his current form. I'll leave it open if you plan to adapt it after landing Merge consequent text and CDATA events into one string #520, but that will probably require a significant rework.
- serializing structs (and not maps) for text serializer, that should produce only text content, and not structured content. This change is totally unrelated to the binary serialization and should be reverted
@@ -133,7 +133,7 @@ impl<'i, W: Write> Serializer for ContentSerializer<'i, W> { | |||
type SerializeTupleStruct = Self; | |||
type SerializeTupleVariant = ElementSerializer<'i, W>; | |||
type SerializeMap = Impossible<Self::Ok, Self::Error>; | |||
type SerializeStruct = Impossible<Self::Ok, Self::Error>; | |||
type SerializeStruct = Struct<'i, W>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong change. Content serializer represents data in a text, or, in your case, a binary, form. Structs and map a complex data objects with internal structure and cannot be serialized in such unstructured manner directly. If you want that, you should use #[serde(serialize_with)]
attribute and convert your types to text / binary data manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks!
@@ -189,7 +192,7 @@ mod without_root { | |||
e | |||
), | |||
} | |||
assert_eq!(buffer, ""); | |||
assert!(buffer.is_empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_eq
used for purpose because it is from pretty_assertions
crate and will produce a diff in case of failure
Combined binary support with text events Fix unblocked tests due to this change.
This commit creates an optional feature (disabled by default) to be able to store binary data in xml text.
227f3d6
to
429a15c
Compare
Thank you @dralley and @Mingun. Sorry for the late update. It looks like there was a PR that refactored DeEvent further, which invalidated the changes I made. I also hear the reservations for adding builtin handling for binary data in XML formats, which I don't disagree, it seems not straightforward to implement for various reasons, and should be tested and maintained. So in light of these 2 things, I want to rework how I handle binary data in XML, and maybe instead, open up a custom way to handle text events within quick-xml without being so intrusive and putting the maintenance burden on quick-xml for parsing binary data. I will update this PR with the changes in the near future I hope, since it will serve the same purpose. Let me know if this sounds reasonable. |
Yes, I think so. |
I think that perhaps the least invasive way might be to just have a backdoor way to "read X bytes" from the reader directly, thus bypassing XML parsing entirely for the data section - rather than trying to make the text event itself handle it. I doubt that can be made to work with the existing Serde implementation though. It does also still require a way to ensure that the reader does not try to bulk-decode / validate the data upfront as some particular encoding (say, utf-8), which would obviously fail. Or to not use this raw encoding method at all and instead use base64 as suggested by the docs, if that is an option. This embedded binary option seems to be opt-in only.
Obviously it's part of the format so that's not ideal, but whether it's acceptable maybe depends on how common such files are. |
I'm going to close this PR. If you want to continue the work that would be fine, so long as it doesn't run afoul of the mentioned concerns, but please file an issue first so we can discuss any proposed design before expending a bunch of work on it. |
In summary, this PR combines binary support with text events, to avoid copious code duplication. Also:
String
forVec<u8>
in a few places involving Text (not everywhere) to facilitate this functionality.Some versions ago
quick-xml
had exposed a more low level API, which allowed reading/writing bytes directly between xml tags. This was very useful for working with XML based VTK files and I really wish to have this functionality again inquick-xml
. This PR attempts to recreate this capability and is being tested in a new release for vtkio. Please let me know if there are any design changes needed here.