Support for embedded raw binary #623
Comments
It would be valuable if you could link or attach some example files and provide links to an explanation of how a VTK parser should decide that a byte is not markup but part of the binary data. I found an archive with some files in elrnv/vtkio#27, and right now I suspect the following algorithm for resolving the ambiguity:
So I think we could work on this issue in a few steps:
Thank you for the quick response!
Here are a few examples:
There doesn't seem to be any documentation on how to parse this kind of binary section, only on how to decode it. The official VTK file format is documented here.
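As a rough illustration of the decoding side, here is a minimal Rust sketch that splits the raw payload of an `<AppendedData encoding="raw">` element into blocks. It assumes the VTK XML convention that the payload begins with a `_` marker and that each block is prefixed by its byte count, and it further assumes uncompressed data, a `UInt32` header and little-endian byte order. `split_raw_appended` is a hypothetical name, not part of vtkio or quick-xml. Because the length prefix says exactly how many bytes to skip, `<` and `>` inside a block are never mistaken for markup.

```rust
/// Hypothetical helper: splits the raw payload of `<AppendedData encoding="raw">`
/// into its length-prefixed blocks. `input` must start right after the `>` of the
/// start tag. Assumes uncompressed data, a UInt32 header, little-endian order.
/// Returns the blocks and whatever follows `</AppendedData>`.
fn split_raw_appended(input: &[u8]) -> Result<(Vec<&[u8]>, &[u8]), String> {
    const END_TAG: &[u8] = b"</AppendedData>";

    // Skip the whitespace between the start tag and the `_` marker.
    let start = input
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .ok_or("unexpected end of input")?;
    if input[start] != b'_' {
        return Err("missing `_` marker".to_string());
    }

    let mut rest = &input[start + 1..];
    let mut blocks = Vec::new();
    loop {
        // Heuristic stop condition: optional whitespace, then the closing tag.
        let ws = rest.iter().take_while(|b| b.is_ascii_whitespace()).count();
        if rest[ws..].starts_with(END_TAG) {
            return Ok((blocks, &rest[ws + END_TAG.len()..]));
        }
        // Read the 4-byte little-endian byte count of the next block.
        if rest.len() < 4 {
            return Err("truncated length header".to_string());
        }
        let len = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        // Take exactly `len` bytes, whatever they contain (including `<` and `>`).
        let block = rest.get(4..4 + len).ok_or("truncated block")?;
        blocks.push(block);
        rest = &rest[4 + len..];
    }
}
```

The stop condition is only a heuristic; a complete reader would more likely rely on the `offset` attributes of the `DataArray` elements that point into this section rather than scanning for the closing tag.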
That makes sense to me, and it should be robust enough to support practically all such VTK files with embedded binary blobs, although I think this logic would ideally be left in
This makes sense to me, but for the writer I am still in the dark as to how
It is easiest for me to prototype a complete solution for reading and writing and test it directly using the tests I already have in
This commit sadly breaks support for VTK files with binary AppendedData blobs. The code needed to support them is currently not functional and depends on the completion of this feature in quick-xml (tracked in tafia/quick-xml#623). All binary-related code is now feature-gated, which is a regression from the previous version of vtkio, but it will be re-enabled in the future.
Reading binary data is probably possible today: get the inner reader when you are ready to read binary data and read from it directly. A possible drawback is that the internal offsets in the
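A minimal std-only sketch of this idea, under stated assumptions: in quick-xml the underlying source is reachable through `Reader::get_mut()`, and if the parser has consumed exactly the bytes of the events it returned, the raw bytes are next in line. `TagReader` and `read_raw_block` are hypothetical stand-ins (not quick-xml API) that model that situation with `BufRead::read_until` and `Read::read_exact`; the `position` counter plays the role of the parser's internal offset.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical toy tag reader: it only splits input at '>' and counts how
/// many bytes it consumed, like a parser's internal position counter.
struct TagReader<R: BufRead> {
    inner: R,
    position: usize,
}

impl<R: BufRead> TagReader<R> {
    fn new(inner: R) -> Self {
        TagReader { inner, position: 0 }
    }

    /// Reads everything up to and including the next '>'.
    fn next_tag(&mut self) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        let n = self.inner.read_until(b'>', &mut buf)?;
        self.position += n;
        Ok(buf)
    }

    /// Analogue of quick-xml's `Reader::get_mut`: direct access to the source.
    fn get_mut(&mut self) -> &mut R {
        &mut self.inner
    }
}

/// Reads a 4-byte little-endian length followed by that many raw bytes,
/// straight from the source and bypassing the tag reader entirely.
fn read_raw_block<R: Read>(src: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    src.read_exact(&mut header)?;
    let mut data = vec![0u8; u32::from_le_bytes(header) as usize];
    src.read_exact(&mut data)?;
    Ok(data)
}
```

The raw block can contain `>` without confusing `next_tag`, because the tag reader never sees it. Afterwards, though, `position` still counts only the bytes of the tags, which illustrates the offset drawback: any position the parser reports no longer matches the true offset in the stream.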
Would this also work from the Serde API?
No, serde requires more work. The suggested approach leaves it to the user to decide where to stop reading binary data and return to reading XML events. With the serde API you would somehow need to tell the Reader these things, which is a matter of design. I do not plan to design such an API in the near future, so you can try to do that yourself.
Just to get some clarification: am I correct in understanding that this would also allow proper handling of embedded DSLs or de facto embedded DSLs (e.g. HTML's (I know that, given a restricted HTML subset that lacks those "embedded DSL elements" but contains self-closing tags like
Yes, you understand correctly. You don't even need to disable
Yes, but unless the cut content is length-prefixed (which embedded DSLs generally aren't), there would need to be some way to "rewind" the stream to put the closing tag back into what the XML parser sees after the DSL parser has seen it. Hence, disabling
While that may be needed in some cases, I think that usually you shouldn't need to do that.
Huh. You're right... and it looks like I'm not the only person who was confused by the naming and documentation of
(Sorry for the delayed response. It took me a couple of days to realize that I was remembering a response I'd intended to send instead of one I'd actually sent.)
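One way to avoid the "rewind" raised in this exchange is for the raw-content scanner to stop just before the closing tag instead of consuming it: it reports how many bytes precede `</name`, the DSL parser receives that slice, and XML parsing resumes at the closing tag. A minimal sketch with a hypothetical `raw_content_len` helper (not quick-xml API), assuming byte-wise, case-sensitive matching and that the embedded content never contains the literal closing tag:

```rust
/// Hypothetical helper: returns how many bytes of raw content in `input`
/// precede the closing tag `</name>`, without consuming that tag.
/// Matching is byte-wise and case-sensitive; the embedded content is assumed
/// never to contain the literal `</name` sequence.
fn raw_content_len(input: &[u8], name: &[u8]) -> Option<usize> {
    let mut i = 0;
    while i + 2 + name.len() <= input.len() {
        let rest = &input[i..];
        if rest.starts_with(b"</") && rest[2..].starts_with(name) {
            // The name must be followed by '>' or whitespace, so that e.g.
            // `</scripts>` does not end a `script` element.
            match rest.get(2 + name.len()) {
                Some(b) if *b == b'>' || b.is_ascii_whitespace() => return Some(i),
                _ => {}
            }
        }
        i += 1;
    }
    None
}
```

For an input such as `if (a < b && c > d) { x(); }</script><p/>`, the DSL parser gets everything before `</script`, and the XML parser continues from `</script>` itself, so nothing has to be pushed back into the stream.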
There are use cases where embedding binary data into an XML document is convenient. Although this is not officially supported by the XML spec, there are valid and popular uses in the wild (e.g. VTK). Binary VTK files are quite common; in fact, the tag (which can contain binary data) is the default when writing VTK files, and there are plenty of such files in the wild.
In the past `quick-xml` was able to support binary data, but this has changed, and in its current state it needs some non-trivial changes before it can write and read uninterpreted binary data. My first attempt didn't really address the issue correctly, since binary data can contain `<` and `>` bytes that trip up the parser, which means that at least reading must be augmented at the lowest level (probably as part of `Reader`). As discussed previously, the implementation for this should be provided as an additional feature.
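A minimal sketch of what such a lowest-level hook could look like, loosely following the "context" idea described in the Reading section below. `RawContext`, `Event`, `tokenize` and `LengthPrefixed` are hypothetical names, not quick-xml API, and the toy tokenizer only recognizes tags (text between tags is skipped). After each tag it asks the context whether raw bytes follow and how many, so a payload containing `<` and `>` passes through untouched.

```rust
/// Hypothetical customization point: after each tag, the tokenizer asks
/// whether raw bytes follow and, if so, how many to take verbatim.
trait RawContext {
    /// `tag` is the tag just read; `rest` is the input that follows it.
    fn raw_len(&mut self, tag: &[u8], rest: &[u8]) -> Option<usize>;
}

#[derive(Debug, PartialEq)]
enum Event<'a> {
    Tag(&'a [u8]),
    Raw(&'a [u8]),
}

/// Toy tokenizer: recognizes `<...>` tags only and skips text between them.
fn tokenize<'a, C: RawContext>(mut input: &'a [u8], ctx: &mut C) -> Vec<Event<'a>> {
    let mut events = Vec::new();
    while let Some(start) = input.iter().position(|&b| b == b'<') {
        // Find the end of the tag; stop on an unterminated tag.
        let len = match input[start..].iter().position(|&b| b == b'>') {
            Some(len) => len,
            None => break,
        };
        let tag = &input[start..=start + len];
        input = &input[start + len + 1..];
        events.push(Event::Tag(tag));
        // Consult the context before scanning for the next `<`.
        if let Some(n) = ctx.raw_len(tag, input) {
            let n = n.min(input.len());
            events.push(Event::Raw(&input[..n]));
            input = &input[n..];
        }
    }
    events
}

/// Example context: after `<bin>`, a 4-byte little-endian length prefix
/// announces how many raw bytes follow (the prefix is included in the block).
struct LengthPrefixed;

impl RawContext for LengthPrefixed {
    fn raw_len(&mut self, tag: &[u8], rest: &[u8]) -> Option<usize> {
        if tag != b"<bin>" || rest.len() < 4 {
            return None;
        }
        let len = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        Some(4 + len)
    }
}
```

Keeping the decision in an externally supplied context means the tokenizer needs no knowledge of any particular format such as VTK.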
Reading
I think it is not difficult to add a customization point for the reader in `quick-xml` without being too invasive, by adding an externally supplied "context" field to `Reader` that the reader can call to ask how many bytes should be read raw, feeding it context about what has already been read.
Writing
All types in the serialization part of the code are bound by `fmt::Write`. I haven't found a way to maintain that bound everywhere while also supporting writing binary data to an optional or custom `io::Write` type. To enable this, I think we have to switch back to `io::Write`. I understand that there are some performance concerns about having to write to a buffer if `io::Write` is used, but I don't see another way to support binary writing without `io::Write`. I could really use some help from the maintainers to address this point.
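A small std-only sketch of the constraint described above. `write_raw_fmt` and `write_appended` are hypothetical helpers, and the `<AppendedData>` layout with a 4-byte little-endian length prefix is an illustrative assumption. A `fmt::Write` sink only accepts `&str`, so bytes that are not valid UTF-8 are rejected, while an `io::Write` sink accepts both text and raw bytes.

```rust
use std::fmt;
use std::io::{self, Write};

/// With `fmt::Write`, only `&str` (valid UTF-8) can be written, so arbitrary
/// bytes must be rejected (or re-encoded, e.g. as base64).
fn write_raw_fmt<W: fmt::Write>(out: &mut W, raw: &[u8]) -> Result<(), String> {
    let text = std::str::from_utf8(raw).map_err(|e| e.to_string())?;
    out.write_str(text).map_err(|e| e.to_string())
}

/// With `io::Write`, formatted text and raw bytes go through the same sink.
fn write_appended<W: Write>(out: &mut W, raw: &[u8]) -> io::Result<()> {
    // Formatted text still works through `write!`.
    write!(out, "<AppendedData encoding=\"{}\">_", "raw")?;
    // 4-byte little-endian length prefix, then the uninterpreted bytes.
    out.write_all(&(raw.len() as u32).to_le_bytes())?;
    out.write_all(raw)?;
    out.write_all(b"</AppendedData>")
}
```

Since `io::Write` provides `write_fmt`, the `write!` macro keeps working for the textual parts of the output after such a switch.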