Experimental support for BSON (de)serialization #1254
…endian input to get rid of `binary_reader::get_number_little_endian`
@julian-becker Great work! Can you tell me when you think you're done so I can go through the code?
@nlohmann For now, I'm done consolidating the code. Feedback and ideas for improvement are welcome.
switch (j.type())
{
    default:
        JSON_THROW(type_error::create(317, "JSON value cannot be serialized to requested format"));
Please add the type of the value to the exception text.
@nlohmann Thanks a lot for your feedback! Sorry for my initial sloppiness regarding the documentation; I have improved it somewhat in my latest commit. EDIT:
…of the BSON-output. This way, the output_adapter can work on simple output iterators and no longer requires random access iterators.
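Some background on why the size precomputation helps: every BSON document begins with its total length as a little-endian int32. If that length is computed before anything is written, the writer never has to seek back and patch the prefix, so a plain output iterator is enough. The following is a minimal sketch under simplifying assumptions (a flat document whose values are all int32; `doc_size` and `write_doc` are hypothetical names, not the library's API):

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Writes v as 4 little-endian bytes through any output iterator.
template <typename OutputIt>
OutputIt write_int32_le(std::uint32_t v, OutputIt out) {
    for (int i = 0; i < 4; ++i) {
        *out++ = static_cast<std::uint8_t>(v >> (8 * i));
    }
    return out;
}

// Hypothetical helper: byte size of a flat document holding only int32 values.
// document = int32 total length + entries + trailing 0x00
// entry    = type byte 0x10 + key + NUL + 4-byte value
std::size_t doc_size(const std::map<std::string, std::int32_t>& doc) {
    std::size_t size = 4 + 1;
    for (const auto& entry : doc) {
        size += 1 + entry.first.size() + 1 + 4;
    }
    return size;
}

// One forward pass: the length prefix is known up front, so nothing is patched later.
template <typename OutputIt>
OutputIt write_doc(const std::map<std::string, std::int32_t>& doc, OutputIt out) {
    out = write_int32_le(static_cast<std::uint32_t>(doc_size(doc)), out);
    for (const auto& entry : doc) {
        *out++ = std::uint8_t{0x10};
        for (char c : entry.first) {
            *out++ = static_cast<std::uint8_t>(c);
        }
        *out++ = std::uint8_t{0x00};
        out = write_int32_le(static_cast<std::uint32_t>(entry.second), out);
    }
    *out++ = std::uint8_t{0x00};
    return out;
}
```

With `std::back_inserter`, the document `{"a": 1}` becomes the 12 bytes `0c 00 00 00 10 61 00 01 00 00 00 00`.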
Hi @julian-becker, I really appreciate your work on BSON and I am looking forward to having this in the library. I hope you are not bothered by my constant review remarks, but I have spent so much effort on CBOR, MessagePack, and UBJSON that I know the subtleties a binary format brings. Therefore, I would like to have as many things as possible "fixed" early.
@nlohmann No worries, that's what a review is for and that's why it's useful. Your feedback is very welcome, and I will do my best to tidy things up.
@nlohmann I will start by working on more thorough and intensive tests over the next couple of days. After that, I could check the documentation.
Alright - if you look at the CBOR test suite, you know what I mean. It would be great if some of the existing test files could be converted to BSON. And if a well-accepted BSON test suite exists, it would be great to have it in the project as well. I would still like to merge the branch once you have nothing more to add to the parser, because then Google's OSS-Fuzz can already start searching for bugs.
@nlohmann My ongoing work on more intensive tests revealed one thing that may need to be addressed before merging:
So BSON does not support integers larger than
As far as I understand the spec, there is only a signed int32 (type 0x10) and a signed int64 (type 0x12). There is also a uint64 (type 0x11); however, it is explicitly to be interpreted as a timestamp, so it is not really an option.
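Given only int32 (0x10) and int64 (0x12), one plausible mapping is to pick the smallest signed type that holds the value and to reject unsigned values above `INT64_MAX`. This is a sketch of that policy, not necessarily what the PR implements; the function name `bson_type_for` and the choice of `std::out_of_range` are assumptions:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// BSON element type bytes (bsonspec.org).
constexpr std::uint8_t bson_int32 = 0x10;
constexpr std::uint8_t bson_int64 = 0x12;

// Signed integers always fit: int32 when possible, otherwise int64.
std::uint8_t bson_type_for(std::int64_t v) {
    if (v >= std::numeric_limits<std::int32_t>::min() &&
        v <= std::numeric_limits<std::int32_t>::max()) {
        return bson_int32;
    }
    return bson_int64;
}

// Unsigned integers above INT64_MAX have no BSON representation
// (0x11 is a timestamp, not a general uint64), so this sketch rejects them.
std::uint8_t bson_type_for(std::uint64_t v) {
    if (v <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max())) {
        return bson_int32;
    }
    if (v <= static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max())) {
        return bson_int64;
    }
    throw std::out_of_range("integer " + std::to_string(v) +
                            " cannot be represented as BSON int32 or int64");
}
```

The two overloads must be called with an explicitly typed argument (for example `std::uint64_t{42}`); a plain `int` literal would be ambiguous between them.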
…std::uint64_t` is serialized to BSON. Also, added a missing EOF-check to binary_reader.
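An EOF check of this kind means every byte read is bounds-checked, so truncated input raises an error instead of reading past the end of the buffer. A minimal sketch of a little-endian int32 read with such a check; `byte_reader` is a hypothetical class for illustration, not the library's `binary_reader`:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal cursor over BSON input that checks for end of input before every read.
class byte_reader {
public:
    explicit byte_reader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    std::uint8_t get() {
        if (pos_ >= data_.size()) {
            throw std::runtime_error("unexpected end of input at byte " + std::to_string(pos_));
        }
        return data_[pos_++];
    }

    // BSON stores int32 little-endian: least significant byte first.
    std::int32_t get_int32() {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            v |= static_cast<std::uint32_t>(get()) << (8 * i);
        }
        return static_cast<std::int32_t>(v);
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};
```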
Also: keys are encoded as C strings, with the effect that code point U+0000 cannot be serialized. This is not caught at the moment and will produce incorrect output if a key contains U+0000. I should probably adjust the error handling to catch and report this.
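One way to catch this, sketched under the assumption that keys are held as UTF-8 `std::string`s: because a key is written as a C string (its bytes followed by 0x00), any key containing U+0000 is rejected before anything is emitted. A later commit message in this thread mentions `out_of_range.409`; the sketch throws a plain `std::invalid_argument` instead, and `write_key` is a hypothetical name:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// BSON keys (e_name) are C strings terminated by 0x00, so a key that itself
// contains U+0000 would be cut short; reject it instead of writing bad output.
void write_key(const std::string& key, std::vector<std::uint8_t>& out) {
    const auto nul = key.find('\0');
    if (nul != std::string::npos) {
        throw std::invalid_argument("BSON key contains U+0000 at byte " + std::to_string(nul));
    }
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0x00);
}
```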
…ialized to BSON contains a U+0000
…re/bson Conflicts: include/nlohmann/detail/input/binary_reader.hpp single_include/nlohmann/json.hpp src/unit-bson.cpp
Some more comments.
…in the key-string as part of `out_of_range.409`-message
@@ -48,46 +48,53 @@ TEST_CASE("BSON")
    SECTION("null")
    {
        json j = nullptr;
        REQUIRE_THROWS_AS(json::to_bson(j), json::type_error);
        REQUIRE_THROWS_AS(json::to_bson(j), json::type_error&);
        CHECK_THROWS_WITH(json::to_bson(j), "[json.exception.type_error.317] JSON value of type 0 cannot be serialized to requested format");
I don't think "type 0" etc. is a helpful description for users of the library. Please use `type_name()` to explicitly name the type (null, boolean, etc.).
Furthermore, "to requested format" should be replaced by "BSON" to be explicit.
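A sketch of the requested message, assuming the value's type is available as a tag. The enum below is a stand-in defined for this example (not the library's own type enum), `check_bson_top_level` is a hypothetical name, and `std::domain_error` is an arbitrary choice. Only objects are accepted because the top level of a BSON file is always a document, which is presumably why serializing `nullptr` is expected to throw in the test above:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a JSON value's type tag.
enum class value_t { null, boolean, number, string, array, object };

// Human-readable names, in the spirit of the suggested type_name().
const char* type_name(value_t t) {
    switch (t) {
        case value_t::null:    return "null";
        case value_t::boolean: return "boolean";
        case value_t::number:  return "number";
        case value_t::string:  return "string";
        case value_t::array:   return "array";
        case value_t::object:  return "object";
    }
    return "unknown";
}

// A BSON file is always a document, so only objects are accepted at top level.
void check_bson_top_level(value_t t) {
    if (t != value_t::object) {
        throw std::domain_error(std::string("JSON value of type ") + type_name(t) +
                                " cannot be serialized to BSON");
    }
}
```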
I have only one more small comment on the latest commit. Furthermore, it would be great if you could add a BSON translation to the files
with a
@julian-becker I have some time to work on this this week. I created a branch from yours and will make the changes there myself.
I created the branch https://github.com/nlohmann/json/compare/julian-becker-feature/bson and will make further changes there.
@nlohmann Sure, feel free to continue this work. Sorry that I have not found as much time to put into this as I had hoped. One thing I noticed, though, is that things get rather slow for large containers, which I suspect is mostly due to the size precomputation. Thinking about the present implementation some more, I realized that in the worst case (a deeply nested object containing an object containing an object ...) it is O(n^2).
No worries, and thank you for your work! We may speed up the code by adding a cache like
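The quadratic worst case can be made concrete with a deliberately simplified model in which every entry is an embedded document. If each level recomputes its subtree size before writing its length prefix, a chain of depth d touches 1 + 2 + ... + (d + 1) nodes; computing all sizes in one post-order pass and caching them touches each node once. The cache proposal in the comment is cut off, so the address-keyed map here is only one possible design, and `Node`, `write_naive`, `fill_cache`, and `make_chain` are hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified, hypothetical model: every entry is an embedded document.
struct Node {
    std::string key;
    std::vector<Node> children;
};

std::size_t visits = 0;  // how many times a node's size is (re)computed

// document = int32 length + entries + 0x00; entry = 0x03 + key + NUL + document
std::size_t doc_size(const Node& n) {
    ++visits;
    std::size_t size = 4 + 1;
    for (const Node& c : n.children) size += 1 + c.key.size() + 1 + doc_size(c);
    return size;
}

// Naive writer: each level recomputes the size of its whole subtree before
// writing its length prefix, so a chain of depth d costs O(d^2) visits.
void write_naive(const Node& n) {
    (void)doc_size(n);  // this value would become the int32 length prefix
    for (const Node& c : n.children) write_naive(c);
}

// Cached variant: one post-order pass records every size, O(d) visits in total.
using size_cache = std::unordered_map<const Node*, std::size_t>;
std::size_t fill_cache(const Node& n, size_cache& cache) {
    ++visits;
    std::size_t size = 4 + 1;
    for (const Node& c : n.children) size += 1 + c.key.size() + 1 + fill_cache(c, cache);
    cache[&n] = size;
    return size;
}

// Builds {"a": {"a": {...}}} nested `depth` times.
Node make_chain(int depth) {
    Node root;
    Node* cur = &root;
    for (int i = 0; i < depth; ++i) {
        cur->children.push_back(Node{"a", {}});
        cur = &cur->children.back();
    }
    return root;
}
```

For a chain of depth 100 (101 documents), the naive writer performs 5151 size computations and the cached pass 101; the outer document is 5 + 8 * 100 = 805 bytes.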
There may be a problem in the current code: I added the examples from http://bsonspec.org/faq.html as tests. The first example is OK, but the second one does not roundtrip properly: parsing the BSON input yields the correct JSON value, but serializing it again yields
(46 bytes). However, the original input is
(49 bytes). Apparently, our encoding takes fewer bytes, but I have not yet understood why. Do you have an idea?
Our representation assigns an empty … It seems I had not read the notes on arrays in the spec:
So we have to generate increasing integer names for the array elements.
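The size difference in the FAQ example is consistent with this, assuming the earlier encoding wrote an empty key for each element: `{"BSON": ["awesome", 5.05, 1986]}` has three array elements, and the keys "0", "1", "2" each cost one byte more than an empty key, so 46 + 3 = 49. The sketch below supports only string, double, and int32 values (all names are hypothetical) and encodes the array as a document with increasing integer keys; `index_keys = false` mimics the empty-key encoding for comparison:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <variant>
#include <vector>

using Value = std::variant<std::string, double, std::int32_t>;
using Bytes = std::vector<std::uint8_t>;

// Appends the low n bytes of v, least significant first (BSON is little-endian).
void put_le(Bytes& out, std::uint64_t v, int n) {
    for (int i = 0; i < n; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

void put_cstring(Bytes& out, const std::string& s) {
    out.insert(out.end(), s.begin(), s.end());
    out.push_back(0x00);
}

// Wraps an element list into a document: int32 total length, elements, 0x00.
Bytes finish_document(const Bytes& elements) {
    Bytes doc;
    put_le(doc, elements.size() + 5, 4);
    doc.insert(doc.end(), elements.begin(), elements.end());
    doc.push_back(0x00);
    return doc;
}

// A BSON array is a document whose keys are "0", "1", "2", ...
Bytes encode_array(const std::vector<Value>& arr, bool index_keys = true) {
    Bytes el;
    for (std::size_t i = 0; i < arr.size(); ++i) {
        const std::string key = index_keys ? std::to_string(i) : std::string();
        if (const auto* s = std::get_if<std::string>(&arr[i])) {
            el.push_back(0x02);  // string: int32 length (incl. NUL) + C string
            put_cstring(el, key);
            put_le(el, s->size() + 1, 4);
            put_cstring(el, *s);
        } else if (const auto* d = std::get_if<double>(&arr[i])) {
            el.push_back(0x01);  // double: 8 bytes IEEE 754
            put_cstring(el, key);
            std::uint64_t bits = 0;
            std::memcpy(&bits, d, sizeof bits);
            put_le(el, bits, 8);
        } else {
            el.push_back(0x10);  // int32
            put_cstring(el, key);
            put_le(el, static_cast<std::uint32_t>(std::get<std::int32_t>(arr[i])), 4);
        }
    }
    return finish_document(el);
}

// {key: arr} as a top-level document; 0x04 marks an embedded array.
Bytes encode_doc_with_array(const std::string& key, const std::vector<Value>& arr,
                            bool index_keys = true) {
    Bytes el{0x04};
    put_cstring(el, key);
    const Bytes a = encode_array(arr, index_keys);
    el.insert(el.end(), a.begin(), a.end());
    return finish_document(el);
}
```

With index keys, the output matches the 49-byte example from the bsonspec FAQ, starting `31 00 00 00 04 42 53 4f 4e 00 26 00 00 00`; with empty keys it is 46 bytes.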
…o the array elements
Just pushed a fix for the array serialization.
Thanks for the fix. As I have already made a lot of changes in the https://github.com/nlohmann/json/compare/julian-becker-feature/bson branch, I will carry the patch over there. Once that is done, I propose opening a new PR based on that branch and closing this one.
This PR includes basic support for BSON (de)serialization, as proposed in #1244.
It includes support for BSON records and record entries of the following types (cf. bsonspec):
It presently does not include support for the following BSON entry types:
The tests for the supported conversions can be found in unit-bson.cpp.
Feedback and discussion are welcome, in particular regarding the parts of BSON that are not yet supported.
For the initial discussion of this change, and for easier review and a smaller patch, I have refrained as much as possible from modifying existing code. In particular, I have refrained from refactoring `binary_reader` and `binary_writer`, which could be beneficial for a better separation of concerns (not mixing procedures targeting different file formats in a single class). Let me know what you think and I will happily adjust the code where needed or desired.