Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JsonEncoder produces invalid JSON #309

Closed
fsteeg opened this issue Sep 24, 2019 · 11 comments
Closed

JsonEncoder produces invalid JSON #309

fsteeg opened this issue Sep 24, 2019 · 11 comments
Assignees
Labels

Comments

@fsteeg
Copy link
Member

fsteeg commented Sep 24, 2019

The JsonEncoder (encode-json) produces invalid JSON for multiple entries. Each entry is a valid JSON object, but they are simply serialized one after the other, not wrapped into a JSON array.

@fsteeg fsteeg added the Bug label Sep 24, 2019
@blackwinter
Copy link
Member

Duplicate of #152?

@dr0i
Copy link
Member

dr0i commented Oct 25, 2019

As @blackwinter noted:
does using 'encode-json | write("stdout", header="[ ", footer=" ]", separator=",\n")' fixes the problem @fsteeg ?

@fsteeg
Copy link
Member Author

fsteeg commented Oct 25, 2019

Hm, I don't think the argument from #152 is (still) true.

First, the current behavior is not consistent, since encode-xml does produce valid XML for multiple records, while encode-json does not. See http://test.lobid.org/fix, click Load sample, Process, then change Flux to encode-json, Process again. Here are direct links for the produced XML and JSON.

Second, I don't get the argument from #152 that the encoder should (only) produce valid JSON for single records so they can be embedded. Why not produce singe JSON objects for single records, and JSON arrays for multiple records? That would not change the current behavior for single records.

@fsteeg fsteeg assigned dr0i and unassigned fsteeg Oct 25, 2019
@blackwinter
Copy link
Member

Why not produce singe JSON objects for single records, and JSON arrays for multiple records? That would not change the current behavior for single records.

I'd consider producing JSON Lines output a valid use case (in fact, that's what we rely on; and so do you, if I'm not mistaken). This presents a compatibility issue.

As for the consistency with SimpleXmlEncoder, I agree. @cboehme's objection that it would pass incomplete/invalid serializations is already true for XML; either that should be "fixed" (which probably nobody wants) or the JSON encoder should be allowed the same discretion (only with reversed defaults to remain compatible).

@dr0i dr0i assigned fsteeg and unassigned dr0i Mar 16, 2020
@fsteeg
Copy link
Member Author

fsteeg commented Apr 17, 2020

[...] that it would pass incomplete/invalid serializations is already true for XML; either that should be "fixed" (which probably nobody wants) or the JSON encoder should be allowed the same discretion (only with reversed defaults to remain compatible).

OK, so that would mean we add a boolean writeJsonArray = false option to JsonEncoder?

@blackwinter
Copy link
Member

If you're asking me, then yes, that would be my preference. Maybe even invert the logic here: boolean writeJsonLines = true to emphasize that it outputs JSON Lines format by default; then allow the user to call setJsonLines(false) to force a JSON array instead (the array merely being the default representation or an implementation detail, not particularly noteworthy).

@fsteeg
Copy link
Member Author

fsteeg commented Apr 17, 2020

Maybe even invert the logic here: boolean writeJsonLines = true to emphasize that it outputs JSON Lines format by default; then allow the user to call setJsonLines(false) to force a JSON array instead (the array merely being the default representation or an implementation detail, not particularly noteworthy).

Sounds good to me. We should then consider the behavior of the prettyPrinting = true option, which makes no sense with a new default writeJsonLines = true. Processing should probably abort with an error if both are true.

@blackwinter
Copy link
Member

Well, not so fast ;) I mean, you certainly have a point, but we actually do use pretty printing with the current output format (example). So maybe we shouldn't call it JSON Lines after all...

@fsteeg
Copy link
Member Author

fsteeg commented Apr 20, 2020

So maybe boolean writeJsonArray = false is the better option?

@blackwinter
Copy link
Member

Yeah, it probably is... Sorry for the noise.

@fsteeg
Copy link
Member Author

fsteeg commented Apr 20, 2020

No, it was a good idea, thanks for contributing. In particular it made me aware that by default, the JSON encoder actually writes JSON lines for multiple records, which makes a lot of sense. I'm now thinking that your original suggestion – to use write("stdout", header="[ ", footer=" ]", separator=",\n") after encode-json – is actually the best approach. We' won't be totally consistent with encode-xml anyway (due to to reversed defaults), and it does make sense to view the serialisation of individual JSON documents as an implementation detail of the writer (even if that serialization is itself JSON again). Closing this, if anyone objects feel free to comment or reopen.

@fsteeg fsteeg closed this as completed Apr 20, 2020
blackwinter pushed a commit that referenced this issue Dec 13, 2024
The are a copy of the current implementation of set_array and set_hash. This is a preparation for #309
blackwinter pushed a commit that referenced this issue Dec 13, 2024
As a preliminary work for #309 since set_array will change its behaviour but add_array will keep the functionality.
blackwinter pushed a commit that referenced this issue Dec 13, 2024
As a preliminary work for #309 since set_array will change its behaviour but add_array will keep the functionality.
blackwinter pushed a commit that referenced this issue Dec 13, 2024
…374

Additional preliminary work for #309 when the set-fixes change their functionality.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants