Use a different JSON library #47

benjaminjkraft · 2021-05-20T17:06:31Z

There are a few potential ideas here, which may coordinate or conflict.

First, any code that's using genqlient should be a great match for one of the libraries that does JSON via codegen rather than reflection, which are theoretically way better for perf. @StevenACoffman and @csilvers pointed me to easyjson, ffjson, go-codec, ~~jsonparser,~~ and gojay, and Steve posted some more details about them below; I'd also want to run our own benchmarks on actual genqlient-generated code.

On the other hand, right now we do a lot of work to try to do what we want within the encoding/json API, so it might instead be better to use a library that has a different set of extension points more suitable to our needs. Steve pointed me to mapstructure which is definitely aimed at what we are trying to do in the case of abstract types and custom unmarshalers, which are the most complex case on the decoding side. Sadly it looks like it has somewhat limited support for encoding (see mitchellh/mapstructure#166), and while it has more flexible support for embedding than the standard library, it will only simplify our unmarshalers (by making it easier to merge embedded structs), not eliminate them.

Switching to mapstructure would also preclude switching to one of the other code-gen libraries, which tend to be compatible with encoding/json. Some do add additional options, so we could explore that. (Of those linked above, go-codec seems to have the most extension points, and would likely have similar benefits to mapstructure, although like mapstructure its support for embedding is likely not flexible enough to solve all our problems. It also has support for other encodings, which could be useful for e.g. putting your responses in a cache, or if you use some strange transport for your GraphQL (not likely). Note that it's possible golang/go#5901 will add some of the features we'd like to the standard library, although the timeline on that is very unclear to me.)

One potential future need comes from #30; which for fields will require special handling in (un)marshaling to (un)marshal several layers of JSON structure into/from one layer of Go structure. I didn't see support from any of the libraries for that, although mapstructure might at least provide better tools for us to do it ourselves, or one could imagine any of the libraries might add it (essentially what we want is support for a tag json:"a.b.c"). Another potential future need is more options for how to handle omitempty; right now we match JSON because I think that's the least-bad default (see #103) but it's got some problems for sure. (And now we probably do want to try to keep that behavior, at least as the default, for better or worse, which could be tricky for libraries that have different behavior.)

Finally, as mentioned in #120, another option is to roll our own, making use of what we know about our types to simplify things. I suspect it's not worth it yet, but if our (un)marshalers start having to get even more complicated, it may be; we don't need to implement nearly all of the behavior of the general-purpose libraries since we know what we want our types to look like.

For any of the third-party codegen options, we do need to make sure this won't cause trouble with custom scalars: it might be a little surprising if you have to rerun genqlient because you changed a json tag in a custom scalar, although it's not a huge problem in practice I expect (especially since probably most custom scalars will have UnmarshalJSON methods). In any case we could also put this behind a flag. A first-party implementation could call out to encoding/json for custom scalars, of course.

The text was updated successfully, but these errors were encountered:

@csilvers

When genqlient generates output types, it generates whatever code is necessary to unmarshal them. Conversely, when it generates input types, it generates whatever code is necessary to marshal. This is all that's needed for genqlient itself: it never needs to marshal output types or unmarshal input types. But maybe you do! (For example, to put the responses in a cache, which is the use case that @csilvers hit at Khan, although there are others one can imagine.) While we can't support every serialization format you might want (at least not without adding plugins or some such), it's not unreasonable to expect that since genqlient can read JSON, it can write it too. Sadly, in the past this was not true for types requiring custom unmarshaling logic, for several reasons. In this commit I implement logic to always write both marshalers and unmarshalers whenever they're needed to be able to correctly round-trip the types, even though genqlient doesn't do so. I wasn't starting from scratch, since of course we already write both marshalers and unmarshalers in some cases. But this ended up requiring surprisingly large changes on the marshaling side, mostly to correctly support embedding (which we use for named fragments). Specifically, as the comments in `types.go` discuss, the most difficult issue is spreads with duplicate fields, which translate to Go embedded fields which end up hidden from the json-marshaler. Ultimately, I had to do things quite differently from unmarshaling, and essentially flatten the type when we write marshaler. But in the end it's not so ugly -- indeed arguably it's cleaner! Mainly it's just different. In general, I begin to wonder whether using `encoding/json` at all is really right for genqlient: we're doing a lot of work to appease it, despite knowing what our types look like. I think it would still be a significant increase in lines of code to roll our own, but that code would perhaps be simpler, and would surely be faster (although if we just want the speed gains we could use another JSON-generator library, see also #47). Anyway, something to think about in the future. Test plan: make tesc Reviewers: marksandstrom, steve, adam, miguel, mahtab, jvoll

benjaminjkraft · 2021-09-29T05:12:34Z

As mentioned in #120, another option is to roll our own, making use of what we know about our types. I suspect it's not worth it yet, but if our (un)marshalers start having to get even more complicated, it may be, since we end up doing a lot of work to get encoding/json to do what we want.

csilvers · 2021-09-29T11:54:44Z

go-codec also supports codegen for json (and other encoding formats as well) I believe, via codecgen.

StevenACoffman · 2021-09-29T14:25:38Z

ugorji/go-codec is worth adding to the list above. It is described here.
Supported Serialization formats are:

msgpack: https://github.com/msgpack/msgpack
binc: http://github.com/ugorji/binc
cbor: http://cbor.io http://tools.ietf.org/html/rfc7049
json: http://json.org http://tools.ietf.org/html/rfc7159

I hadn't mentioned it before, because I'm not sure that anything but json is relevant to us, and so it's possible part of ugorji/go popularity is only due to non-relevant formats. However, if it's well maintained, and has the right features, it's worth considering.

library	order by page rank	number of github stars
go-codec	145	1371
github.com/mailru/easyjson	185	3331
github.com/buger/jsonparser	338	4185
github.com/pquerna/ffjson	1281	2787
github.com/francoispqt/gojay	1335	1999

I like to use popularity as a proxy for package quality, but comparing other things like contribution rate is a little more work.

benjaminjkraft · 2021-09-29T16:36:42Z

I've updated the issue description to add the wider range of considerations we've added here, and some notes from a quick look at a few of the options.

@csilvers

## Summary: When genqlient generates output types, it generates whatever code is necessary to unmarshal them. Conversely, when it generates input types, it generates whatever code is necessary to marshal. This is all that's needed for genqlient itself: it never needs to marshal output types or unmarshal input types. But maybe you do! (For example, to put the responses in a cache, which is the use case that @csilvers hit at Khan, although there are others one can imagine.) While we can't support every serialization format you might want (at least not without adding plugins or some such), it's not unreasonable to expect that since genqlient can read JSON, it can write it too. Sadly, in the past this was not true for types requiring custom unmarshaling logic, for several reasons. In this commit I implement logic to always write both marshalers and unmarshalers whenever they're needed to be able to correctly round-trip the types, even though genqlient doesn't do so. I wasn't starting from scratch, since of course we already write both marshalers and unmarshalers in some cases. But this ended up requiring surprisingly large changes on the marshaling side, mostly to correctly support embedding (which we use for named fragments). Specifically, as the comments in `types.go` discuss, the most difficult issue is spreads with duplicate fields, which translate to Go embedded fields which end up hidden from the json-marshaler. Ultimately, I had to do things quite differently from unmarshaling, and essentially flatten the type when we write marshaler. But in the end it's not so ugly -- indeed arguably it's cleaner! Mainly it's just different. One thing to note is that we do marshal `__typename` based on what we know about the types; users need not fill it in (and if they do we'll ignore it). This seemed to me to be a better UX, and didn't add much complexity. In general, I begin to wonder whether using `encoding/json` at all is really right for genqlient: we're doing a lot of work to appease it, despite knowing what our types look like. I think it would still be a significant increase in lines of code to roll our own, but that code would perhaps be simpler, and would surely be faster (although if we just want the speed gains we could use another JSON-generator library, see also #47). Anyway, something to think about in the future. ## Test plan: make tesc Author: benjaminjkraft Reviewers: csilvers, StevenACoffman, benjaminjkraft, dnerdy, aberkan, jvoll, mahtabsabet, MiguelCastillo Required Reviewers: Approved By: StevenACoffman, dnerdy Checks: ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint, ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint Pull Request URL: #120

benjaminjkraft · 2024-02-18T02:41:45Z

To do: take a look at how encoding/json/v2 might change the calculus here, and whether we have any suggestions for it.

x448 · 2024-02-25T23:03:11Z

encoding/json/v2 has many improvements, such as omitempty working like it does in fxamacker/cbor (a CBOR codec that supports RFC 8949 and RFC 8742).

It's unfortunate that encoding/json doesn't support toarray struct tag and JSON doesn't support efficient binary encoding, which are useful for improving speed, memory use, and reducing encoded size.

If using CBOR becomes an option for you then also see Extended Diagnostic Notation (it represents binary CBOR data as JSON with extensions).

DISCLAIMER: I contribute docs & ci to fxamacker/cbor. It's secure, fast, memory efficient, and has deterministic encoding.

benjaminjkraft · 2024-03-05T06:34:25Z

@x448 it sounds like the library you describe implements some non-JSON encoding, no? We have to use JSON because it's what GraphQL servers use.

benjaminjkraft · 2024-08-13T00:35:40Z

Quick notes on reading the encoding/json/v2 proposal discussion:

if we want to go "roll our own", the new jsontext is for exactly that kind of thing.
the change to make json omitempty omit structs that would map to empty JSON objects will be painful; we'll have to decide whether to change behavior as well (and whether to switch to omitzero, support both, etc.).
the new [Un]Marshalers is interesting, in principle it might let us avoid generating quite so many marshal methods, although it may also be that in our context it's easier to continue doing so.
the benchmarks look to be comparable to the codegen-based libraries, although maybe v2 versions of those will be even faster.
Anyway, since that's progressing it may be best to just wait until it stabilizes; we can't build on v2 just yet.

benjaminjkraft added enhancement New feature or request help wanted Issues that anyone could pick up and implement if useful to them labels Sep 11, 2021

benjaminjkraft mentioned this issue Sep 29, 2021

Allow genqlient types to be marshaled safely #120

Merged

benjaminjkraft changed the title ~~Use a codegen-based JSON library~~ Use a different JSON library Sep 29, 2021

benjaminjkraft mentioned this issue May 12, 2022

None type passed instead of Empty Array #193

Closed

benjaminjkraft added this to the v1.0 milestone Aug 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a different JSON library #47

Use a different JSON library #47

benjaminjkraft commented May 20, 2021 •

edited

Loading

benjaminjkraft commented Sep 29, 2021

csilvers commented Sep 29, 2021

StevenACoffman commented Sep 29, 2021 •

edited

Loading

benjaminjkraft commented Sep 29, 2021

benjaminjkraft commented Feb 18, 2024

x448 commented Feb 25, 2024 •

edited

Loading

benjaminjkraft commented Mar 5, 2024

benjaminjkraft commented Aug 13, 2024

Use a different JSON library #47

Use a different JSON library #47

Comments

benjaminjkraft commented May 20, 2021 • edited Loading

benjaminjkraft commented Sep 29, 2021

csilvers commented Sep 29, 2021

StevenACoffman commented Sep 29, 2021 • edited Loading

benjaminjkraft commented Sep 29, 2021

benjaminjkraft commented Feb 18, 2024

x448 commented Feb 25, 2024 • edited Loading

benjaminjkraft commented Mar 5, 2024

benjaminjkraft commented Aug 13, 2024

benjaminjkraft commented May 20, 2021 •

edited

Loading

StevenACoffman commented Sep 29, 2021 •

edited

Loading

x448 commented Feb 25, 2024 •

edited

Loading