encoding/json: the Decoder.Decode API lends itself to misuse #36225

dsnet · 2019-12-19T22:26:12Z

I'm observing the existence of several production servers that are buggy because the json.Decoder.Decode API lends itself to misuse.

Consider the following:

r := strings.NewReader("{} bad data")

var m map[string]interface{}
d := json.NewDecoder(r)
if err := d.Decode(&m); err != nil {
	panic(err) // not triggered
}

json.NewDecoder is often used because the user has an io.Reader on hand or wants to configure some of the options on json.Decoder. However, the common case is that the user only wants to decode a single JSON value. As it stands the API does not make the common case easy since Decode is designed with the assumption that the user will continue to decode more JSON values, which is rarely the case.

The code above executes just fine without reporting an error and silently allows the decoder to silently accept bad input without reporting any problems.

The text was updated successfully, but these errors were encountered:

OneOfOne · 2019-12-19T22:47:05Z

I don't think that's a bug, or misuse, I actually depend on it in a project where there's a json header then there's other data.

This change would break a lot of working code.

bcmills · 2019-12-20T02:09:36Z

The only example for the Decoder type calls Decode in a loop. Perhaps there should be a lint or vet check that either Decode is read until a non-nil error or its Buffered method is called?

The call to the Buffered method seems like a reliable indication of the “header before other data” case.

bcmills · 2019-12-20T02:10:13Z

A vet check in particular, if it is feasible, seems like it could address this problem without an API change.

dsnet · 2019-12-20T03:05:40Z

@OneOfOne, I'm not proposing that we change the json.Decoder.Decode behavior. In and of itself, the method's current behavior is entirely reasonable. However, there's fairly compelling evidence that there is a misuse problem, though. A sampling of json.Decoder.Decode usages at Google shows that a large majority are using it in a way that actually expects no subsequent JSON values.

From my observation, the source of misuse is due to:

many users having an io.Reader on hand (especially from a http.Request.Body), and the lack of API in the json package to unmarshal a single JSON value from a io.Reader. As a result, they turn to json.Decoder, but don't realize that it's intended for reading multiple JSON values back-to-back.
some users wanting to configure the optional arguments (e.g., DisallowUnknownFields or UseNumber) and unfortunately those options are only hung off json.Decoder.

The first issue could be alleviated by adding:

func UnmarshalReader(r io.Reader, v interface{}) error

where UnmarshalReader consumes the entire input and guarantees that it unmarshals exactly one JSON value.

The second issue could be alleviated by having an options pattern that doesn't require the creation of a streaming json.Decoder. In the v2 protobuf API, we use an options struct with methods hanging off of it.

@bcmills, a vet check is interesting, but I'm not sure calling Decoder.Buffer is the right approach. Rather, I think you need to get the next token and check that it returns io.EOF:

d := json.NewDecoder(r)
if err := d.Decode(&v); err != nil {
	panic(err) // not triggered
}
if _, err := d.Token(); err != io.EOF {
	panic("some unused trailing data")
}

bcmills · 2019-12-20T03:50:11Z

I'm not sure calling Decoder.Buffer is the right approach. Rather, I think you need to get the next token and check that it returns io.EOF

That would work too. So that gives three patterns that should suppress the warning:

Call Decode until it does not return nil.
Call Token until it does not return nil.
Call Buffered.

(1) and (2) suppress the warning if the user has read enough of the input to detect either EOF or an earlier syntax error. (3) suppresses the warning if a non-JSON blob follows the JSON value.

josharian · 2019-12-20T04:07:34Z

The first issue could be alleviated by adding UnmarshalReader

Or a Decoder option to mark it as being single-object only.

That way fixing existing instances of this problem (of which there are many, including all over my own code) involves just adding a line, instead of refactoring.

cespare · 2019-12-20T08:33:36Z

Or a Decoder option to mark it as being single-object only.

That way fixing existing instances of this problem (of which there are many, including all over my own code) involves just adding a line, instead of refactoring.

I like how concise using a decoder often is:

if err := json.NewDecoder(r).Decode(&t); err != nil {

It would be nice if the fix didn't require breaking up the line. I like how @dsnet's UnmarshalReader suggestion offers something similarly concise:

if err := json.UnmarshalReader(r, &t); err != nil {

We could also use an alternative Decode method:

if err := json.NewDecoder(r).DecodeOne(&t); err != nil {

mvdan · 2019-12-20T23:42:09Z

As a result, they turn to json.Decoder, but don't realize that it's intended for reading multiple JSON values back-to-back.

To give further evidence that this is a common issue, I'm one of the encoding/json owners and I've now just realised I've been making this mistake for years.

I'm personally in favour of UnmarshalReader. If anyone wants to be in control of the decoder or any options, they can copy the contents of UnmarshalReader and adapt it as needed.

I think a vet check would also be useful, because many existing users might not realise that this common json decoding pattern is buggy. Coupled with UnmarshalReader, the users would just need to slightly modify one line. A vet check that required the user to refactor the code and add a few lines of non-trivial code seems unfortunate to me, so I generally agree with @cespare.

jimmyfrasche · 2019-12-20T23:55:28Z

Maybe a method like Done() error that reads the rest of the input and errors if there's something other than whitespace? That could also be used for reading multiple documents from the stream and quitting once you get to one with a certain value.

networkimprov · 2019-12-20T23:59:20Z

Shouldn't this be a concrete proposal (with possible variations)? The issue description doesn't seem actionable.

mvdan · 2019-12-21T00:00:47Z

@networkimprov usually, a proposal brings forward a specific solution to be implemented. It seems like the main point of this issue was to bring up the problem, so the proposal could come later if needed.

networkimprov · 2019-12-21T01:07:12Z

Time and again, I've seen issues closed and further discussion directed to golang-nuts/dev because the item isn't actionable.

leafrhythms · 2019-12-21T15:45:10Z

I agree with a new method. It re-use options for json.Decoder, and is very easy to adopt. But instead of DecodeOne, I would suggest it DecodeWhole.

TennyZhuang · 2019-12-22T03:27:56Z

I suggest the name DecodeFull, it is more consistent with ReadFull.

beono · 2019-12-27T11:12:54Z

I agree that DecodeFull makes it more clear, that it decodes the the entire stream.
Whereas DecodeOne sounds like it does the same thing as Decode – decodes only one json value per call.

mark-kubacki · 2019-12-27T11:16:10Z

Remove this entirely, don't let close-to–duplicate functionality lying around. Especially one that invites misuse.

zikaeroh · 2019-12-27T23:21:58Z

I don't want to bikeshed names too hard, but my suggestion would be DecodeSingle. I find this a bit better than "full" where my first thought is that it's trying to decode the "full reader" to completion.

(Also, this is a function name in C#'s LINQ, where it signifies that an IEnumerable contains a single thing, and throws if not, which feels similar.)

dsnet · 2019-12-31T22:26:52Z

Removing Go2 label and adding NeedsDecision instead. When I first filed this issue, it was more so to point out an API wart that we should avoid repeating if there were ever a hypothetical v2 of encoding/json. However, it seems that this discussion has gone in the direction where it might make sense to do something about this today.

Let's avoid bikeshedding on names; it brings unnecessary noise to the discussion. Bikeshedding can occur in future CLs that implement this assuming we should do anything about this.

To summarize the discussion so far, there seems to be 3 (non-mutually exclusive) approaches:

Add a top-level function that takes in an io.Reader as an input and consume it entirely. This approach is provides a minor convenience benefit over approach 2.
Add a method to Decoder that consume the entire io.Reader. This approach is most flexible as it permits options to be specified and provides an obvious fix to existing misuses.
Add a vet check as suggested by @bcmills in encoding/json: the Decoder.Decode API lends itself to misuse #36225 (comment)

An issue with approaches 1 or 2 is that it assumes that io.EOF will always be hit by the io.Reader. For the case of an http.Request.Body, I suspect that this may possibly be an issue if the client keeps the connection open, in which case, an io.EOF will never be observed and the server will deadlock trying to read all input from the client, but the client won't end the close the request since it's waiting for a response from the server.

Personally, I like approach 2 and 3.

networkimprov · 2019-12-31T23:26:44Z

I think the io.EOF issue is a showstopper, unless a new function/method takes a size argument, which could be zero or -1 for read-to-eof.

Maybe add the list of solutions to the issue text, and retitle it "proposal: ..." ? I think that places it on the proposal review committee's agenda...

josharian · 2020-01-01T00:05:56Z

An issue with approaches 1 or 2 is that it assumes that io.EOF will always be hit by the io.Reader.

If the code doing the decoding knows the length of the data, then they could use an io.LimitedReader to work around this.

I suspect, though, that the more common case is that the code doesn't know, and doesn't care: It just wants to decode until the first json object is done. If it so happens that you can detect that there's extra (non-whitespace?) data, because it came along with a read you issued anyway, then sure, return an error. But I think most users would be perfectly happy to be a bit sloppy here and have only best-effort "extra data" error detection, as evidenced by the multitude of us happily, obliviously using json.Decoder for years and years.

So I guess I would advocate "fixing" this problem by defining it away.

FWIW, I am also a fan of approaches 2 and 3.

bcmills · 2020-02-19T14:14:04Z

@dsnet, I notice that this issue is currently without a milestone. Given #36225 (comment), do you want to turn it into a proposal?

NewDecoder(r).Decode() can lead to bugs, because it just reads from a reader until the end of the serialised object, but it doesn't have to deplete the reader. If you have multiple JSON objects in this reader, you will miss out. golang/go#36225

earthboundkid · 2020-10-16T16:14:05Z

2. Add a method to Decoder that consume the entire io.Reader. This approach is most flexible as it permits options to be specified and provides an obvious fix to existing misuses.

A variation on this would be to add a configuration method along the line of DisallowUnknownFields(), with a name like ConsumeWholeReader().

FeldrinH · 2022-08-24T13:06:56Z

Has there been any progress on this issue?

As further evidence of how widespread the problem is:
A quick look at the json binding logic of Gin (https://github.com/gin-gonic/gin/blob/master/binding/json.go) and Echo (https://github.com/labstack/echo/blob/master/json.go) seems to indicate that both of those major libraries allow trailing garbage after json data in request body because of this pitfall.

johnmaguire · 2022-10-03T16:53:17Z

Here's another example, from go-chi/render: https://github.com/go-chi/render/blob/master/decoder.go#L44

dpw · 2023-01-26T12:26:27Z

Hi @bcmills . A couple of years ago in #36225 (comment) you suggested that the next step for this issue is for someone to distill a proposal out of the discussion. That doesn't seem to have happened, but I am willing to do it. Would it be best to create a new issue with that proposal, or add it in a comment on this one?

mvdan · 2023-01-26T12:37:32Z

@dsnet, myself, and others have been working on a new API for encoding/json for some time - we'll share soon. This issue is fixed in the new design, for example, so I'd recommend not spending time on a proposal for now :)

dpw · 2023-01-26T14:55:43Z

@mvdan I'm in no way opposed to a rethink of the API encoding/json (I think it's obvious to anyone who uses it extensively that it is old and has gathered some dust). But I wonder if making this issue dependent on such an effort may be a case of the perfect being the enemy of the good. How confident is your group that an implementation of your design will be incorporated into the go standard library in a timely fashion?

An advantage of addressing this issue with a new method on Decoder is that it not only solves the nuisance for authors of new code in a modest way, but that it will also make it easy to fix code that currently uses Decode (its behaviour for trailing data seems to be unwelcome and unintended for the vast majority of its uses). The simpler the fix is to adopt, the more existing code will get fixed.

earthboundkid · 2023-01-26T14:58:45Z

You can import the JSON experiment in your packages now if you want: https://github.com/go-json-experiment/json

mvdan · 2023-01-26T15:02:47Z

@dpw I am of course not opposed to smaller overlapping proposals, but we (the maintainers) also have to decide where to spend the time we have. My personal opinion for encoding/json is that small incremental steps aren't the best path forward.

dpw · 2023-01-27T20:27:32Z

@mvdan I appreciate that proposals and other contributions do not review themselves, and maintainers have to decide how to invest their time. But on #56733 I see ongoing discussion, involving go maintainers, about another proposal to add a method to Decoder. That suggests that some maintainers consider at least some proposals to be worth considering.

dsnet · 2023-01-27T20:38:49Z

I hold to a sentiment similar to @mvdan that I'd rather invest most of my efforts on what's to come rather than small incremental steps on what already is, especially if what's to come resolves many issues beyond just this one. You'll notice that most of my arguments in #56733 are from the background of what's best for JSON in the long run rather than simply what's possible to implement today in the current API. That is, I needed to be mentally confident that it could be implemented in both the current and new API with the same semantics.

Even if a proposal is accepted, it doesn't mean that it will be implemented immediately. For example, #48298 has been accepted for over a year but not implemented yet in the current API. However, it is already implemented in the new API that @mvdan alluded to.

Neither @mvdan nor I are paid to work on Go, but are simply freelance contributors. Unfortunately, we have to be selective about what we spend our attention towards.

mitar · 2023-07-25T11:02:14Z

I also came here because I just wanted to use DisallowUnknownFields and I got surprised that extra data is silently ignored. If anyone needs it, this is now available in my x.UnmarshalWithoutUnknownFields function:

func UnmarshalWithoutUnknownFields(data []byte, v interface{}) errors.E {
	decoder := json.NewDecoder(bytes.NewReader(data))
	decoder.DisallowUnknownFields()
	err := decoder.Decode(v)
	if err != nil {
		return errors.WithStack(err)
	}
	_, err = decoder.Token()
	if err == nil || !errors.Is(err, io.EOF) {
		return errors.New("invalid data after top-level value")
	}
	return nil
}

I think just having an extra example next to the DisallowUnknownFields documentation would probably help many. We do not need to find a perfect solution here, I think just an example to even hint at the issue would be useful.

dsnet · 2023-10-06T06:03:33Z

Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal.
In the proposed v2 package, there is a UnmarshalRead method that unmarshals directly from an io.Reader, verifying that it encounters io.EOF before reporting success.

1f604 · 2023-10-26T09:10:21Z

I also came here because I just wanted to use DisallowUnknownFields and I got surprised that extra data is silently ignored. If anyone needs it, this is now available in my x.UnmarshalWithoutUnknownFields function:
func UnmarshalWithoutUnknownFields(data []byte, v interface{}) errors.E {
	decoder := json.NewDecoder(bytes.NewReader(data))
	decoder.DisallowUnknownFields()
	err := decoder.Decode(v)
	if err != nil {
		return errors.WithStack(err)
	}
	_, err = decoder.Token()
	if err == nil || !errors.Is(err, io.EOF) {
		return errors.New("invalid data after top-level value")
	}
	return nil
}
I think just having an extra example next to the DisallowUnknownFields documentation would probably help many. We do not need to find a perfect solution here, I think just an example to even hint at the issue would be useful.

Why is the err == nil required? Can't you just do if !errors.Is(err, io.EOF)?

mitar · 2023-10-26T09:21:13Z

Can't you just do if !errors.Is(err, io.EOF)?

Yes, you can. It is just a bit more readable to me this way.

3052 · 2023-11-21T05:39:22Z

As it stands the API does not make the common case easy since Decode is designed with the assumption that the user will continue to decode more JSON values, which is rarely the case.

I wouldn't say rarely, for example, Facebook and others sometimes use JSON records, which the current API works well for:

package main

import (
   "encoding/json"
   "fmt"
   "strings"
)

const s = `
{"month":1,"day":2}
{"month":3,"day":4}
`

func main() {
   d := json.NewDecoder(strings.NewReader(s))
   for {
      var date struct { Month, Day int }
      if d.Decode(&date) != nil {
         break
      }
      fmt.Printf("%+v\n", date)
   }
}

dsnet added the v2 An incompatible library change label Dec 19, 2019

dsnet changed the title ~~encoding/json: the Decoder .Decode API lends itself to misuse~~ encoding/json: the Decoder.Decode API lends itself to misuse Dec 20, 2019

dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 21, 2019

dmitshur added this to the Backlog milestone Dec 21, 2019

dmitshur removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 21, 2019

dmitshur removed this from the Backlog milestone Dec 21, 2019

dsnet added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed v2 An incompatible library change labels Dec 31, 2019

dsnet mentioned this issue Sep 10, 2020

encoding/{protojson,prototext}: add MarshalWriter/UnmarshalReader functionality golang/protobuf#1203

Open

fraenkel mentioned this issue Oct 10, 2020

json.NewDecoder Decode not return err when input invalid #41899

Closed

mvdan mentioned this issue Oct 16, 2020

encoding/json: add doc note that Decoder is not more efficient for single values #42016

Closed

josharian mentioned this issue Jan 3, 2021

encoding/json: Decode does not error on invalid leading zeros #43474

Closed

refs mentioned this issue May 27, 2021

improve wopiserver owncloud/ocis-wopiserver#5

Merged

mvdan mentioned this issue Oct 26, 2021

http-transport legs filecoin-project/go-legs#23

Merged

roperzh mentioned this issue Aug 23, 2022

add a mechanism to deprecate AppConfig settings w/ backwards-compat fleetdm/fleet#7353

Merged

2 tasks

seankhliao added this to the Unplanned milestone Aug 27, 2022

mvdan mentioned this issue Oct 3, 2022

encoding/json: Encoder.Encode newlines can cause issues with Go HTTP clients #56012

Closed

earthboundkid mentioned this issue Oct 3, 2022

DecodeJSON accepts "{}\n some garbage data" incorrectly go-chi/render#42

Open

This was referenced Nov 30, 2023

Library uses json.NewDecoder(r).Decode which might not error on trailing extra data fxamacker/webauthn#21

Open

Library uses json.NewDecoder(body).Decode which might not error on trailing extra data go-webauthn/webauthn#189

Closed

mbaxter mentioned this issue May 14, 2024

cannon: Make jsonutil error when loading json payload with extra data ethereum-optimism/optimism#10542

Merged

encoding/json: the Decoder.Decode API lends itself to misuse #36225

encoding/json: the Decoder.Decode API lends itself to misuse #36225

Comments

dsnet commented Dec 19, 2019 • edited Loading

OneOfOne commented Dec 19, 2019

bcmills commented Dec 20, 2019

bcmills commented Dec 20, 2019

dsnet commented Dec 20, 2019 • edited Loading

bcmills commented Dec 20, 2019 • edited Loading

josharian commented Dec 20, 2019

cespare commented Dec 20, 2019

mvdan commented Dec 20, 2019

jimmyfrasche commented Dec 20, 2019

networkimprov commented Dec 20, 2019

mvdan commented Dec 21, 2019

networkimprov commented Dec 21, 2019

leafrhythms commented Dec 21, 2019

TennyZhuang commented Dec 22, 2019

beono commented Dec 27, 2019

mark-kubacki commented Dec 27, 2019

zikaeroh commented Dec 27, 2019

dsnet commented Dec 31, 2019 • edited Loading

networkimprov commented Dec 31, 2019

josharian commented Jan 1, 2020

bcmills commented Feb 19, 2020 • edited Loading

earthboundkid commented Oct 16, 2020

FeldrinH commented Aug 24, 2022

johnmaguire commented Oct 3, 2022

dpw commented Jan 26, 2023

mvdan commented Jan 26, 2023

dpw commented Jan 26, 2023

earthboundkid commented Jan 26, 2023

mvdan commented Jan 26, 2023

dpw commented Jan 27, 2023

dsnet commented Jan 27, 2023

mitar commented Jul 25, 2023

dsnet commented Oct 6, 2023

1f604 commented Oct 26, 2023

mitar commented Oct 26, 2023 • edited Loading

3052 commented Nov 21, 2023 • edited Loading

dsnet commented Dec 19, 2019 •

edited

Loading

dsnet commented Dec 20, 2019 •

edited

Loading

bcmills commented Dec 20, 2019 •

edited

Loading

dsnet commented Dec 31, 2019 •

edited

Loading

bcmills commented Feb 19, 2020 •

edited

Loading

mitar commented Oct 26, 2023 •

edited

Loading

3052 commented Nov 21, 2023 •

edited

Loading