encoding/json: add (*Decoder).SetLimit #56733
Should the bytes that are already in the decoder's buffer count against this limit on subsequent calls to Decode?
The feature seems reasonable in general, but it's unclear whether the limit should be value-based or token-based. Value-based limits make sense when calling Decode, while token-based limits make sense when calling Token. As a work-around, you could reset the limit on an io.LimitedReader between calls.
@dsnet The annoying thing with LimitedReader is that, as you pointed out, you need to reset it every time.
The goal is to limit how much memory one single message is allowed to keep alive, to help prevent memory exhaustion attacks.
I think when the limit is reached, Decode should return an error. This means that if you have 2 messages, and the first one is too big but the second one is not, Decode would error the first time; however, if you call it again it would successfully read the second message.
This proposal has been added to the active column of the proposals project.
If you call SetValueLimit and then only call Token, is there just no limit? Using an io.LimitedReader where you reset the limit is also a possibility, of course. Maybe part of the problem is when the reset happens. What if there was just a SetInputLimit that applied to all future reads, and you have to reset it yourself when you want to allow more data?
@rolandshoemaker
I think in cases where you are directly calling the Decoder, using an io.LimitedReader that you reset yourself is reasonable, but it doesn't address the case where you pass a Decoder to something else which calls Token or Decode, where you may not be able to determine when you need to reset your reader's limit. I've paged out the context about how the parser consumes bytes, but it seems like the limit could generally apply to both values and tokens equally, such that it is essentially only limiting the number of bytes read from the reader for a particular parsing step. I may be misunderstanding though; @dsnet, were you saying that it may make more sense to have different limits for values vs. tokens, because tokens are inherently smaller and as such you may want to have a more constrained limit?
I'm not sure I understand this use case. Why would the called function need to check whether a limit exists or be responsible for resetting the reader's limit? That seems to be the responsibility of the caller, not the callee:
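A minimal sketch of that caller-side pattern, using only the existing io.LimitedReader API (the messages and budget size here are made up for illustration):

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"strings"
)

func main() {
	const perMessage = 1 << 20 // caller-chosen byte budget per Decode call

	src := strings.NewReader(`{"a":1} {"b":2}`)
	lr := &io.LimitedReader{R: src, N: perMessage}
	dec := json.NewDecoder(lr)

	for {
		// The caller owns the LimitedReader, so the caller resets the
		// budget before each Decode. Note that the Decoder buffers ahead
		// of the reader, so this bound is approximate.
		lr.N = perMessage

		var v map[string]any
		err := dec.Decode(&v)
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err) // includes the case where the budget ran out
		}
		log.Println(v)
	}
}
```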
Possibly yes. There are applications that parse a stream of JSON where the stream is massive. In such an application, you would want to ensure no token exceeds a certain size. Technically, this can be achieved by constantly resetting the per-value limit right before reading every single token, but that's expensive and probably a frustrating API to use. It would be preferable to specify something like a per-token limit once up front. Also, with value-specific limiting, it's unclear whether limits can be nested. For example:
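One way to picture it, using the hypothetical SetValueLimit mentioned earlier in the thread (illustration only, not a real method):

```go
// SetValueLimit is hypothetical; this only illustrates the question of
// whether limits nest.
func nested(r io.Reader) error {
	dec := json.NewDecoder(r)
	dec.SetValueLimit(1 << 20) // hypothetical: cap the next value at 1 MiB

	if _, err := dec.Token(); err != nil { // consume the opening '[' of a huge array
		return err
	}

	// Still inside the outer value: does this nest within the 1 MiB cap,
	// replace it, or apply only to the next element?
	dec.SetValueLimit(1 << 10) // hypothetical
	var elem map[string]any
	return dec.Decode(&elem)
}
```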
Also, what's the behavior of this functionality with regard to JSON whitespace? Is whitespace counted against the limit or not? Usually, the purpose of a limit is to ensure some reasonable bound on memory usage. While whitespace needs to be parsed, the implementation could be done in a way that takes only O(1) memory to skip N bytes of whitespace.
I talked to @rolandshoemaker about this. The API we think makes sense is a single Limit(n int64) method on the Decoder.
Decode and Token would both reset the limit when they are called, and then the input reader would enforce the limit as it read more input. Right now Token happens to call Decode in a somewhat confusing way to get at the string and number parsers. We would change Token to call an unexported decode that is Decode without the limit reset. Except for that one case, Decode and Token are not called from inside the json package as far as I can see. The semantics would be that setting dec.Limit(n) and then calling dec.Decode(&v) is exactly the same as decoding from the underlying reader wrapped in an io.LimitedReader with N set to n, except for the errors being returned when the limit is reached.

It's true that users can construct this themselves with io.LimitedReader today, but it is a pain, and JSON decoding is an out-of-memory footgun that we should try to make easier to protect against. Adding one call to dec.Limit to your program is much easier than working out how to use io.LimitedReader correctly. The goal is simply to have a big, effective, easily understood hammer to allow limiting the memory impact of a given Decoder method call. In a JSON v2 that streamed the input instead, the limit would still apply, to disallow things like decoding an infinite stream [[],[],[],[],[],[],[],[],... into an any and taking up an arbitrary amount of memory. It's true that the hammer may prevent processing certain inputs that wouldn't or don't take an arbitrary amount of memory, such as inputs containing gigabytes of spaces. That's fine with me. It's a hammer, not a scalpel, and users who care about time spent processing the input may want to disallow such inputs anyway.
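For comparison, a minimal sketch of what hitting the limit looks like with the io.LimitedReader work-around today (sizes are arbitrary); the failure surfaces as a truncated-input error, which is part of what a dedicated limit error would improve:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

func main() {
	big := `{"data":"` + strings.Repeat("x", 1000) + `"}`

	// Today's work-around: cap how many bytes the Decoder may consume.
	lr := &io.LimitedReader{R: strings.NewReader(big), N: 100}
	dec := json.NewDecoder(lr)

	var v map[string]any
	err := dec.Decode(&v)
	// The cap surfaces as a truncated-input error (typically
	// io.ErrUnexpectedEOF) rather than a dedicated "limit exceeded" error.
	fmt.Println(err)
}
```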
Any thoughts on or objections to the previous comment's API? @dsnet? |
The proposed API assumes that Decode and Token are only ever called at the top level, directly by the user. Suppose we wanted to introduce the following API:

```go
type Unmarshaler2 interface {
	UnmarshalJSON2(*json.Decoder) error
}
```

where types that implement this interface are passed the Decoder itself and may call Decode or Token recursively while unmarshaling. In this situation, the implementation of Decode would need to do something like:

```go
func (d *Decoder) Decode(p any) error {
	if d.topLevel {
		// Only the outermost Decode call resets the limit (on the way out);
		// recursive calls made by an Unmarshaler2 share the current budget.
		d.topLevel = false
		defer func() {
			d.resetLimit()
			d.topLevel = true
		}()
	}
	...
}
```

The difference in semantics between top-level and recursive calls makes me feel a little uneasy.
I think the fundamental problem is that a memory limit is a property of a particular unmarshal call, rather than of the Decoder itself. Ideally, we would have something like:

```go
func UnmarshalReader(r io.Reader, p any, o UnmarshalOptions) error
```

where UnmarshalOptions could carry the limit and it would apply only to that one call.
I thought more about this while washing up... I propose instead:

```go
// Limit sets a limit on the number of bytes of input that may be consumed
// for each top-level JSON value.
// After Limit(n), if a Decode or Token call would need to consume more than
// a total of n bytes to make progress on the current top-level JSON value,
// the call will instead report an error.
// Calling Limit(n) when the Decoder has already begun parsing a top-level JSON value
// sets the limit for the remainder of the current value.
// Subsequent top-level JSON values are subject to the entirety of the limit.
// Calling Limit(-1) corresponds to not having a limit, which is the default for a new Decoder.
func (dec *Decoder) Limit(n int64)
```

Properties:
I like that it operates as a hammer when you want to limit on a per-top-level-value basis (which is by far the most common use case), but can be used as a scalpel when you want to limit on a per-token basis (which is only for rare and more advanced use cases).
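A usage sketch of both styles, assuming the proposed Limit method above (it does not exist today, so this is illustrative only):

```go
// Limit here is the proposed method above, not an existing API.
func readStream(r io.Reader) error {
	dec := json.NewDecoder(r)

	// Hammer: one call bounds each subsequent top-level value.
	dec.Limit(1 << 20)
	var first map[string]any
	if err := dec.Decode(&first); err != nil {
		return err
	}

	// Scalpel: re-arm the limit before each Token so that no single token
	// can consume more than 4 KiB of input on its own.
	for dec.More() {
		dec.Limit(4 << 10)
		if _, err := dec.Token(); err != nil {
			return err
		}
	}
	return nil
}
```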
Thanks for the analysis. Once we start thinking about recursive cases, it's not hard to imagine wanting to know what the current limit is before starting some expensive operation. So perhaps we should call this method SetLimit, leaving room for a Limit accessor later.
SetLimit sounds fine for now. I'm not sure whether we'll need Limit, but leaving the door open sounds fine.
Have all concerns about this proposal been addressed?
Based on the discussion above, this proposal seems like a likely accept.
No change in consensus, so accepted. 🎉
Change https://go.dev/cl/471875 mentions this issue.
I propose renaming this method. In #58786, @bigpigeon mentioned needing a maximum depth limit, which is a reasonable feature to potentially add.
Typically, the advice for avoiding reading excessively large values into memory when given an untrusted io.Reader is to wrap the reader in an io.LimitedReader. For encoding/json.NewDecoder this is not necessarily a reasonable approach, since the Decoder may be intended to read from a long-lived stream (e.g. a net.Conn) where the user may not want to limit the total number of bytes read from the Reader across multiple reads, but does want to limit the number of bytes read during a single call to Decoder.Decode (e.g. reading one hundred 500-byte messages across the lifetime of the Decoder may be perfectly acceptable, but a single 50,000-byte read is not).
A solution to this problem would be to add a method to Decoder which allows limiting the amount read from the Reader into the internal Decoder buffer on each call to Decoder.Decode, returning an error if the number of bytes required to decode the value exceeds the set limit. Something along the lines of:
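A plausible sketch of such a method, using the SetLimit name from the issue title (the exact signature and doc comment here are assumptions, not the final API):

```go
// SetLimit limits the number of bytes of input the Decoder will read on
// behalf of a single call to Decode, causing Decode to return an error if
// decoding the value would require more than n bytes.
// (Hypothetical sketch; the accepted semantics are discussed above.)
func (dec *Decoder) SetLimit(n int64)
```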
cc @dsnet