
Mishandling of large numbers #1959

Closed
mendess opened this issue Aug 8, 2019 · 22 comments

Comments

@mendess

mendess commented Aug 8, 2019

Describe the bug
jq fails to parse and output large numbers faithfully, mangling the number in the output.

To Reproduce
For the input json:

{"key":418502930602131457}

jq . test.json produces this:

{
  "key": 418502930602131460
}

For the input:

{"key":489819608690327552}

jq . test.json produces this:

{
  "key": 489819608690327550
}

Expected behavior
The numbers should not change.

Environment:

  • OS and Version: Linux 5.2.6-arch1-1-ARCH #1 SMP PREEMPT Sun Aug 4 14:58:49 UTC 2019 x86_64 GNU/Linux
  • jq version: jq-1.6

Additional Notes
The output number seems to always end in 0, but sometimes it seems to "round". Refer to both of the provided examples.

@wtlangford
Contributor

This is a known issue (see https://github.com/stedolan/jq/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3Aieee754 for a history on the topic).

In short, jq uses IEEE754 doubles to store numbers (which is permitted by the JSON specification). This means that very large integers might get adjusted to the nearest representable value. (Even if you weren't using jq, it's possible some other tool you use would do this to you)
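The rounding described above can be reproduced outside jq. A minimal sketch, using Python only as a stand-in for any tool that stores numbers as IEEE 754 doubles:

```python
# At this magnitude (between 2^58 and 2^59), consecutive doubles are
# 64 apart, so the reported integer is rounded to the nearest multiple.
before = 418502930602131457
after = int(float(before))  # what any double-based tool actually stores

assert after == 418502930602131456
assert after != before
```

The trailing-zero form in jq's output (`...460`) is consistent with printing that same double at 17 significant digits.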

There's a PR (#1752) adding support for large numbers, but it comes with some performance penalties, and we haven't had the time to get it merged yet. Hopefully Soon™.

I generally suggest that a large number that you aren't doing math on is actually a string. If you're able to represent it as a string instead of a number, then that's your best bet until we get that big number support merged.
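As a quick illustration of the string workaround (Python's json module used here as a stand-in JSON implementation):

```python
import json

# As a JSON string, the ID round-trips with every digit intact; as a
# JSON number it would be coerced through a double and lose low digits.
doc = '{"key": "418502930602131457"}'
roundtripped = json.loads(json.dumps(json.loads(doc)))
assert roundtripped["key"] == "418502930602131457"
```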

@mendess
Author

mendess commented Aug 8, 2019

Thanks for the quick response and sorry for the duplicate issue, should I close it?

The numbers I am working with used to be strings actually but we switched to a more typesafe language (Rust) and having them be actual numbers was more ergonomic. Good luck with the PR :)

@cblp

cblp commented Aug 19, 2019

jq uses IEEE754 doubles to store numbers (which is permitted by the JSON specification)

Please show where the standard permits it.

ECMA-404:

JSON is agnostic about the semantics of numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.

@cblp

cblp commented Aug 19, 2019

Alternative spec, RFC 7159:

This specification allows implementations to set limits on the range
and precision of numbers accepted. Since software that implements
IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
generally available and widely used, good interoperability can be
achieved by implementations that expect no more precision or range
than these provide, in the sense that implementations will
approximate JSON numbers within the expected precision. A JSON
number such as 1E400 or 3.141592653589793238462643383279 may indicate
potential interoperability problems, since it suggests that the
software that created it expects receiving software to have greater
capabilities for numeric magnitude and precision than is widely
available.

I think, "limits on numbers accepted" can mean rejection of some values, but not an alteration of textual representation.

@pkoppstein
Contributor

pkoppstein commented Aug 19, 2019

RFC 8259 in the Parsers section says:

A JSON parser MUST accept all texts that conform to the JSON grammar.
... An implementation may set limits on the range and precision of numbers.

jq sets limits, and accepts valid texts. If it raised an error when a limit was violated, it would violate the "must accept" requirement, wouldn't it? So it seems to me there's good reason to be unhappy about the mishmash that became of Crockford's original intention.

https://tools.ietf.org/html/rfc8259#page-10

@cblp

cblp commented Aug 19, 2019

@pkoppstein, jq doesn't transform JSON into another representation, it is not a parser.

@wtlangford
Contributor

jq doesn't transform JSON into another representation, it is not a parser.

jq does have a parser. How else would it transform JSON text (which is "a text format for the serialization of structured data") into actual values/data that it can run your program on?
jq also has a generator (as defined by the RFC), in that it produces JSON texts which strictly conform to the standard.

I think, "limits on numbers accepted" can mean rejection of some values, but not an alteration of textual representation.

The issue here is that to our parser and generator, the JSON texts for these "altered" numbers represent the same value (because to us, values are IEEE754 doubles, and those have precision issues for very large and very small numbers).
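That equivalence is easy to demonstrate. Here Python's json module is used with its `parse_int` hook to mimic a double-based parser (an analogy only, not jq's actual code):

```python
import json

# Force integers through a double, as a double-based parser would:
# the original text and the "mangled" text denote the same value.
def parse(text):
    return json.loads(text, parse_int=float)

assert parse("418502930602131457") == parse("418502930602131460")
```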

We're aware this is a pain point for people. Lots of tools output very large integer IDs, and from the perspective of a user, jq ends up mangling those IDs. As I mentioned above, we have a PR (#1752) in progress that will add some big number support to jq, at the cost of some performance. We (the maintainers) haven't had time to finalize and merge it yet, but it's high on our jq priority list.

@pkoppstein
Contributor

@cblp - jq includes a parser. The architecture precludes the non-parser part of jq from seeing the input as anything other than what the parser reveals. As I’ve already indicated, we’re all aware of the problem, so whether you blame jq’s architecture or the state of JSON requirements seems somewhat pointless.

@cblp

cblp commented Aug 19, 2019 via email

@nicowilliams
Contributor

@cblp a user who only ever runs jq . probably doesn't expect jq to be a JSON parser. Any user who writes jq programs more complex than . will understand (on some level) that jq is indeed a JSON parser with an internal representation. At any rate, jq really does parse JSON into an internal representation.

The next release of jq will have better range and precision for numerics.

@cblp

cblp commented Aug 19, 2019

I'm sorry for sounding like I was assigning blame; I just wanted to clarify the reasoning behind the current design.

@pkoppstein
Contributor

@cblp - If there is a reason, it probably is some mix of a desire to achieve efficiency in a quick and simple way, a sense that the JSON spec allows implementations to set limits, and perhaps a sense or belief that in practice, the issue is relatively unimportant. Feel free to assign whichever weights you like :-)

@wjmelements

I am still seeing this for big integers. For example, 5474205234507702943235 becomes 5474205234507703000000. The difference is meaningful for me and prevents me from using jq.

@pkoppstein
Contributor

pkoppstein commented Apr 1, 2020

@wjmelements - Good news! The issue has been addressed in the "master" version of jq:

$ jqMaster --version
jq-1.6-107-g24564b2

$ jqMaster -n 5474205234507702943235
5474205234507702943235

The enhancement dates from Oct 19, 2019, which is after the release of jq 1.6.

@jprupp

jprupp commented Feb 12, 2021

I have the version that comes with Fedora 34, and it still has this issue: it cannot handle large numbers. I would love it if it could.

@AlaaHamoudah

This is a very serious issue; I would really appreciate it if it were fixed.

@scottyob

+1 on this. Can also confirm master helps our one use case:

abc@202d6ad3f0b5:/tmp/jq$ ./jq --version
jq-1.6
abc@202d6ad3f0b5:/tmp/jq$ echo '{"a":9011153322235679}' | ./jq '.a'
9011153322235680

abc@202d6ad3f0b5:/tmp/jq/jq$ ./jq --version
jq-1.6-137-gd18b2d0-dirty
abc@202d6ad3f0b5:/tmp/jq/jq$ echo '{"a":9011153322235679}' | ./jq '.a'
9011153322235679

@wpietri

wpietri commented Jun 29, 2021

If fixing this is a problem, then perhaps you could make it blow up when it mangles data? We're having an issue where somebody used jq at the beginning of a research project. It silently corrupted a bunch of IDs, so the downstream work now has to be redone or hackily fixed.

This means we can't really trust jq unless we carefully validate all the data as jq-safe. So in practice it looks like we'll just have to stop using jq. It's a lovely tool, but not so lovely that we want to end up looking like fools by coming to the wrong conclusion.
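jq itself has no such strict mode, but the "blow up instead of silently rounding" idea can be sketched with Python's json hooks (a hypothetical helper, not part of jq):

```python
import json

def strict_int(text):
    """Reject any integer an IEEE 754 double cannot hold exactly."""
    value = int(text)
    if int(float(value)) != value:
        raise ValueError(f"{text} is not exactly representable as a double")
    return value

json.loads('{"id": 42}', parse_int=strict_int)  # parses fine
# json.loads('{"id": 418502930602131457}', parse_int=strict_int)  # ValueError
```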

@scottyob

@wpietri: Pretty sure you could just use master to solve most of your problems. My understanding is that it stores these numbers as C strings and only throws away precision when you do mathematical operations on them.

It's really a shame we haven't had a release in ages with these fixes in it.
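The keep-the-original-digits idea can be imitated in Python with `decimal.Decimal` as the parse hook (an analogy for the behavior on master, not jq's implementation):

```python
import json
from decimal import Decimal

# Parse integers as Decimal so every digit of the literal survives;
# precision is only at risk once double arithmetic is applied.
value = json.loads('{"a": 9011153322235679}', parse_int=Decimal)["a"]
assert str(value) == "9011153322235679"
```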

@gdamore

gdamore commented May 30, 2023

Hi from the future! It's 2023, and we still don't have a release with a fix for this!

@leonid-s-usov
Contributor

This is handled by #1752 and should appear in the next build

@emanuele6
Member

jq 1.7 released with the fix. closing
