
Mishandling of large numbers #1959

Closed
mendess opened this issue Aug 8, 2019 · 22 comments

Comments

@mendess

mendess commented Aug 8, 2019

Describe the bug
jq fails to parse and output large numbers faithfully, mangling the number in the output.

To Reproduce
For the input json:

{"key":418502930602131457}

jq . test.json produces this:

{
  "key": 418502930602131460
}

For the input:

{"key":489819608690327552}

jq . test.json produces this:

{
  "key": 489819608690327550
}

Expected behavior
The numbers should not change.

Environment:

  • OS and Version: Linux 5.2.6-arch1-1-ARCH #1 SMP PREEMPT Sun Aug 4 14:58:49 UTC 2019 x86_64 GNU/Linux
  • jq version: jq-1.6

Additional Notes
The output number seems to always end in 0, but sometimes it seems to "round". Refer to both of the provided examples.

@wtlangford
Contributor

This is a known issue (see https://github.com/stedolan/jq/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3Aieee754 for a history on the topic).

In short, jq uses IEEE754 doubles to store numbers (which is permitted by the JSON specification). This means that very large integers might get adjusted to the nearest representable value. (Even if you weren't using jq, it's possible some other tool you use would do this to you)
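The rounding described above can be reproduced outside jq. A minimal sketch, using Python only as a stand-in for any tool that stores numbers as IEEE 754 doubles:

```python
# At this magnitude (between 2^58 and 2^59), consecutive doubles are
# 64 apart, so the reported integer is rounded to the nearest multiple.
before = 418502930602131457
after = int(float(before))  # what any double-based tool actually stores

assert after == 418502930602131456
assert after != before
```

The trailing-zero form in jq's output (`...460`) is consistent with printing that same double at 17 significant digits.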

There's a PR (#1752) adding support for large numbers, but it comes with some performance penalties, and we haven't had the time to get it merged yet. Hopefully Soon™.

I generally suggest that a large number that you aren't doing math on is actually a string. If you're able to represent it as a string instead of a number, then that's your best bet until we get that big number support merged.
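As a quick illustration of the string workaround (Python's json module used here as a stand-in JSON implementation):

```python
import json

# As a JSON string, the ID round-trips with every digit intact; as a
# JSON number it would be coerced through a double and lose low digits.
doc = '{"key": "418502930602131457"}'
roundtripped = json.loads(json.dumps(json.loads(doc)))
assert roundtripped["key"] == "418502930602131457"
```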

@mendess
Author

mendess commented Aug 8, 2019

Thanks for the quick response and sorry for the duplicate issue, should I close it?

The numbers I am working with used to be strings actually but we switched to a more typesafe language (Rust) and having them be actual numbers was more ergonomic. Good luck with the PR :)

@cblp

cblp commented Aug 19, 2019

jq uses IEEE754 doubles to store numbers (which is permitted by the JSON specification)

Please show where the standard permits it.

ECMA-404:

JSON is agnostic about the semantics of numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.

@cblp

cblp commented Aug 19, 2019

Alternative spec, RFC 7159:

This specification allows implementations to set limits on the range
and precision of numbers accepted. Since software that implements
IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
generally available and widely used, good interoperability can be
achieved by implementations that expect no more precision or range
than these provide, in the sense that implementations will
approximate JSON numbers within the expected precision. A JSON
number such as 1E400 or 3.141592653589793238462643383279 may indicate
potential interoperability problems, since it suggests that the
software that created it expects receiving software to have greater
capabilities for numeric magnitude and precision than is widely
available.

I think, "limits on numbers accepted" can mean rejection of some values, but not an alteration of textual representation.

@pkoppstein
Contributor

pkoppstein commented Aug 19, 2019

RFC 8259 in the Parsers section says:

A JSON parser MUST accept all texts that conform to the JSON grammar.
... An implementation may set limits on the range and precision of numbers.

jq sets limits, and accepts valid texts. If it raised an error when a limit was violated, it would violate the "must accept" requirement, wouldn't it? So it seems to me there's good reason to be unhappy about the mishmash that became of Crockford's original intention.

https://tools.ietf.org/html/rfc8259#page-10

@cblp

cblp commented Aug 19, 2019

@pkoppstein, jq doesn't transform JSON into another representation, it is not a parser.

@wtlangford
Contributor

jq doesn't transform JSON into another representation, it is not a parser.

jq does have a parser. How else would it transform JSON text (which is "a text format for the serialization of structured data") into actual values/data that it can run your program on?
jq also has a generator (as defined by the RFC), in that it produces JSON texts which strictly conform to the standard.

I think, "limits on numbers accepted" can mean rejection of some values, but not an alteration of textual representation.

The issue here is that to our parser and generator, the JSON texts for these "altered" numbers represent the same value (because to us, values are IEEE754 doubles, and those have precision issues for very large and very small numbers).
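That equivalence is easy to demonstrate. Here Python's json module is used with its `parse_int` hook to mimic a double-based parser (an analogy only, not jq's actual code):

```python
import json

# Force integers through a double, as a double-based parser would:
# the original text and the "mangled" text denote the same value.
def parse(text):
    return json.loads(text, parse_int=float)

assert parse("418502930602131457") == parse("418502930602131460")
```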

We're aware this is a pain point for people. Lots of tools output very large integer IDs, and from the perspective of a user, jq ends up mangling those IDs. As I mentioned above, we have a PR (#1752) in progress that will add some big number support to jq, at the cost of some performance. We (the maintainers) haven't had time to finalize and merge it yet, but it's high on our jq priority list.

@pkoppstein
Contributor

@cblp - jq includes a parser. The architecture precludes the non-parser part of jq from seeing the input as anything other than what the parser reveals. As I’ve already indicated, we’re all aware of the problem, so whether you blame jq’s architecture or the state of JSON requirements seems somewhat pointless.

@cblp

cblp commented Aug 19, 2019 via email

@nicowilliams
Contributor

@cblp a user who only ever runs jq . probably doesn't expect jq to be a JSON parser. Any user who writes jq programs more complex than . will understand (on some level) that jq is indeed a JSON parser with an internal representation. At any rate, jq really does parse JSON into an internal representation.

The next release of jq will have better range and precision for numerics.

@cblp

cblp commented Aug 19, 2019

I'm sorry for sounding like I was assigning blame; I just wanted to clarify the reasoning behind the current design.

@pkoppstein
Contributor

@cblp - If there is a reason, it probably is some mix of a desire to achieve efficiency in a quick and simple way, a sense that the JSON spec allows implementations to set limits, and perhaps a sense or belief that in practice, the issue is relatively unimportant. Feel free to assign whichever weights you like :-)

@wjmelements

I am still seeing this for big integers. For example, 5474205234507702943235 becomes 5474205234507703000000. The difference is meaningful for me and prevents me from using jq.

@pkoppstein
Contributor

pkoppstein commented Apr 1, 2020

@wjmelements - Good news! The issue has been addressed in the "master" version of jq:

$ jqMaster --version
jq-1.6-107-g24564b2

$ jqMaster -n 5474205234507702943235
5474205234507702943235

The enhancement dates from Oct 19, 2019, which is after the release of jq 1.6.

@jprupp

jprupp commented Feb 12, 2021

I have the version that comes with Fedora 34, and it still has this issue: it cannot handle large numbers. I would love it if it could.

@AlaaHamoudah

This is a very serious issue; I would really appreciate it if it were fixed.

@scottyob

+1 on this. Can also confirm master helps our one use case:

abc@202d6ad3f0b5:/tmp/jq$ ./jq --version
jq-1.6
abc@202d6ad3f0b5:/tmp/jq$ echo '{"a":9011153322235679}' | ./jq '.a'
9011153322235680

abc@202d6ad3f0b5:/tmp/jq/jq$ ./jq --version
jq-1.6-137-gd18b2d0-dirty
abc@202d6ad3f0b5:/tmp/jq/jq$ echo '{"a":9011153322235679}' | ./jq '.a'
9011153322235679

@wpietri

wpietri commented Jun 29, 2021

If fixing this is a problem, then perhaps you could make it blow up when it mangles data? We're having an issue where somebody used jq at the beginning of a research project. It silently corrupted a bunch of IDs, so the downstream work now has to be redone or hackily fixed.

This means we can't really trust jq unless we carefully validate all the data as jq-safe. So in practice it looks like we'll just have to stop using jq. It's a lovely tool, but not so lovely that we want to end up looking like fools by coming to the wrong conclusion.
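jq itself has no such strict mode, but the "blow up instead of silently rounding" idea can be sketched with Python's json hooks (a hypothetical helper, not part of jq):

```python
import json

def strict_int(text):
    """Reject any integer an IEEE 754 double cannot hold exactly."""
    value = int(text)
    if int(float(value)) != value:
        raise ValueError(f"{text} is not exactly representable as a double")
    return value

json.loads('{"id": 42}', parse_int=strict_int)  # parses fine
# json.loads('{"id": 418502930602131457}', parse_int=strict_int)  # ValueError
```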

@scottyob

@wpietri: Pretty sure you could just use master to solve most of your problems. My understanding is that it stores these numbers as C strings and only throws away precision when you do mathematical operations on them.

It's really a shame we haven't had a release in ages with these fixes in it.
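The keep-the-original-digits idea can be imitated in Python with `decimal.Decimal` as the parse hook (an analogy for the behavior on master, not jq's implementation):

```python
import json
from decimal import Decimal

# Parse integers as Decimal so every digit of the literal survives;
# precision is only at risk once double arithmetic is applied.
value = json.loads('{"a": 9011153322235679}', parse_int=Decimal)["a"]
assert str(value) == "9011153322235679"
```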

@gdamore

gdamore commented May 30, 2023

Hi from the future! It's 2023, and we still don't have a release with a fix for this!

@leonid-s-usov
Contributor

This is handled by #1752 and should appear in the next build

@emanuele6
Member

jq 1.7 released with the fix. closing
