Why does jq automatically convert double values to int when possible? #356

Closed
opqpop opened this issue May 9, 2014 · 12 comments

@opqpop

opqpop commented May 9, 2014

I have input_file:

{"hi":48.0, "yep":48.123}

I run this:

jq . < input_file > output_file
cat output_file

My output:

{
  "yep": 48.123,
  "hi": 48
}

I would like the output to be

{
  "yep": 48.123,
  "hi": 48.0
}

How do I prevent jq from automatically converting my double values to an int?

@nicowilliams
Contributor

jq doesn't convert doubles to ints. It parses numbers as doubles, and it keeps numbers in memory as doubles. Losing the fractional part when it is zero is just how jq formats numbers. There are no options at this time for number parsing or encoding. JSON is what it is, and so printing 48 is perfectly valid.
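A minimal Python sketch of why the ".0" cannot survive (Python is used only for illustration, and `parse_int=float` is an assumption standing in for a parser that, like jq, keeps every number as a double): once "48.0" and "48" are parsed as IEEE 754 doubles they are bit-for-bit identical, so nothing remains to record which spelling the input used.

```python
import json
import struct

# Parse both spellings as IEEE 754 doubles (parse_int=float forces the
# integer literal "48" into a double too, mimicking a double-only parser).
a = json.loads("48.0", parse_int=float)
b = json.loads("48", parse_int=float)

# Same value and same 8 bytes in memory: the original spelling is gone.
assert a == b
assert struct.pack("<d", a) == struct.pack("<d", b)
print(struct.pack(">d", a).hex())  # big-endian bit pattern of the double
```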

Any options would have to be global in effect, which might not be good enough for some users. It seems better to just not go there at all, I think.

@nicowilliams
Contributor

See #369.

@schoo105

I was planning to use jq for a project, but this problem alone is enough to make me not use it.

@nicowilliams
Contributor

On Mon, Feb 23, 2015 at 6:06 PM, @schoo105 wrote:

I was planning to use jq for a project, but this problem alone is enough to make me not use it.

I'm sorry jq doesn't do what you need. Do be careful with JSON, though: the format does not require that numbers be encoded the way that you want. Indeed, it requires a lot less than you might expect. Read RFC7159 carefully!

@pkoppstein
Contributor

@schoo105 - For what it's worth, I agree that the problem you mention is both serious and unfortunate. In fact, in my view, RFC7159 is, with respect to numbers, a step backwards relative to the original (Crockford) specification.

However, with respect to jq, I hope you will consider some possible workarounds. If you don't have to do arithmetic on the large numbers, could they be converted to strings first? If you do have to do arithmetic on large integers, there is a (not-yet-very-well-tested) "BigInt" library for jq at https://gist.github.com/pkoppstein/d06a123f30c033195841
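One way to do the convert-to-strings step before handing data to jq, sketched in Python (an assumption; the thread does not prescribe a tool): `json.loads` accepts `parse_int` and `parse_float` hooks that receive the literal text of each number, so passing `str` keeps every number's original spelling, including "48.0" and all 18 digits of a large integer. The trade-off is that the numbers come out as quoted JSON strings.

```python
import json

raw = '{"hi": 48.0, "big": 123456789123456789}'

# parse_int / parse_float are called with the literal text of each number;
# returning that text unchanged turns every number into a string, so no
# digits (and no ".0") are lost to double-precision rounding.
as_strings = json.loads(raw, parse_int=str, parse_float=str)

print(json.dumps(as_strings))
```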

@nicowilliams
Contributor

nicowilliams commented Feb 24, 2015 via email

@pkoppstein
Contributor

@nicowilliams wrote:

.... not much changed ...

The original JSON specification of numbers (http://www.json.org), which is faithfully reflected in ECMA-404 (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf), is essentially syntactic, so the big difference between the original view and the RFC (http://rfc7159.net/rfc7159#rfc.section.1.2) is the addition of the paragraph beginning "This specification allows implementations to set limits ...".

@nicowilliams
Contributor

@pkoppstein That's neither here nor there, but in case @opqpop or @schoo105 are confused, and because this comes up from time to time, we might as well hash it out in some detail.

First, and for whatever it might be worth (a lot, IMO), RFC4627 long predated the ECMA attempt at a JSON standard.

Second, both permit leaving out a zero fractional part of a number (e.g., both 0 and 0.0 are permitted), and neither specifies (in the sense of requiring, or even recommending) any particular range, precision, or representation for numbers, nor does either specify a semantic model for arithmetic operations on numbers, not even equality. In practice, most implementations interoperate just fine for the subset of numbers that every implementation can represent, and with a modicum of shared semantics.
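The number grammar in RFC 7159, section 6 (which agrees with the json.org and ECMA-404 syntax) is purely syntactic. Transcribed into a Python regular expression for illustration (this is not code from jq), it shows that 0 and 0.0, or 48 and 48.0, are equally valid, while spellings such as 48. or 048 are not.

```python
import re

# RFC 7159, section 6:
#   number = [ minus ] int [ frac ] [ exp ]
#   int    = zero / ( digit1-9 *DIGIT )
#   frac   = decimal-point 1*DIGIT
#   exp    = e [ minus / plus ] 1*DIGIT
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_json_number(text: str) -> bool:
    """True if text is one complete JSON number literal."""
    return JSON_NUMBER.fullmatch(text) is not None


for literal in ["0", "0.0", "48", "48.0", "-1.5e+10", "48.", "048", ".5", "+1"]:
    print(f"{literal!r:12} {is_json_number(literal)}")
```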

Anyone complaining that jq's data model is not the one they prefer could glare at Douglas Crockford, the IETF, and/or the ECMA for that matter, but the horse left the barn many years ago, and long before jq happened on the scene.

As for the text in RFC7159 that you object to, a) it is informative, not normative (i.e., it requires nothing of implementors), and b) it wasn't added to change JSON; it was added to reflect actual deployments. To be sure, an informative note could cause harm if taken as normative, but this text is clearly informative and harmless, and it is useful: it tells encoders which subset of numbers is likely to interoperate.
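For integers, that informative text in RFC 7159 (section 6) is concrete: integers in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their values. A minimal Python check of that range (the function name is chosen here for illustration), together with what happens to an IEEE 754 double just past it:

```python
# Bounds quoted in RFC 7159, section 6 (informative text).
MAX_SAFE = 2**53 - 1
MIN_SAFE = -(2**53) + 1


def is_interoperable_integer(n: int) -> bool:
    """True if n lies in the integer range RFC 7159 calls interoperable."""
    return MIN_SAFE <= n <= MAX_SAFE


# Inside the range, a round trip through a double is exact...
assert int(float(MAX_SAFE)) == MAX_SAFE
# ...just past it, two distinct integers collapse onto the same double.
assert float(2**53) == float(2**53 + 1)

print(is_interoperable_integer(2**53 - 1), is_interoperable_integer(2**53 + 1))
```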

Incidentally, JSON implementations come in these flavors as far as numbers go: those that implement IEEE754 as their numeric value range, precision, and semantics (minus NaNs and infinities, because JSON does not permit them), those that implement some sort of bigreal, and those that implement something much more constrained than IEEE754, none being strictly a superset or subset of the others. Obviously, implementations with bigreal interoperate best as parsers, but implementations more constrained than IEEE754 interoperate better than all the others as encoders. Since the IETF is in the business of making interoperable specifications, an interoperability note had to be made -- nothing else would have made it possible to publish any update to RFC4627. But this is not relevant to either @opqpop or @schoo105, since they just want a fractional part to always be included, and that's permitted (but not required) by the standard, and has nothing to do with IEEE754.

We might like to write or accept a patch to add an option to always include a .0 fractional part for integers, but I'd like to have a list of all the number formatting options (not involving switching to something other than IEEE754) that people want. I don't want such a list because I want more work to do, or so we can do it all at once, or even to plan for doing it, but because we should first see whether this way lies madness, so that if it does, we don't go there :)
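To sketch what such an option might look like, here is a tiny Python encoder with one flag; `encode` and `always_fraction` are hypothetical names, not a jq option. Because every number is treated as a double (as jq does), the flag necessarily applies to the whole document: it cannot tell a 48 that arrived as "48.0" from one that arrived as "48".

```python
import json
import math
from typing import Any


def encode(value: Any, always_fraction: bool = False) -> str:
    """Hypothetical JSON encoder with one global number-formatting flag.

    Every number is converted to a double first. repr() of an integral
    double ends in ".0" unless it switches to exponent form (e.g. 1e+16);
    always_fraction decides whether that ".0" is kept or dropped.
    """
    if isinstance(value, bool) or value is None or isinstance(value, str):
        return json.dumps(value)
    if isinstance(value, (int, float)):
        x = float(value)
        if not math.isfinite(x):
            raise ValueError("JSON has no NaN or infinity")
        text = repr(x)  # shortest round-tripping text, e.g. "48.0"
        if not always_fraction and text.endswith(".0"):
            text = text[:-2]
        return text
    if isinstance(value, list):
        return "[" + ",".join(encode(v, always_fraction) for v in value) + "]"
    if isinstance(value, dict):
        items = (json.dumps(k) + ":" + encode(v, always_fraction) for k, v in value.items())
        return "{" + ",".join(items) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


doc = json.loads('{"hi": 48.0, "yep": 48.123, "n": 7}')
print(encode(doc))                        # jq-like: integral values lose ".0"
print(encode(doc, always_fraction=True))  # the requested behavior, globally
```

Note that with the flag on, "n": 7 also becomes 7.0, which is exactly the "global in effect" concern raised earlier in the thread.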

Admittedly, JSON number representation is not something I spend a lot of time thinking about, because by and large what jq does (and what all the other C JSON implementations I've used do) interoperates. But I'd really like to know what parser(s) @schoo105 is using that don't parse numbers encoded by jq, so that we can consider the importance of this issue/request in that context, and so that we can file issues/requests against whatever that parser is.

@nicowilliams
Contributor

Regarding your gist, @pkoppstein, that's really cool!

Also, to be clear, I have no idea what @schoo105 needs here, only what @opqpop wrote. I'm not at all certain that what @schoo105 wants is infinite (or even just better than IEEE754) range and precision numbers.

@nicowilliams
Contributor

BTW, the reason IEEE754 became prominent in JSON implementations is that IEEE754 is the common denominator in ECMAScript implementations, some of which even go so far as to use NaN coding, where immediate values (numbers, pointers) are all encoded as C doubles, which are half the size of libjq's jv C type.
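For readers unfamiliar with NaN coding (often called NaN-boxing), here is a minimal Python sketch of the idea using plain integer bit manipulation. The layout (a quiet-NaN prefix, one tag bit, and a 48-bit payload) and the `box_*` helper names are assumptions for illustration; real engines each choose their own layouts. The point is that every value, number or pointer, fits in one 64-bit word, the size of a C double.

```python
import math
import struct

QNAN = 0x7FF8_0000_0000_0000          # exponent all ones + quiet bit: a NaN
POINTER_TAG = 0x0001_0000_0000_0000   # hypothetical tag bit meaning "pointer"
PAYLOAD_MASK = 0x0000_FFFF_FFFF_FFFF  # low 48 bits, room for an address


def box_double(x: float) -> int:
    """A number is stored as its own IEEE 754 bit pattern.

    (Real NaN inputs would have to be canonicalized first; not shown.)
    """
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_pointer(address: int) -> int:
    """A pointer hides in the payload of a NaN that canonical NaNs never use."""
    assert 0 <= address <= PAYLOAD_MASK
    return QNAN | POINTER_TAG | address


def is_pointer(word: int) -> bool:
    return (word & (QNAN | POINTER_TAG)) == (QNAN | POINTER_TAG)


def as_double(word: int) -> float:
    """Reinterpret a 64-bit word as a double."""
    return struct.unpack("<d", struct.pack("<Q", word))[0]


w = box_pointer(0x7F00_1234_5678)
print(is_pointer(w), math.isnan(as_double(w)), hex(w & PAYLOAD_MASK))
```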

@pkoppstein
Contributor

@nicowilliams wrote:

I have no idea what @schoo105 needs here

Same here, but I remain discombobulated largely by the discrepancy between some of the "marketing materials" at github.com/stedolan and the reality. For example, the identity filter . is characterized as a pretty-printer, and there are indeed JSON pretty-printers, such as jsonpp. Contrast, therefore:

$ echo 123456789123456789 | jsonpp
123456789123456789
$ echo 123456789123456789 | jq .
123456789123456780

and for non-integral decimals:

$ echo 12345678912345678912345678912345678912345678912345678.9 | jsonpp
12345678912345678912345678912345678912345678912345678.9
$ echo 12345678912345678912345678912345678912345678912345678.9 | jq .
1.2345678912345678e+52
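The integer example comes down to where the value is stored. Sketched in Python for illustration: Python's json module parses integer literals as arbitrary-precision ints, so it reproduces the input exactly, while forcing the same literal through an IEEE 754 double (here via `parse_int=float`, standing in for jq's representation) rounds it to the nearest representable value, 123456789123456784. Printed to 17 significant digits, that double is the 123456789123456780 jq shows above.

```python
import json

text = "123456789123456789"

# Integer literals become arbitrary-precision Python ints: exact round trip.
exact = json.loads(text)
assert json.dumps(exact) == text

# Forcing the literal into a double rounds it; doubles near 1.2e17 are
# spaced 16 apart, so the nearest one is 5 below the input.
as_double = json.loads(text, parse_int=float)
print(int(as_double))
```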

So my own beef is primarily threefold:

  • Regarding "number", the original Crockford specification, the spec still at json.org, and the ECMA version, though all imperfect, are all preferable to the RFC because they emphasize the transmission of information (in this case, integers-as-in-mathematics and finite-decimal-numbers-as-in-arithmetic);
  • the aforementioned discrepancy;
  • the fact that for JSON pretty-printing, there is at least one tool which does a better job than jq.

I must admit, however, that my discombobulation will be reduced to the point of insignificance if/when jq is endowed with "big integers".

@mitar

mitar commented Feb 19, 2019

I just use Python and its json module. Now that Python preserves insertion order in its dicts, it is easy to modify JSON and get output that matches the input, except for the changes you want.
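Applied to the input from the original report, a minimal sketch of that approach: `json.loads` keeps 48.0 as a float (and a bare 48 as an int), `json.dumps` writes floats with their shortest round-tripping form, which keeps the ".0", and dict order (guaranteed since Python 3.7) keeps the keys in input order.

```python
import json

raw = '{"hi":48.0, "yep":48.123}'

# 48.0 parses as a float and is written back as "48.0";
# keys come out in the same order they were read.
doc = json.loads(raw)
print(json.dumps(doc, indent=2))
```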
