enhancement request: avoid displaying floating point approximations to integers as integers #529
This is a duplicate of many earlier issues about the limitations of IEEE754.
Dup of #218 and others.
@nicowilliams wrote:
The problem I was trying to pinpoint has nothing to do with the limitations of IEEE754 and everything to do with the implementation details of jvp_dtoa_fmt. I am only asking that certain floating point numbers be printed in "e" notation, to make it clear that they are floating point. I am not sure how best to do this portably, but one possibility would be to check the proposed output before printing it and adjust the representation as needed. Writing trial(x) for the trial string representation of x, the adjustment would be: if trial(x) contains neither an "e" nor a decimal point, and trial(x+1) is identical to trial(x), then use "e" notation.
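The trial-and-compare rule above can be sketched in Python. This is an illustration under stated assumptions, not jq's C code: `trial` is a hypothetical stand-in for jq's formatter that prints the shortest round-trip digits (taken from Python's `repr`) in plain positional notation, and `proposed` is a hypothetical name for the adjusted formatter.

```python
from decimal import Decimal


def shortest(x: float) -> Decimal:
    # repr() gives the shortest decimal string that round-trips to x;
    # normalize() drops trailing zeros such as the ".0" in "1000000000.0".
    return Decimal(repr(x)).normalize()


def trial(x: float) -> str:
    # Hypothetical stand-in for jq's output: shortest digits, no exponent.
    return format(shortest(x), "f")


def proposed(x: float) -> str:
    # The rule from the comment: if trial(x) looks like an integer
    # but x + 1 prints identically, switch to "e" notation.
    t = trial(x)
    if "." not in t and "e" not in t and trial(x + 1) == t:
        return format(shortest(x), "e")
    return t


for literal in ["1000000000", "9007199254740991", "9999999999999997", "0.5"]:
    print(literal, "->", proposed(float(literal)))
# 1000000000 -> 1000000000
# 9007199254740991 -> 9007199254740991
# 9999999999999997 -> 9.999999999999996e+15
# 0.5 -> 0.5
```

One side effect worth noting: 2**53 itself would also be printed as 9.007199254740992e+15, because 2**53 + 1 rounds back to 2**53.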
@pkoppstein IEEE754 doubles can represent every integer in the range -2^53..2^53 exactly. 9999999999999995 is larger than 2^53, so it is outside the range of exactly representable integers, and therefore also outside jq's.
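To make the bound concrete, here is a small Python check. Python is used only for illustration; its floats are C doubles (IEEE 754 on mainstream platforms), as jq's numbers are. The helper `exactly_representable` is a hypothetical name: it converts an integer to a double and back and reports whether the value survived.

```python
def exactly_representable(n: int) -> bool:
    """Return True if the integer n survives a round trip through a double.

    Raises OverflowError for integers beyond the double range (about 1.8e308).
    """
    return int(float(n)) == n


LIMIT = 2**53  # 9007199254740992

# Integers up to 2**53 in magnitude convert exactly.
print(exactly_representable(LIMIT - 1), exactly_representable(LIMIT))  # True True
# Just past the limit, odd integers are lost: 2**53 + 1 rounds to 2**53.
print(exactly_representable(LIMIT + 1))  # False
# Between 2**53 and 2**54 only even integers are representable.
print(exactly_representable(9999999999999995))  # False
print(exactly_representable(9999999999999996))  # True
```

Being outside the range therefore does not mean a particular integer was altered; it means the input can no longer be trusted to have survived unchanged.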
All numbers in jq are "floating point" in that they are C doubles (generally IEEE 754). There is no canonical way to print numbers in JSON, and this has been the subject of much debate. Check the IETF JSON WG list archives :( (or don't: you'll likely find they suck up a lot of your time).
@nicowilliams wrote:
Great! If my simplistic algorithm offends, why not print anything outside that range in "e" notation (so long as jq has no BigInt support)? That is easily defensible and fits the line of reasoning you have previously used in discussing these number representation issues. Most importantly, it would be far better than the current situation, which invites confusion and disappointment (at best). And this change could be implemented without waiting for the larger issues to be resolved.
Look at past issues about this, and at various mailing lists. I don't think we can pick a way to format numbers that everyone will be happy with.
@nicowilliams wrote:
My motivation for writing this ER came from my review of the issues and other documentation! Specifically, I was able to disentangle some of the issues, and it seemed to me (as it still does) that one class of issues could be addressed by a (very small) change to jvp_dtoa_fmt. As I've said, I cannot see any sound reason NOT to make such a change: it's a Pareto improvement!
I'm not clear on what your proposed change is. Do you have a PR I could review?
If you meant that any values outside the -2^53..2^53 range should be output in scientific notation, I think that'd be fine, but you couldn't rely on that to indicate that the number is too large: you'd still have to parse it. Now, what about integers inside that range? When should they be output in scientific notation, and when not? E.g., 1e9: 1000000000 or 1e9? If we never print those in scientific notation, and always print integers outside the -2^53..2^53 range in scientific notation, then that might help one detect data loss at a glance, which I think is what you want, and you're right: it'd be easy to make this happen (I think). BUT there are users who want the input form preserved for numbers that pass through to the output unchanged. (There are a couple of issues about that.) I'm certain that we can't make everyone happy as long as we stick to IEEE754, and yet IEEE754 is the industry standard; switching to bignums won't necessarily help in all cases.
I'm not naysaying, BTW.
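The range-based variant discussed above (integers inside the exact range always printed plainly, integral values outside it always in scientific notation) might look like this in Python. `format_by_range` and `MAX_EXACT` are hypothetical names, and the fallback to Python's `repr` for non-integral values is an assumption of this sketch, not something proposed in the thread.

```python
from decimal import Decimal

MAX_EXACT = 2**53  # every integer with |n| <= 2**53 is exactly representable


def format_by_range(x: float) -> str:
    if not x.is_integer():
        # Non-integers (and inf/nan) are out of scope for this sketch.
        return repr(x)
    if abs(x) < MAX_EXACT:
        # Exact, so the plain integer form cannot mislead.
        return str(int(x))
    # 2**53 itself is included here because 2**53 + 1 also rounds to it.
    return format(Decimal(repr(x)).normalize(), "e")


for x in [1e9, 2.0**53 - 1, 2.0**53, float("9999999999999997"), 1e20, 2.5]:
    print(format_by_range(x))
# 1000000000
# 9007199254740991
# 9.007199254740992e+15
# 9.999999999999996e+15
# 1e+20
# 2.5
```

As the comment points out, the notation alone does not prove data loss: 1e20 (that is, 2^20 · 5^20 with 5^20 < 2^53) is stored exactly, yet it is printed in scientific notation, so a reader still has to parse the value.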
@nicowilliams wrote:
Excellent!!! I think that would completely resolve this particular "issue" without ruffling any feathers.
The goal here is to have a defensible Pareto improvement, not to resolve all the issues related to numbers.
That is an interesting question, but it's totally separable from the issue here, which primarily concerns integers outside that range. Perhaps it would be better to open a different "incident report" to avoid muddying the waters here, but since you ask, let me offer two different answers from two slightly different perspectives:
That goal is worthy, but it's not necessary to achieve it in order to resolve this particular "issue" (#529).
Understood.
Are there any issues besides implementation and backward compatibility issues?
Yes, I completely agree. Switching to BigInt will raise some interesting issues too.
@pkoppstein I didn't commit to making any changes; I'm tempted to leave it all as-is until we get to bignum support. Please look through the list of issues.
There are some options supported internally already, but none that do what you'd like:
Looks like understanding the dtoa code is in order.
@pkoppstein You're quite right BTW, that if we're to encourage people to use
Incidentally, the same is true of
Docs update pushed. See 2159f9f.
The following enhancement request is related to but quite distinct from #369.
Currently, jq nicely transitions from integers to floats in a noble attempt to mitigate the absence of BigInt support, but because it displays approximations to integers as though they were integers, the result can be needlessly confusing at best and misleading at worst.
Consider:
It is easy to understand that at some point we must lose precision, so the problem here is simply that at (**), jq is misleadingly (incorrectly?) displaying a floating point value (an approximation to 9999999999999997) as an exact integer (9999999999999996).
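The arithmetic behind (**) can be checked in Python (3.9+ for `math.nextafter` and `math.ulp`), whose floats are doubles like jq's numbers. Between 2^53 and 2^54, consecutive doubles are 2 apart, so 9999999999999997 falls exactly halfway between two representable values, and round-half-to-even picks 9999999999999996. `neighbours` is a hypothetical helper written for this illustration.

```python
import math


def neighbours(literal: str) -> tuple[float, float, float]:
    """Return the double nearest to `literal` and the doubles just below and above it."""
    x = float(literal)
    return math.nextafter(x, -math.inf), x, math.nextafter(x, math.inf)


below, x, above = neighbours("9999999999999997")
print(below, x, above)  # 9999999999999994.0 9999999999999996.0 9999999999999998.0
print(math.ulp(x))      # 2.0, the gap between consecutive doubles at this magnitude
```

So the printed 9999999999999996 is the exact value of the stored double; what the output hides is that the input 9999999999999997 had no exact representation.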