Add double-int to formats registry #3381

Merged
merged 1 commit into from
Nov 16, 2023

mikekistler
Contributor

This PR adds a new format, "double-int", to the OAI formats registry.

The double-int format should be used to specify an integer that can be stored in an IEEE 754 double-precision number without loss of precision.
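As a rough illustration of that definition, here is a minimal TypeScript sketch of a runtime check. The helper name `isDoubleInt` is hypothetical and not part of the registry or any OpenAPI tooling; it relies only on the standard `Number.isSafeInteger`.

```typescript
// Hypothetical helper; the name is an assumption made for illustration.
function isDoubleInt(value: unknown): value is number {
  // Number.isSafeInteger is true only for integers in [-(2^53 - 1), 2^53 - 1],
  // i.e. integers a double can hold without loss of precision.
  return typeof value === "number" && Number.isSafeInteger(value);
}

const largest = 2 ** 53 - 1; // 9007199254740991
console.log(isDoubleInt(largest));     // true
console.log(isDoubleInt(largest + 1)); // false: 2^53 is outside the safe range
console.log(isDoubleInt(1.5));         // false: not an integer
```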

Naming is one of the hardest problems in computer science. We considered many names other than "double-int" -- "int53", "jsonint", "safest", others. We even asked ChatGPT for suggestions.

In the end "double-int" seems most descriptive and accurate.

@bterlson

I think it's worth calling out that the proposed double-int covers the largest integer range that avoids the interoperability problems called out by RFC 8259. Right now the only options are int32, which might not have enough range, and int64, which is not widely interoperable. Also, an application that intends to use the RFC 7493 I-JSON Message Format cannot use the int64 format.
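To make the interoperability concern concrete, here is a small TypeScript sketch of what happens when a JSON parser backed by doubles reads an int64 value just past the safe range. The payload and the `id` field name are made up for illustration.

```typescript
// 2^53 + 1 is a valid int64 value, but it has no exact IEEE 754 double
// representation. The payload below is a made-up example.
const wire = '{"id": 9007199254740993}';

// JSON.parse stores every JSON number as a double.
const parsed = JSON.parse(wire) as { id: number };

// The value is silently rounded to the nearest representable double, 2^53.
const lostPrecision = String(parsed.id) !== "9007199254740993";
console.log(parsed.id === 2 ** 53, lostPrecision); // true true
```

A value constrained to the double-int range would survive the same round trip unchanged.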

@handrews
Member

I'm not a fan of "double-int". What's wrong with sticking with the existing naming pattern and using "int53"?

@bterlson

bterlson commented Sep 25, 2023

Far from an expert here but as someone who originally advocated for int53 and was convinced otherwise, I think there are two main problems with it:

  1. A double can store integers in a larger range than a hypothetical signed int53 and a very slightly smaller range than a signed int54. A signed int53 would have a range of -2^52 to 2^52-1, a double's safe integer range is -2^53+1 to 2^53-1, and a 54-bit signed integer is -2^53 to 2^53-1. Note that int54 is actually closer, but unfortunately -2^53 is just outside the double's safe range.
  2. int53 seems to indicate that the point of the integer is to store a number in 53 bits, but it's not: in most targets it will take 64 bits (whether a 64-bit integer or a double). The real point is to align the range with double's range, so aligning on names too makes sense.
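The three ranges in point 1 can be compared directly. This is only a sketch in TypeScript; the constant names are assumptions made for illustration, and all the values involved are exactly representable as doubles.

```typescript
// Two's-complement range of a hypothetical signed 53-bit integer.
const int53Min = -(2 ** 52);
const int53Max = 2 ** 52 - 1;

// Range of integers a double can hold without loss of precision.
const doubleIntMin = -(2 ** 53 - 1);
const doubleIntMax = 2 ** 53 - 1;

// Two's-complement range of a hypothetical signed 54-bit integer.
const int54Min = -(2 ** 53);
const int54Max = 2 ** 53 - 1;

// The double range matches JavaScript's safe-integer constants.
console.log(doubleIntMin === Number.MIN_SAFE_INTEGER); // true
console.log(doubleIntMax === Number.MAX_SAFE_INTEGER); // true

// int53 is strictly narrower than the double range.
console.log(int53Min > doubleIntMin && int53Max < doubleIntMax); // true

// int54 shares the upper bound, but its minimum is one step too far.
console.log(int54Max === doubleIntMax);      // true
console.log(Number.isSafeInteger(int54Min)); // false
```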

fwiw my aesthetic preference is for "doubleint" without the hyphen, but really the bikeshed can be any color as long as it can store a double-sized int without precision loss.

@mikekistler
Contributor Author

My recollection of the discussion on this item in the Sep 28 meeting is that no one really liked the "double-int" name, but the other alternatives suggested ("int53", "int54", "safeint") all seemed worse (save the half-serious suggestion of "doubloon"!). I think we agreed to give folks a week or so to suggest other alternatives, and if none came forward we'd go with "double-int".

Sorry I waited a month to capture this, but hopefully it matches other folks' memory. If not, please respond here. If it is accurate, I'd like to make one last call for suggestions. If none come forward that we can agree are better than "double-int", we'll call this done.

@baywet
Contributor

baywet commented Nov 9, 2023

Here is my suggestion: why don't we define a pattern instead, prefixing the format name with the standard?
Here the format would be ieee754-int53.
This way we only have to document the "valid" standard prefixes, and anybody can use the range under that.

@mikekistler
Contributor Author

This PR was discussed in the TDC meeting on 11/9.

One point that was raised is that this type can be described using existing assertions, e.g.:

    type: integer
    minimum: -9007199254740991
    maximum: 9007199254740991

This is certainly true and works well for validation, but the intent of the format is not to express actual minimum and maximum values. It is meant as a guide for code generators on what type can be used to hold the value, most especially in JavaScript.

Using format as a hint to generators is a well-established practice -- int32 and int64 are prime examples.

And using format rather than assertions for this purpose leaves the assertions available to express actual constraints on the value that are unrelated to how it is stored in a program.
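As a sketch of what "format as a hint to generators" could look like in practice, here is one plausible mapping a TypeScript code generator might use. The function name `tsTypeForIntegerFormat` and the chosen target types are illustrative assumptions, not something the registry prescribes.

```typescript
// Hypothetical mapping for a TypeScript code generator; the function name and
// the chosen target types are illustrative assumptions, not registry rules.
type TsNumericType = "number" | "bigint";

function tsTypeForIntegerFormat(format?: string): TsNumericType {
  switch (format) {
    case "int32":
    case "double-int":
      // Both ranges fit in a JavaScript number without loss of precision.
      return "number";
    case "int64":
      // The full int64 range exceeds what a number can represent exactly.
      return "bigint";
    default:
      // Assumption: with no format hint, fall back to number.
      return "number";
  }
}

console.log(tsTypeForIntegerFormat("double-int")); // "number"
console.log(tsTypeForIntegerFormat("int64"));      // "bigint"
```

With a mapping like this, minimum and maximum stay free to express business constraints (say, 1 to 1000) without affecting which storage type the generator picks.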

@ralfhandl
Contributor

ralfhandl commented Nov 16, 2023

> using format rather than assertions for this purpose leaves the assertions available to express actual constraints on the value that are unrelated to how it is stored in a program

This seems to indicate that the existing format: "double" is sufficient for expressing the storage requirement, and the value constraint "integer" can be expressed with the assertion multipleOf: 1.

@webron merged commit 1d8ce42 into OAI:gh-pages on Nov 16, 2023
7 participants