Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand upon and improve reading of integers from JSON #35999

Merged
merged 3 commits into from
Dec 15, 2019

Conversation

Davi-DeGanne
Copy link
Contributor

Summary

SUMMARY: Infrastructure "Expand upon and improve reading of integers from JSON"

Purpose of change

This is primarily to expand support for very large numbers (because I need them to work properly and consistently for #35910). Before, we read numbers the following way:

  1. Look for a negative sign. If present, set a flag neg to true.
  2. Read in the number, ignoring the decimal, as an int, we'll call it num
  3. For every digit after the decimal, decrease a counter exp by 1
  4. If the number was expressed in scientific notation, add the number after the E/e to the counter exp
  5. Combine above information via num * std::pow( 10.0, exp ) * ( neg ? -1.0 : 1.0 )
  6. Now (assuming we asked for something besides a double) cast the resulting double into one of the following types (without any checks if casting it makes sense): float, int, unsigned int, or int64_t.

There were a lot of problems with this:

  • Trying to read in a number whose absolute value was bigger than INT_MAX would result in undefined behavior due to overflow, so requesting an int64_t was functionally identical to requesting an int, which similar pointlessness for unsigned int.
  • Really large numbers might not get read accurately due to being cast to double by std::pow, then back to their respective integer data type.
  • Adding support for more integer types would be error prone and inaccurate due to the above two problems, especially for my purposes -- saving/loading a uint64_t hash.
  • The type casting was incredibly unsafe across the board; nothing was being done to make sure values would actually fit into the data type they were being cast to.

Describe the solution

Changed the above 6 steps as follows:

  • num is now a uint64_t so it is big enough to handle every other data type.
  • Changed the steps above such that steps 1-4 are still performed on all numbers, but step 5 is only done if a float or double was requested. Otherwise, the resultant num, neg, and exp variables are passed to dedicated functions that calculate the resultant integer in such a way that it is never cast to a double or float in the process.
  • Added deserialization function for uint64_t.

I also made number loading more strict. An error will now be thrown if:

  1. An unsigned integer is provided with a negative value.
  2. The supplied integer is too big (we now have int64 support, but even that has its limits).
  3. An integer is supplied with a decimal (previously it was just truncated with no error).

Implementing number 3 immediately caused a lot of JSON load errors in both the base game and in mods. I went through and fixed these.

Testing

I've loaded into the game and checked all kinds of items to see if the numerical properties were properly read from their JSON definitions. I also tested #35910 to ensure that monotony penalties were carrying over after save/load, and they were, which implies that the new uint64_t hashes are being correctly and precisely (de)serialized.

Additional context

Possibly, when this is merged, there might be some JSON load errors. It mostly will depend on whether anyone adds numbers with decimals to JSON properties that are integer-only. The most likely offender would be weight and/or volume entries.

Primarily to expand support for very large integers. Before, precision
was being lost due to conversion to float, and then back to integer.

I also made integer loading more strict. An error will now be thrown if:
 1. An unsigned integer is provided with a negative value.
 2. The supplied integer is too big (we now have int64 support, but even
    that has its limits)
 3. An integer is supplied with a decimal (previously it was truncated)
 4. The above rules hold even when scientific notation is used (yes, we
    have support for using scientific notation in JSON fields)

Number 3 immediately caused a lot of errors in both the base game
and in mods. I went through and fixed these.
@curstwist curstwist added [C++] Changes (can be) made in C++. Previously named `Code` [JSON] Changes (can be) made in JSON Code: Infrastructure / Style / Static Analysis Code internal infrastructure and style labels Dec 10, 2019
src/json.cpp Outdated
error( "Found a number greater than " + std::to_string( INT_MAX ) +
" which is unsupported in this context." );
} else if( n.negative && n.number > ( UINT_MAX - INT_MAX ) ) {
error( "Found a number less than " + std::to_string( INT_MIN ) +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reports INT_MIN in the error message, but the test is against a different term. If it's guaranteed that both evaluate to the same number, why use different terms in the code? If the numbers might be different (UINT_MAX - INT_MAX is not guaranteed to be the same as INT_MIN), the error will report a different error condition than what is checked.

Copy link
Contributor Author

@Davi-DeGanne Davi-DeGanne Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UINT_MAX - INT_MAX is guaranteed to evaluate to the absolute value of INT_MIN. I can't do -INT_MIN because of overflow, and I can't do -( UINT_MAX - INT_MAX ) because an unsigned int can't be negative.

The only option I see to reconcile your issue with the type limitations, is to reframe the message like this:

error( "Found a number whose absolute value is more than " +
       std::to_string( UINT_MAX - INT_MAX ) +
       " which is unsupported in this context." )

Which I think is too convoluted to be reasonable, primarily due to the fact that it's not actually accurate -- if the number is positive, the absolute value limit is not UINT_MAX - INT_MAX, it's just INT_MAX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, Kevin enlightened me to numeric_limits<int>::min(). Will fix this shortly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, nevermind again, I don't know the solution to this, and now I'm not even sure that this:

UINT_MAX - INT_MAX is guaranteed to evaluate to the absolute value of INT_MIN.

Is accurate. Still looking for a solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I've solved this, would appreciate a second look if you have time.

The way the absolute value of INT_MIN was being determined before may
have given inconsistent results on different platforms.
Also, switched to using std::numeric_limits rather than the
corresponding macro constants.
src/json.cpp Outdated Show resolved Hide resolved
I wanted to be absolutely sure that these functions never risked
overflow at any point on any compiler, because I know that if at any
point they return an incorrect value, saved games will fail to load due
to incorrectly thinking it's unsafe to read INT_MIN from JSON.
Also, Qrox pointed out that they should be constexpr.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[C++] Changes (can be) made in C++. Previously named `Code` Code: Infrastructure / Style / Static Analysis Code internal infrastructure and style [JSON] Changes (can be) made in JSON
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants