Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report Loss_Of_Integer_Precision when an integer is not exactly representable as a float during conversion #7509

Merged
merged 24 commits into from
Aug 8, 2023

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Aug 7, 2023

Pull Request Description

Closes #7353

I introduce a new type WithAggregatedProblems, because WithProblems was too simple - it only allowed to hold a List<Problem> but AggregatedProblems is more than that. Ideally we shouldn't multiply entities like this too much. We should probably unify all to use WithAggregatedProblems - but after starting this, I realised it will likely just take too much effort to do for this little PR. So instead, I created a follow-up task for this: #7514

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd added the CI: No changelog needed Do not require a changelog entry for this PR. label Aug 7, 2023
@radeusgd radeusgd force-pushed the wip/radeusgd/7353-long-to-double-loss-of-precision branch from f8251dc to 87f78ec Compare August 7, 2023 16:24
@radeusgd
Copy link
Member Author

radeusgd commented Aug 7, 2023

I've been switching over some instances of WithProblems to WithAggregatedProblems. I think overall we want to unify all of them, but that is out of scope of this PR - to be done as #7514

Comment on lines +103 to +107

/** @return any problems that occurred when building the Storage. */
public AggregatedProblems getProblems() {
return AggregatedProblems.of();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a great pattern, in fact I think it is bad.

The proper solution would be to modify seal() to return a WithAggregatedProblems<Storage<?>> - that would ensure that the problems are handled throughout the codebase.

But we use seal() in really many places and I realised ensuring all of this is nicely handled will just make this PR far too large - thus I decided this will be better done as a separate task - #7514

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.

I'm not sure if I understand what you mean here? If anything, how error propagation in Enso code works is completely separate of how we do it in Java helpers.

I think the propagation on the Enso side works really well, although indeed there may be some places where we could want to improve them.

Comment on lines -56 to +94
} else {
} else if (NumericConverter.isDecimalLike(o)){
double value = NumericConverter.coerceToDouble(o);
data[currentSize++] = Double.doubleToRawLongBits(value);
} else if (NumericConverter.isCoercibleToLong(o)) {
long value = NumericConverter.coerceToLong(o);
double converted = convertIntegerToDouble(value);
data[currentSize++] = Double.doubleToRawLongBits(converted);
} else {
throw new IllegalStateException("Unexpected value type when appending to a DoubleBuilder: " + o.getClass().getCanonicalName() + "." +
" This is a bug in the Table library.");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check decimal and integer cases separately, to avoid implicit rounding-loss-of-precision and handle the loss of precision explicitly with convertIntegerToDouble.

Comment on lines +134 to +136
public void setPreExistingProblems(AggregatedProblems preExistingProblems) {
this.preExistingProblems = preExistingProblems;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows an ObjectBuilder after retyping to inherit any problems of the earlier builder.

@@ -201,7 +203,7 @@ public boolean isNegated() {
return negated;
}

public Storage<?> iif(Value when_true, Value when_false, StorageType resultStorageType) {
public WithAggregatedProblems<Storage<?>> iif(Value when_true, Value when_false, StorageType resultStorageType) {
Copy link
Member Author

@radeusgd radeusgd Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the pattern that should ideally be used in most places (IMO).

@radeusgd radeusgd marked this pull request as ready for review August 7, 2023 17:14
Indicates that an automatic conversion of an integer column to a decimal
column is losing precision because some of the large integers cannot be
exactly represented by the `double` type.
Warning (affected_rows_count : Integer) (example_value : Integer) (example_value_converted : Decimal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this have a row number as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I would have to modify the builder a bit to be able to know the row count. Currently there is not enough context.

But do you think it's useful? Tbh I'm not sure if the row_number that we have in a few of these errors is useful at all - hence I didn't think of adding it.

IMO the example_value serves this purpose enough - the user can see one of the values that are concerned, if the row number is needed they can find it by value. Row count is useful for sure to know if this affects like one row or like 50% of the rows.

I propose to see in practice - if we start encountering these and similar warnings in practice, let's see what we do with them in practical scenarios (e.g. bookclubs, other projects) and if such row number is helpful. My current gut feeling is that example values are more useful, because they illustrate e.g. that the number in question is very large. Or e.g. for date parsing failures, ideally our error should show an example value that failed to parse - this would allow the user to compare what it is with the format they specified and see what needs to be amended in the format to make it work. Row number does not help much here. (Ofc. it is more general as from row numbers we can figure out also the values - but that requires additional actions and IMO its better to immediately show examples).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, IMO what would be more useful here is column_name to know with which column this warning is associated, in a multi-column table.

However, it was not trivial how to get this information from the context so I abandoned this. I'm thinking it may be worth adding it as part of the next ticket #7514. But let me know if you think that I should add it now. I imagine it should not be very problematic to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; the value itself is enough to find it in the data; the row number would be nice but it's not worth a big effort.

Comment on lines +103 to +107

/** @return any problems that occurred when building the Storage. */
public AggregatedProblems getProblems() {
return AggregatedProblems.of();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.

@radeusgd radeusgd added CI: Ready to merge This PR is eligible for automatic merge and removed CI: Ready to merge This PR is eligible for automatic merge labels Aug 7, 2023
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Aug 7, 2023
we can either keep the precision of large integers and warn about unexpected rounding OR keep the +- sign of 0 (only possible in floats); obviously we cannot do both in a single value
IMO keeping integers is more important here as there is real data loss; + vs - 0 does not change the value - moreover the issue only happens for `+0` and `-0` that have no decimal point, it is all preserved well for `+0.0` and `-0.0`.
@radeusgd radeusgd removed the CI: Ready to merge This PR is eligible for automatic merge label Aug 8, 2023
…parse `-0` as `-0.0` in decimal mode - remembering the zero-sign
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Aug 8, 2023
@mergify mergify bot merged commit b656b33 into develop Aug 8, 2023
23 of 24 checks passed
@mergify mergify bot deleted the wip/radeusgd/7353-long-to-double-loss-of-precision branch August 8, 2023 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unreported loss of precision when mixing big long values with floating-point values in a Column
2 participants