Report `Loss_Of_Integer_Precision` when an integer is not exactly representable as a float during conversion #7509

radeusgd · 2023-08-07T13:15:50Z

Pull Request Description

I introduce a new type, WithAggregatedProblems, because WithProblems was too simple: it could only hold a List<Problem>, but AggregatedProblems is more than that. Ideally we shouldn't multiply entities like this. We should probably unify everything to use WithAggregatedProblems, but after starting on this I realised it would take too much effort for this small PR. Instead, I created a follow-up task: #7514

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed, the GUI was tested when built using ./run ide build.


radeusgd · 2023-08-07T17:06:38Z

I've been switching some instances of WithProblems over to WithAggregatedProblems. Overall I think we want to unify all of them, but that is out of scope for this PR and will be done as #7514.

radeusgd · 2023-08-07T17:10:21Z

std-bits/table/src/main/java/org/enso/table/data/column/builder/Builder.java

+
+  /** @return any problems that occurred when building the Storage. */
+  public AggregatedProblems getProblems() {
+    return AggregatedProblems.of();
+  }


I don't think this is a great pattern, in fact I think it is bad.

The proper solution would be to modify seal() to return a WithAggregatedProblems<Storage<?>> - that would ensure that the problems are handled throughout the codebase.

But we use seal() in a great many places, and I realised that ensuring all of them handle this properly would make this PR far too large. I therefore decided this is better done as a separate task: #7514
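The idea of having seal() return the result together with its problems can be sketched as below. This is only an illustration under assumptions, not the actual Enso classes: `SealSketch`, `ResultWithProblems`, the nested `Builder`, the use of plain strings as problems and the "negative value" rule are all hypothetical stand-ins for `WithAggregatedProblems`, `AggregatedProblems` and the real builders.

```java
import java.util.ArrayList;
import java.util.List;

final class SealSketch {
  /** Hypothetical "value plus problems" wrapper; not the real WithAggregatedProblems. */
  record ResultWithProblems<T>(T value, List<String> problems) {}

  /** Hypothetical builder: callers of seal() always receive the problems as well. */
  static final class Builder {
    private final List<Long> items = new ArrayList<>();
    private final List<String> problems = new ArrayList<>();

    void append(long item) {
      if (item < 0) {
        // Arbitrary rule, present only so that the demo produces a problem.
        problems.add("negative value: " + item);
      }
      items.add(item);
    }

    ResultWithProblems<List<Long>> seal() {
      // Returning both together means the caller cannot forget to fetch the problems.
      return new ResultWithProblems<>(List.copyOf(items), List.copyOf(problems));
    }
  }

  public static void main(String[] args) {
    Builder builder = new Builder();
    builder.append(1);
    builder.append(-2);
    ResultWithProblems<List<Long>> sealed = builder.seal();
    System.out.println(sealed.value() + " with problems " + sealed.problems());
  }
}
```

Compared with a separate getProblems() method, the return type forces every call site to decide what to do with the problems, which is also why changing it touches so many places.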

In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.

> In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.

I'm not sure I understand what you mean here. If anything, how error propagation works in Enso code is completely separate from how we do it in the Java helpers.

I think the propagation on the Enso side works really well, although indeed there may be some places where we could want to improve them.

radeusgd · 2023-08-07T17:11:00Z

std-bits/table/src/main/java/org/enso/table/data/column/builder/DoubleBuilder.java

-    } else {
+    } else if (NumericConverter.isDecimalLike(o)){
      double value = NumericConverter.coerceToDouble(o);
      data[currentSize++] = Double.doubleToRawLongBits(value);
+    } else if (NumericConverter.isCoercibleToLong(o)) {
+      long value = NumericConverter.coerceToLong(o);
+      double converted = convertIntegerToDouble(value);
+      data[currentSize++] = Double.doubleToRawLongBits(converted);
+    } else {
+      throw new IllegalStateException("Unexpected value type when appending to a DoubleBuilder: " + o.getClass().getCanonicalName() + "." +
+          " This is a bug in the Table library.");


We check the decimal and integer cases separately to avoid implicit rounding (and with it a silent loss of precision); the integer case handles any loss of precision explicitly via convertIntegerToDouble.
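For context, one way to detect that a `long` does not survive the conversion to `double` is sketched below. This is an assumption about how such a check could look, not the body of convertIntegerToDouble from this PR; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

final class IntegerPrecisionCheck {
  /** Returns true if converting the value to double loses no information. */
  static boolean isExactlyRepresentable(long value) {
    double converted = (double) value;
    // new BigDecimal(double) holds the exact binary value of the double, so the
    // comparison is exact and avoids the saturation of a (long) cast near Long.MAX_VALUE.
    return new BigDecimal(converted).compareTo(BigDecimal.valueOf(value)) == 0;
  }

  public static void main(String[] args) {
    long exact = 1L << 53; // 2^53 fits in the 53-bit significand of a double
    long inexact = (1L << 53) + 1; // 2^53 + 1 needs 54 bits and gets rounded
    System.out.println(exact + " exact? " + isExactlyRepresentable(exact));
    System.out.println(inexact + " exact? " + isExactlyRepresentable(inexact));
    System.out.println(inexact + " as double: " + (double) inexact);
  }
}
```

Every integer with magnitude up to 2^53 is exact; above that only some are, so the check has to look at the actual value rather than just its size.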

radeusgd · 2023-08-07T17:12:53Z

std-bits/table/src/main/java/org/enso/table/data/column/builder/ObjectBuilder.java

+  public void setPreExistingProblems(AggregatedProblems preExistingProblems) {
+    this.preExistingProblems = preExistingProblems;
+  }


This allows an ObjectBuilder after retyping to inherit any problems of the earlier builder.
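A rough sketch of that hand-over, assuming problems are plain strings: `RetypeSketch`, `SpecialisedBuilder`, `GeneralBuilder`, `report` and `retype` are hypothetical names for illustration; only setPreExistingProblems and getProblems mirror names from the diff.

```java
import java.util.ArrayList;
import java.util.List;

final class RetypeSketch {
  /** Hypothetical general builder that a specialised builder is retyped into. */
  static final class GeneralBuilder {
    private List<String> preExistingProblems = List.of();
    private final List<String> ownProblems = new ArrayList<>();

    void setPreExistingProblems(List<String> problems) {
      this.preExistingProblems = List.copyOf(problems);
    }

    void report(String problem) {
      ownProblems.add(problem);
    }

    /** Inherited problems come first, followed by the ones reported here. */
    List<String> getProblems() {
      List<String> all = new ArrayList<>(preExistingProblems);
      all.addAll(ownProblems);
      return all;
    }
  }

  /** Hypothetical specialised builder that may have to give up and retype. */
  static final class SpecialisedBuilder {
    private final List<String> problems = new ArrayList<>();

    void report(String problem) {
      problems.add(problem);
    }

    GeneralBuilder retype() {
      GeneralBuilder general = new GeneralBuilder();
      // Without this call, problems recorded before the retype would be lost.
      general.setPreExistingProblems(problems);
      return general;
    }
  }

  public static void main(String[] args) {
    SpecialisedBuilder specialised = new SpecialisedBuilder();
    specialised.report("loss of precision in row 0");
    GeneralBuilder general = specialised.retype();
    general.report("mixed types in row 1");
    System.out.println(general.getProblems());
  }
}
```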

radeusgd · 2023-08-07T17:13:46Z

std-bits/table/src/main/java/org/enso/table/data/column/storage/BoolStorage.java

@@ -201,7 +203,7 @@ public boolean isNegated() {
    return negated;
  }

-  public Storage<?> iif(Value when_true, Value when_false, StorageType resultStorageType) {
+  public WithAggregatedProblems<Storage<?>> iif(Value when_true, Value when_false, StorageType resultStorageType) {


This is the pattern that should ideally be used in most places (IMO).

GregoryTravis · 2023-08-07T17:16:37Z

distribution/lib/Standard/Table/0.0.0-dev/src/Errors.enso

+       Indicates that an automatic conversion of an integer column to a decimal
+       column is losing precision because some of the large integers cannot be
+       exactly represented by the `double` type.
+    Warning (affected_rows_count : Integer) (example_value : Integer) (example_value_converted : Decimal)


Could this have a row number as well?

Hm, I would have to modify the builder a bit to be able to know the row count. Currently there is not enough context.

But do you think it's useful? To be honest, I'm not sure the row_number that we have in a few of these errors is useful at all, which is why I didn't think of adding it.

IMO the example_value serves this purpose well enough: the user can see one of the affected values, and if the row number is needed they can find it by value. The row count is certainly useful, to know whether this affects a single row or, say, 50% of the rows.

I propose we see how this works in practice: if we start encountering these and similar warnings in real scenarios (e.g. bookclubs, other projects), let's see what we do with them and whether a row number would be helpful. My current gut feeling is that example values are more useful, because they illustrate e.g. that the number in question is very large. Or, for date parsing failures, ideally our error should show an example value that failed to parse: this would allow the user to compare it with the format they specified and see what needs to be amended in the format to make it work. A row number does not help much here. (Of course it is more general, as from row numbers we can also figure out the values, but that requires additional actions and IMO it's better to show examples immediately.)

Actually, IMO what would be more useful here is column_name to know with which column this warning is associated, in a multi-column table.

However, it was not trivial to get this information from the context, so I abandoned the idea. I'm thinking it may be worth adding as part of the next ticket, #7514. But let me know if you think I should add it now; I imagine it should not be very problematic to do so.

Agreed; the value itself is enough to find it in the data; the row number would be nice but it's not worth a big effort.
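To illustrate the shape of the information discussed in this thread (an affected-row count plus one example value and its converted form), here is a minimal sketch. `PrecisionLossAggregator` and its methods are hypothetical and are not the LossOfIntegerPrecision class from this PR; the exactness check is likewise an assumption.

```java
import java.math.BigDecimal;

final class PrecisionLossAggregator {
  private long affectedRowsCount = 0;
  private Long exampleValue = null;
  private Double exampleValueConverted = null;

  /** Converts the value and records it if the conversion was not exact. */
  double convert(long value) {
    double converted = (double) value;
    boolean exact = new BigDecimal(converted).compareTo(BigDecimal.valueOf(value)) == 0;
    if (!exact) {
      if (affectedRowsCount == 0) {
        // Only the first offending value is kept as the example.
        exampleValue = value;
        exampleValueConverted = converted;
      }
      affectedRowsCount++;
    }
    return converted;
  }

  long affectedRowsCount() {
    return affectedRowsCount;
  }

  Long exampleValue() {
    return exampleValue;
  }

  Double exampleValueConverted() {
    return exampleValueConverted;
  }

  public static void main(String[] args) {
    PrecisionLossAggregator aggregator = new PrecisionLossAggregator();
    long[] column = {1L, (1L << 53) + 1, 42L, (1L << 53) + 3};
    for (long value : column) {
      aggregator.convert(value);
    }
    System.out.println("affected rows: " + aggregator.affectedRowsCount()
        + ", example: " + aggregator.exampleValue()
        + " -> " + aggregator.exampleValueConverted());
  }
}
```

Keeping a single example and a counter needs constant memory regardless of column size, whereas storing row numbers would grow with the number of affected rows.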

GregoryTravis · 2023-08-07T17:18:34Z

std-bits/table/src/main/java/org/enso/table/data/column/builder/Builder.java

+
+  /** @return any problems that occurred when building the Storage. */
+  public AggregatedProblems getProblems() {
+    return AggregatedProblems.of();
+  }


In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.

std-bits/table/src/main/java/org/enso/table/data/column/builder/LossOfIntegerPrecision.java

We can either keep the precision of large integers and warn about unexpected rounding, OR keep the +/- sign of 0 (only possible in floats); obviously we cannot do both in a single value. IMO keeping integers is more important here, as there is real data loss, whereas + vs - 0 does not change the value. Moreover, the issue only happens for `+0` and `-0` that have no decimal point; the sign is preserved for `+0.0` and `-0.0`.
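The trade-off can be seen in a small sketch: a `long` has no negative zero, so parsing `-0` as an integer drops the sign unless it is remembered separately. `ZeroSignSketch`, `ParsedInteger` and `parse` are hypothetical names, and this is an assumed approach for illustration rather than the parser code from this PR.

```java
final class ZeroSignSketch {
  /** Hypothetical parse result: the integer plus whether the text was a negative zero. */
  record ParsedInteger(long value, boolean negativeZero) {
    /** Conversion used when the column turns out to be decimal. */
    double toDouble() {
      return negativeZero ? -0.0 : (double) value;
    }
  }

  /** Assumes the text is an integer literal without a decimal point. */
  static ParsedInteger parse(String text) {
    long value = Long.parseLong(text);
    boolean negativeZero = value == 0 && text.startsWith("-");
    return new ParsedInteger(value, negativeZero);
  }

  /** Negative zero is the only double whose raw bits are just the sign bit. */
  static boolean isNegativeZero(double d) {
    return Double.doubleToRawLongBits(d) == Long.MIN_VALUE;
  }

  public static void main(String[] args) {
    // Going through long alone loses the sign of zero.
    System.out.println(isNegativeZero((double) Long.parseLong("-0")));
    // Remembering the sign next to the integer restores it on conversion.
    System.out.println(isNegativeZero(parse("-0").toDouble()));
    // Large integers stay exact because they are kept as long.
    System.out.println(parse("9007199254740993").value());
  }
}
```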


radeusgd added the "CI: No changelog needed" label (do not require a changelog entry for this PR) Aug 7, 2023

radeusgd added 16 commits August 7, 2023 18:23

add error

924673d

tests for lossy conversions

528408f

fix

695a0eb

add CSV test

3a3924c

fix

a4b8866

check conversion safety and store any encountered problems (no propagation yet)

c94447b

refactor class out, add conversion in Enso side

d43ccb9

checkpoint

4b2da7a

Temporarily Revert changes starting WithProblems refactor

6212646

This reverts commit 1905cc1.

fix

fd00ea4

for now reverting tests for column names

50c05b5

propagating warnings in cast

95911e1

propagating warnings in iif

841ca11

propagating warnings in Column.from_vector and friends

6d62f30

remove unused leading-zeros logic

8bbcb85

propagating warnings in DelimitedReader

87f78ec

radeusgd force-pushed the wip/radeusgd/7353-long-to-double-loss-of-precision branch from f8251dc to 87f78ec Compare August 7, 2023 16:24

radeusgd mentioned this pull request Aug 7, 2023

Clean-up problem handling in Table Java helper libraries #7514

Closed

3 tasks

javafmt

37c044e

radeusgd commented Aug 7, 2023


doc

6ebda84

radeusgd commented Aug 7, 2023


radeusgd marked this pull request as ready for review August 7, 2023 17:14

radeusgd requested review from jdunkerley and GregoryTravis as code owners August 7, 2023 17:14

GregoryTravis approved these changes Aug 7, 2023


radeusgd added and removed the "CI: Ready to merge" label (this PR is eligible for automatic merge) Aug 7, 2023

CR

aa5233b

radeusgd added the "CI: Ready to merge" label Aug 7, 2023

radeusgd added 4 commits August 7, 2023 20:04

fix a typo

25e2df5

fix problem handling issue

b6d10b3

fix problem aggregation losing some warnings

6559c6d

radeusgd removed the "CI: Ready to merge" label Aug 8, 2023

special case to keep large numbers as integers, but still be able to parse `-0` as `-0.0` in decimal mode - remembering the zero-sign

dd9c409

radeusgd added the "CI: Ready to merge" label Aug 8, 2023

mergify bot merged commit b656b33 into develop Aug 8, 2023
23 of 24 checks passed

mergify bot deleted the wip/radeusgd/7353-long-to-double-loss-of-precision branch August 8, 2023 12:30
