-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report Loss_Of_Integer_Precision
when an integer is not exactly representable as a float during conversion
#7509
Report Loss_Of_Integer_Precision
when an integer is not exactly representable as a float during conversion
#7509
Conversation
This reverts commit 1905cc1.
f8251dc
to
87f78ec
Compare
I've been switching over some instances of |
|
||
/** @return any problems that occurred when building the Storage. */ | ||
public AggregatedProblems getProblems() { | ||
return AggregatedProblems.of(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a great pattern, in fact I think it is bad.
The proper solution would be to modify seal()
to return a WithAggregatedProblems<Storage<?>>
- that would ensure that the problems are handled throughout the codebase.
But we use seal()
in really many places and I realised ensuring all of this is nicely handled will just make this PR far too large - thus I decided this will be better done as a separate task - #7514
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.
I'm not sure if I understand what you mean here? If anything, how error propagation in Enso code works is completely separate of how we do it in Java helpers.
I think the propagation on the Enso side works really well, although indeed there may be some places where we could want to improve them.
} else { | ||
} else if (NumericConverter.isDecimalLike(o)){ | ||
double value = NumericConverter.coerceToDouble(o); | ||
data[currentSize++] = Double.doubleToRawLongBits(value); | ||
} else if (NumericConverter.isCoercibleToLong(o)) { | ||
long value = NumericConverter.coerceToLong(o); | ||
double converted = convertIntegerToDouble(value); | ||
data[currentSize++] = Double.doubleToRawLongBits(converted); | ||
} else { | ||
throw new IllegalStateException("Unexpected value type when appending to a DoubleBuilder: " + o.getClass().getCanonicalName() + "." + | ||
" This is a bug in the Table library."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check decimal and integer cases separately, to avoid implicit rounding-loss-of-precision and handle the loss of precision explicitly with convertIntegerToDouble
.
public void setPreExistingProblems(AggregatedProblems preExistingProblems) { | ||
this.preExistingProblems = preExistingProblems; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows an ObjectBuilder after retyping to inherit any problems of the earlier builder.
@@ -201,7 +203,7 @@ public boolean isNegated() { | |||
return negated; | |||
} | |||
|
|||
public Storage<?> iif(Value when_true, Value when_false, StorageType resultStorageType) { | |||
public WithAggregatedProblems<Storage<?>> iif(Value when_true, Value when_false, StorageType resultStorageType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the pattern that should ideally be used in most places (IMO).
Indicates that an automatic conversion of an integer column to a decimal | ||
column is losing precision because some of the large integers cannot be | ||
exactly represented by the `double` type. | ||
Warning (affected_rows_count : Integer) (example_value : Integer) (example_value_converted : Decimal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this have a row number as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I would have to modify the builder a bit to be able to know the row count. Currently there is not enough context.
But do you think it's useful? Tbh I'm not sure if the row_number
that we have in a few of these errors is useful at all - hence I didn't think of adding it.
IMO the example_value
serves this purpose enough - the user can see one of the values that are concerned, if the row number is needed they can find it by value. Row count is useful for sure to know if this affects like one row or like 50% of the rows.
I propose to see in practice - if we start encountering these and similar warnings in practice, let's see what we do with them in practical scenarios (e.g. bookclubs, other projects) and if such row number is helpful. My current gut feeling is that example values are more useful, because they illustrate e.g. that the number in question is very large. Or e.g. for date parsing failures, ideally our error should show an example value that failed to parse - this would allow the user to compare what it is with the format they specified and see what needs to be amended in the format to make it work. Row number does not help much here. (Ofc. it is more general as from row numbers we can figure out also the values - but that requires additional actions and IMO its better to immediately show examples).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, IMO what would be more useful here is column_name
to know with which column this warning is associated, in a multi-column table.
However, it was not trivial how to get this information from the context so I abandoned this. I'm thinking it may be worth adding it as part of the next ticket #7514. But let me know if you think that I should add it now. I imagine it should not be very problematic to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; the value itself is enough to find it in the data; the row number would be nice but it's not worth a big effort.
|
||
/** @return any problems that occurred when building the Storage. */ | ||
public AggregatedProblems getProblems() { | ||
return AggregatedProblems.of(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think there's too much need to handle problem propagation explicitly in the Enso code, but I don't know enough about the big picture to suggest a solution yet.
std-bits/table/src/main/java/org/enso/table/data/column/builder/LossOfIntegerPrecision.java
Outdated
Show resolved
Hide resolved
we can either keep the precision of large integers and warn about unexpected rounding OR keep the +- sign of 0 (only possible in floats); obviously we cannot do both in a single value IMO keeping integers is more important here as there is real data loss; + vs - 0 does not change the value - moreover the issue only happens for `+0` and `-0` that have no decimal point, it is all preserved well for `+0.0` and `-0.0`.
…parse `-0` as `-0.0` in decimal mode - remembering the zero-sign
Pull Request Description
Closes #7353
I introduce a new type
WithAggregatedProblems
, becauseWithProblems
was too simple - it only allowed to hold aList<Problem>
butAggregatedProblems
is more than that. Ideally we shouldn't multiply entities like this too much. We should probably unify all to useWithAggregatedProblems
- but after starting this, I realised it will likely just take too much effort to do for this little PR. So instead, I created a follow-up task for this: #7514Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
./run ide build
.