Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scale doubles to floats when necessary to match the field #78344

Merged

Conversation

not-napoleon
Copy link
Member

This fixes a bug where the range aggregation always treats the range end points as doubles, even if the field value doesn't have enough resolution to fill a double. This was creating issues where the range would have a "more precise" approximation of an unrepresentable number than the field, causing the value to fall in the wrong bucket.

Note 1: This does not resolve the case where we have a long value that is not precisely representable as a double. Since the wire format sends the range bounds as doubles, by the time we get to where this fix is operating, we've already lost the precision to act on a big long. Fixing that problem will require a wire format change, and I'm not convinced it's worth it right now.

Note 2: This is probably still broken for ScaledFloats, since they don't implement NumberFieldType.

Resolves #77033

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 27, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@not-napoleon not-napoleon changed the title 77033 range float rounding bug Scale doubles to floats when necessary to match the field Sep 27, 2021
# For testing missing values
- {"index": {}}
- {}

---
# Regression test for 77033
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I see this comment in 9 months I'm super unlikely to look up the issue- I'll either git-blame the code or puzzle it out from the title of the test or body of it. I don't think the link adds anything here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, okay. Do you also think it's not worth having a similar comment in the unit tests I added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't. Honestly if you like the comments you can keep 'em. But I tend to skip those sorts of comments when reading.

* If this is a numeric field backed values source type, return the type of the numeric field backing it.
* Otherwise return null.
*/
public NumberFieldMapper.NumberType getNumericType() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be getNumberType instead? I'm always confused by the difference between NumericType and NumberType.

Or, should this be double round(double)? I think, maybe, NumberType is sort of a mapping concept. Or, at least, a NumberFieldMapper concept. Also, that way maybe it'd be possible to apply the fix to scaled_float without forcing it to implement some of the internals of NumberFieldMapper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be getNumberType, if we keep it. You raise good points about why we might not want to keep it though. NumberType is a mapping concept, and I don't feel great exposing it here. Especially since we're weirdly abusing its parse function to make this work. But I do think we need to delegate this to the Field Type somehow, and this was the most likely candidate I saw for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We sort of already have a method for this - rangeQuery and termQuery. They do the rounding we want. They just return queries we can't do anything with. I'd be ok with a new method on MappedFieldType that could round numeric bounds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that rangeQuery and termQuery just call parse, same as I'm doing here. I'm hesitant to add this to mapped field type, because it is a fundamentally number-oriented concept. So you'd have one more method that throws UnsupportedOperationException all over the place, except for a few special cases. I hate throwing UnsupportedOperationException; I feel like it breaks the whole idea of polymorphism. In this case especially, it feels like we're going through a lot of effort to support keeping ScaledFloatFieldType from having to extend NumberFieldType, which maybe feels like not a great goal?

Having said that, I do think exposing the NumberType is heavy handed. How would you feel about a method on NumberFieldType called double coercePrecision(Object) or something like that? It'd just call parse, but the meaning is a little clearer, I think (or at least it gives me a spot to drop javadoc and make the meaning clearer). It doesn't solve the scaled float case, but I'm not sure how big a deal that is?

Anyway, if you really feel like this belongs on MappedFieldType, I can do that too. WDYT?

@not-napoleon
Copy link
Member Author

@elasticmachine update branch

@not-napoleon
Copy link
Member Author

@elasticmachine update branch

@not-napoleon
Copy link
Member Author

@elasticmachine run elasticsearch-ci/1

@not-napoleon
Copy link
Member Author

@elasticmachine run elasticsearch-ci/part-1

* on the double in the case that the backing number type would have parsed the value differently. This is to address
* the problem where (e.g.) 0.04F < 0.04D, which causes problems for range aggregations.
*/
public double coerceToDouble(Double value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reduceToStoredPrecision or something? That gives us a hint about why it's important.

Also, it's kind of nice to explain the why in the first sentence of the javadoc because it's what I get when I mouse over. I get the sentiment that it's weird though.

Does it make sense for this to take a double? I know parse wants an Object when I see Double in the signature I tend to think it explicitly handles null somehow but this really doesn't.

Double from = range.from;
Double to = range.to;
// Trying to parse infinite values into ints/longs throws. Understandably.
Double from = Double.isFinite(range.from) ? fixPrecision.applyAsDouble(range.from) : range.from;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be clearer to move isFinite "up"? Like into the field mapper or into the config or something? Or just into the ternary above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the field mapper makes the most sense, looking at it now. I'll do that.

if (fieldContext() != null && fieldType() instanceof NumberFieldMapper.NumberFieldType) {
return ((NumberFieldMapper.NumberFieldType) fieldType())::coerceToDouble;
}
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be cleaner if this returned the identity function. That way you don't have to deal with null on the caller. I don't think you ever need to know its identity?

@@ -543,6 +603,7 @@ private void simpleTestCase(
}, verify, fieldType);
}

/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I need to get into your NOCOMMIT habit.

@not-napoleon
Copy link
Member Author

@elasticmachine update branch

@not-napoleon not-napoleon merged commit b968fcb into elastic:master Oct 11, 2021
@not-napoleon not-napoleon deleted the 77033-range-float-rounding-bug branch October 11, 2021 15:53
not-napoleon added a commit to not-napoleon/elasticsearch that referenced this pull request Oct 11, 2021
)

This fixes a bug where the range aggregation always treats the range end points as doubles, even if the field value doesn't have enough resolution to fill a double. This was creating issues where the range would have a "more precise" approximation of an unrepresentable number than the field, causing the value to fall in the wrong bucket.

Note 1: This does not resolve the case where we have a long value that is not precisely representable as a double. Since the wire format sends the range bounds as doubles, by the time we get to where this fix is operating, we've already lost the precision to act on a big long. Fixing that problem will require a wire format change, and I'm not convinced it's worth it right now.

Note 2: This is probably still broken for ScaledFloats, since they don't implement NumberFieldType.

Resolves elastic#77033
not-napoleon added a commit that referenced this pull request Oct 11, 2021
…) (#78932)

* Scale doubles to floats when necessary to match the field (#78344)

This fixes a bug where the range aggregation always treats the range end points as doubles, even if the field value doesn't have enough resolution to fill a double. This was creating issues where the range would have a "more precise" approximation of an unrepresentable number than the field, causing the value to fall in the wrong bucket.

Note 1: This does not resolve the case where we have a long value that is not precisely representable as a double. Since the wire format sends the range bounds as doubles, by the time we get to where this fix is operating, we've already lost the precision to act on a big long. Fixing that problem will require a wire format change, and I'm not convinced it's worth it right now.

Note 2: This is probably still broken for ScaledFloats, since they don't implement NumberFieldType.

Resolves #77033
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Range Aggregation is not working properly for float fields
4 participants