-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Percentile/Ranks should return null instead of NaN when empty #30460
Percentile/Ranks should return null instead of NaN when empty #30460
Conversation
The other metric aggregations (min/max/etc) return `null` as their XContent value and string when nothing was computed (due to empty/missing fields). Percentiles and Percentile Ranks, however, return NaN which is inconsistent and confusing for the user. This fixes the inconsistency by making the aggs return `null`. This applies to both the value and the string getters. Note: like the metric aggs, this does not change the value if fetched directly from the percentiles object it will return as `NaN`/`"NaN"`. This only changes the XContent output.
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@polyfractal I agree with the argument that null
as an output for empty percentile aggs is more consistent with the rest of the aggs output. I also don't believe this should be considered a breaking change since both "NaN" and "null" are outputs that signal a missing value.
I left a comment about possible simplifications of the four test cases that I'd like to try out. Also I wonder if we ocasionally test the "empty" case in out xContent-parsing roundtrip tests. We should make sure we are not breaking e.g. the High Level client parsing with this. I don't think we do but can you check that this is covered by our current randomization?
@@ -103,4 +113,85 @@ protected InternalHDRPercentileRanks mutateInstance(InternalHDRPercentileRanks i | |||
} | |||
return new InternalHDRPercentileRanks(name, percents, state, keyed, formatter, pipelineAggregators, metaData); | |||
} | |||
|
|||
public void testEmptyRanksXContent() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just looking how similar the xContent output of InternalHDRPercentilesRanksTests and InternalTDigestPercentilesRanksTest, maybe these two test could be pushed up one level to InternalPercentilesRanksTestCase by calling the sub-tests createTestInstance() method with the appropriate values? I haven't really checked if the outputs are exactly the same, maybe I'm missing something, but it would be great to reduce the number of rather identical test cases.
Maybe pushing all four cases up to AbstractPercentilesTestCase would work as well? Not sure though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ this combined nicely into two tests at the InternalPercentile(Ranks)TestCase level. Couldn't move fully to the Abstract class as the API between percentile and ranks is slightly different.
Ran into a bit of a snag. Good call on adding the "empty" case to the general xcontent roundtrip tests @cbuescher. Exposed some broken behavior, where the xcontent was serialized with The issue is that Percentiles, Ranks, Stats and ExtendedStats all extend
Extended stats is similar, using a mix of
Thoughts @cbuescher @colings86 ? |
@polyfractal This is tricky because the reason Stats/ExtendedStats outputs those values is to match the outputs of the individual aggs they are combining in those cases. For example, the I'm not against it but we need to be more careful as the impact of the breaking change increases. If we go down this route then we might need to think about migration and bwc since we don't want to surprise users too much with the break /cc @clintongormley |
Additionally, outputting |
I went back through everything and think I understand how it works now. It's a bit more weird, as it turns out. Here's the situation for the metrics:
This seems to hold for all metrics, stats, etc. The test failures I was encountering was due to not adjusting the ParsedPercentiles It then follows the same properties as the other aggs: getters and pipeline aggs show our internal placeholder value ( †Interestingly, Min doesn't check which infinity is present, which means a legitimate |
Jenkins, run gradle build tests |
Hmm, there seems to be something consistently broken with the empty xcontent tests, as it keeps failing on CI. But I just can't reproduce locally, even with 100,000 iterations on the same or random seed.
Will keep poking at it. |
Ok, I've tested this extensively locally and could never reproduce... and the CI is now passing after merging master. So I'm tentatively claiming it was a weird CI issue. @cbuescher mind taking another look? I think I addressed the issues you raised. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I left a comment regarding the test case. Can you re-check what the differences are any maybe push it up? Maybe I missed why this isn't possible. Also, since this is now a (small) breaking change to the REST output, could you add a note to the 7.0 migration docs?
@@ -39,4 +49,52 @@ protected final void assertFromXContent(T aggregation, ParsedAggregation parsedA | |||
Class<? extends ParsedPercentiles> parsedClass = implementationClass(); | |||
assertTrue(parsedClass != null && parsedClass.isInstance(parsedAggregation)); | |||
} | |||
|
|||
public void testEmptyRanksXContent() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned earlier you cannot push this test up into AbstractPercentilesTestCase because of some subtle difference, but I cannot spot it. Do you remember what it was? Otherwise I'd give it another try to push it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's super tiny: Percentiles uses percent() / percentAsString()
while PercentileRanks uses percentile() / percentileAsString()
.
I could collapse them into a single test and then do an instanceOf
or getType()
and switch on that if you think it'd be cleaner. Less test code duplication, but a bit more fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see it now. What about pulling the test up and just doing the two lines of assertions that are different in their own little helper method that you overwrite differently in both cases? I'm usually also not a fan of doing so much code acrobatics in tests but in this case I think the gain in non-duplicated lines of code would justify it. I don't think its super important though, thanks for pointing out the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ this cleaned up nicely. Thanks for the suggestion!
Tests cleaned up and nicely de-duplicated, and added a note to the breaking changes doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks. LGTM now.
Thanks @cbuescher. The mystery failure is back, so there must be something here that I'm missing. Going to go back over the PR and see if I missed something. |
@polyfractal does it look the same as the one mentioned above? What I find strange it that is doesn't reproduce in that case. That probably needs some investigation too. |
Yeah, it's another one of these unprintable characters. Makes me think there's a serialization issue and something isn't being written/read correctly.
|
Aha! Figured it out... sorta. So it appears to be a difference in JDK 8 vs 9/10. My machine was running the tests as JDK10 and passing, but if I drop to 8 it fails. Doing a bit of digging, I found JDK-8202129 (which is mostly unrelated but got me on the right track). Starting in JDK9, Locale data is derived from the Unicode Consortium's Common Locale Data Repository (CLDR), which introduced some changes. Running the test on JDK10 with Further, I found this in the docs (although strangely it's still in the JDK10 docs too):
And So that's the issue... formatting a I'll work up a fix tomorrow... I think we just need to manually check for Related, this may be an issue for Infinity too:
Which is the |
Note that we run with |
Jenkins, run gradle build tests |
Alrighty, looks like we're back to a green build. @cbuescher would you mind taking a look at the most recent commit to see if you approve? ❤️ |
@@ -79,7 +79,12 @@ protected MultiValue(StreamInput in) throws IOException { | |||
public abstract double value(String name); | |||
|
|||
public String valueAsString(String name) { | |||
return format.format(value(name)).toString(); | |||
// Explicitly check for NaN, since it formats to "�" or "NaN" depending on JDK version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only the case for certain locales? I would bne suprised if some JDKs would return a weird UTF8 character in all cases. To make this comment more readable it would probably also make sense to put in the bad utf8 value as octal or hex codepoint and to clarify under which circumstances this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure how this behaves across Locales, but I don't think it matters for us. We seem to always initialize the Decimal DocValueFormat
with Locale.Root
which I believe uses the JRE's default symbol table.
So for JDK8 the root locale will use JRELocaleProviderAdapter
to get the symbols, which loads sun.text.resources.FormatData
, and you can see the NaN
symbol is \uFFFD
For JDK 9+, the root locale will use CLDRLocaleProviderAdapter
, which loads sun.text.resources.cldr.FormatData
. And in that resource file you can see the NaN
symbol is "NaN"
(Can't find a link to the code, but you can see it in your IDE).
++ to making the comment more descriptive. I'll try to distill this thread into a sane comment, and probably leave a reference to the comments here in case anyone wants to see more info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, I really wonder why Oracle thought � would be a good default representation of "NaN"... :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I don't like how this was implemented, looking at it. Going to move it over to the DocValueFormat itself, so that it only applies to the Decimal formatter when looking at doubles... otherwise it'll be checked against all formatters (geo, IP, etc). Harmless I think, but no need.
@polyfractal thanks, the last commit looks good, however I left a small comment just to clarify the circumstances that make this workaround necessary. Otherwise we might not remember why we are not relying on the simple Double.NaN.toString() in this case (which I think is the intuitive thing ppl would expect). If you could clarify this for future reference, that would be great. Not sure if this requirtes yet another CI run, but maybe it also doesn't matter that much. |
* | ||
* Since the character � isn't very useful, and makes the output change depending on JDK version, | ||
* we manually check to see if the value is NaN and return the string directly. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Great comment, my future self will be glad its here ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) Me too!
* master: Add get stored script and delete stored script to high level REST API - post backport fix Add get stored script and delete stored script to high level REST API (#31355) Core: Combine Action and GenericAction (#31405) Fix reference to XContentBuilder.string() (#31337) Avoid sending duplicate remote failed shard requests (#31313) Fix defaults in GeoShapeFieldMapper output (#31302) RestAPI: Reject forcemerge requests with a body (#30792) Packaging: Remove windows bin files from the tar distribution (#30596) Docs: Use the default distribution to test docs (#31251) [DOCS] Adds testing for security APIs (#31345) Clarify that IP range data can be specified in CIDR notation. (#31374) Use system context for cluster state update tasks (#31241) Percentile/Ranks should return null instead of NaN when empty (#30460) REST high-level client: add validate query API (#31077) Move language analyzers from server to analysis-common module. (#31300) [Test] Fix :example-plugins:rest-handler on Windows Expose lucene's RemoveDuplicatesTokenFilter (#31275) Reload secure settings for plugins (#31383) Remove some cases in FieldTypeLookupTests that are no longer relevant. (#31381) Ensure we don't use a remote profile if cluster name matches (#31331) [TEST] Double write alias fault (#30942) [DOCS] Fix version in SQL JDBC Maven template [DOCS] Improve install and setup section for SQL JDBC SQL: Fix rest endpoint names in node stats (#31371) Support for remote path in reindex api - post backport fix Closes #22913 [ML] Put ML filter API response should contain the filter (#31362) Support for remote path in reindex api (#31290) Add byte array pooling to nio http transport (#31349) Remove trial status info from start trial doc (#31365) [DOCS] Adds links to release notes and highlights add is-write-index flag to aliases (#30942) Add rollover-creation-date setting to rolled over index (#31144) [ML] Hold ML filter items in sorted set (#31338) [Tests] Fix edge case in ScriptedMetricAggregatorTests (#31357)
The other metric aggregations (min/max/etc) return
null
as their XContent value and string when nothing was computed (due to empty/missing fields). Percentiles and Percentile Ranks, however, returnNaN
which is inconsistent and confusing for the user. This fixes the inconsistency by making the aggs returnnull
. This applies to both the numeric value and the "as string" value.Note: like the metric aggs, this does not change the value if fetched directly from the percentiles object, which will return as
NaN
/"NaN"
. This only changes the XContent output.I looked through all the other metric aggs and they appear to return null (or 0.0, in the case of cardinality/value_count/sum). So percentiles were the only outliers.
This is sorta a bwc break, but could also be seen as a bugfix. I'm not sure what we want to do with regards to backporting.
Closes #29066