Percentile/Ranks should return null instead of NaN when empty #30460

polyfractal · 2018-05-08T14:04:14Z

The other metric aggregations (min/max/etc) return null as their XContent value and string when nothing was computed (due to empty/missing fields). Percentiles and Percentile Ranks, however, return NaN which is inconsistent and confusing for the user. This fixes the inconsistency by making the aggs return null. This applies to both the numeric value and the "as string" value.

Note: like the metric aggs, this does not change the value if fetched directly from the percentiles object, which will return as NaN/"NaN". This only changes the XContent output.

I looked through all the other metric aggs and they appear to return null (or 0.0, in the case of cardinality/value_count/sum). So percentiles were the only outliers.

This is sorta a bwc break, but could also be seen as a bugfix. I'm not sure what we want to do with regards to backporting.

Closes #29066

elasticmachine · 2018-05-08T14:04:16Z

Pinging @elastic/es-search-aggs

cbuescher

@polyfractal I agree with the argument that null as an output for empty percentile aggs is more consistent with the rest of the aggs output. I also don't believe this should be considered a breaking change since both "NaN" and "null" are outputs that signal a missing value.
I left a comment about possible simplifications of the four test cases that I'd like to try out. Also, I wonder if we occasionally test the "empty" case in our xContent-parsing roundtrip tests. We should make sure we are not breaking e.g. the High Level client parsing with this. I don't think we do, but can you check that this is covered by our current randomization?

cbuescher · 2018-05-09T15:43:03Z

...sticsearch/search/aggregations/metrics/percentiles/hdr/InternalHDRPercentilesRanksTests.java

@@ -103,4 +113,85 @@ protected InternalHDRPercentileRanks mutateInstance(InternalHDRPercentileRanks i
        }
        return new InternalHDRPercentileRanks(name, percents, state, keyed, formatter, pipelineAggregators, metaData);
    }
+
+    public void testEmptyRanksXContent() throws IOException {


I'm just looking at how similar the xContent output of InternalHDRPercentilesRanksTests and InternalTDigestPercentilesRanksTests is. Maybe these two tests could be pushed up one level to InternalPercentilesRanksTestCase by calling the sub-test's createTestInstance() method with the appropriate values? I haven't really checked if the outputs are exactly the same, maybe I'm missing something, but it would be great to reduce the number of nearly identical test cases.
Maybe pushing all four cases up to AbstractPercentilesTestCase would work as well? Not sure though.

++ this combined nicely into two tests at the InternalPercentile(Ranks)TestCase level. Couldn't move fully to the Abstract class as the API between percentile and ranks is slightly different.

polyfractal · 2018-05-14T21:30:39Z

Ran into a bit of a snag. Good call on adding the "empty" case to the general xcontent roundtrip tests @cbuescher. Exposed some broken behavior, where the xcontent was serialized with null but deserialized into NaN, causing problems.

The issue is that Percentiles, Ranks, Stats and ExtendedStats all extend NumericMetricAggregator.MultiValue, which defines double metric(String name, long owningBucketOrdinal). While investigating if we can change that return signature to Double, I found this in the Stats agg:

if (valuesSource == null || owningBucketOrd >= counts.size()) {
    switch(InternalStats.Metrics.resolve(name)) {
        case count: return 0;
        case sum: return 0;
        case min: return Double.POSITIVE_INFINITY;
        case max: return Double.NEGATIVE_INFINITY;
        case avg: return Double.NaN;
        default:
            throw new IllegalArgumentException("Unknown value [" + name + "] in common stats aggregation");
    }
}

Extended stats is similar, using a mix of NaN and +/- Inf to signal "missing". I feel like we should:

Change the return signature of the method to Double so that we can return null here too
Include Stats/ExtendedStats in the refactoring to use null instead of the mixture of NaN and +/- Inf

Thoughts @cbuescher @colings86 ?

colings86 · 2018-05-15T08:36:19Z

@polyfractal This is tricky because the reason Stats/ExtendedStats outputs those values is to match the outputs of the individual aggs they are combining in those cases. For example, the min aggregation will return Double.POSITIVE_INFINITY in the empty case. This consistency between the individual aggs and the combined Stats/ExtendedStats aggs is important. So with what you propose we would have to change the min/max/etc. aggs too. This is starting to feel like a much bigger breaking change than the initial idea.

I'm not against it but we need to be more careful as the impact of the breaking change increases. If we go down this route then we might need to think about migration and bwc since we don't want to surprise users too much with the break /cc @clintongormley

colings86 · 2018-05-15T08:38:10Z

Additionally, outputting null might have an adverse effect on the pipeline aggregations, which I think will determine that the bucket_path is wrong rather than that the value is just missing from the bucket if null is returned. We should make sure we test this.

polyfractal · 2018-05-15T15:25:46Z

> This is tricky because the reason Stats/ExtendedStats outputs those values is to match the outputs of the individual aggs they are combining in those cases. For example, the min aggregation will return Double.POSITIVE_INFINITY in the empty case.

I went back through everything and think I understand how it works now. It's a bit more weird, as it turns out. Here's the situation for the metrics:

InternalMin emits a null if the value is infinite† when serializing to XContent output
ParsedMin also emits a null like InternalMin when serializing to XContent
When deserializing from XContent, Min converts null values back into Inf
However, both InternalMin and ParsedMin return the actual double value from the getter. So Java TC users, and pipeline aggs, will get our internal placeholder value, which differs from what REST returns

This seems to hold for all metrics, stats, etc. The test failures I was encountering were due to not adjusting the ParsedPercentiles doXContentBody to do the same trick as InternalPercentiles (emitting null if there were no percentiles). So if I follow the same pattern, xcontent serializes into null, but deserializes back into NaN internally.

It then follows the same properties as the other aggs: getters and pipeline aggs show our internal placeholder value (Inf, NaN, etc) while XContent shows null. This still feels messy but it should keep the breaking changes to a minimum. I'll polish it up and push a commit soon.

†Interestingly, Min doesn't check which infinity is present, which means a legitimate -Inf value would be treated as if it were null.
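
To make the pattern described above concrete, here is a minimal sketch. `EmptyAwareMin`, `toXContentLike` and `fromXContentLike` are hypothetical stand-ins invented for illustration (plain maps instead of XContent); they are not the real InternalMin/ParsedMin classes, and the real code may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an "empty-aware" min metric; not the real InternalMin.
final class EmptyAwareMin {
    private final double min;

    EmptyAwareMin(double min) {
        this.min = min;
    }

    // Getter: returns the raw internal value, including the +Infinity placeholder.
    double getValue() {
        return min;
    }

    // REST-style rendering: any infinite value is reported as null.
    // As the footnote notes, this does not distinguish +Infinity from -Infinity.
    Map<String, Object> toXContentLike() {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("value", Double.isInfinite(min) ? null : Double.valueOf(min));
        return out;
    }

    // Parsing: a null in the response is mapped back to the internal placeholder.
    static EmptyAwareMin fromXContentLike(Map<String, Object> in) {
        Object v = in.get("value");
        return new EmptyAwareMin(v == null ? Double.POSITIVE_INFINITY : (Double) v);
    }
}
```

In this sketch the rendered output and the getter disagree for the empty case, which is the asymmetry the list above describes.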

polyfractal · 2018-05-15T19:43:10Z

Jenkins, run gradle build tests

polyfractal · 2018-05-16T16:03:39Z

Hmm, there seems to be something consistently broken with the empty xcontent tests, as it keeps failing on CI. But I just can't reproduce locally, even with 100,000 iterations on the same or random seed.

./gradlew :server:test -Dtests.seed=2C59C2697A193E1 -Dtests.class=org.elasticsearch.search.aggregations.metrics.percentiles.tdigest.InternalTDigestPercentilesRanksTests -Dtests.method="testEmptyRanksXContent" -Dtests.security.manager=true -Dtests.locale=bg-BG -Dtests.timezone=America/North_Dakota/New_Salem
15:13:21 FAILURE 0.03s J1 | InternalTDigestPercentilesRanksTests.testEmptyRanksXContent <<< FAILURES!
15:13:21    > Throwable #1: java.lang.AssertionError: 
15:13:21    > Expected: "NaN"
15:13:21    >      but: was "�"
15:13:21    > 	at __randomizedtesting.SeedInfo.seed([2C59C2697A193E1:3C1FA52E15FFD98]:0)
15:13:21    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)

Will keep poking at it.

polyfractal · 2018-06-06T12:42:43Z

Ok, I've tested this extensively locally and could never reproduce... and the CI is now passing after merging master. So I'm tentatively claiming it was a weird CI issue.

@cbuescher mind taking another look? I think I addressed the issues you raised.

cbuescher

Looks good, but I left a comment regarding the test case. Can you re-check what the differences are and maybe push it up? Maybe I missed why this isn't possible. Also, since this is now a (small) breaking change to the REST output, could you add a note to the 7.0 migration docs?

cbuescher · 2018-06-07T10:42:00Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

@@ -39,4 +49,52 @@ protected final void assertFromXContent(T aggregation, ParsedAggregation parsedA
        Class<? extends ParsedPercentiles> parsedClass = implementationClass();
        assertTrue(parsedClass != null && parsedClass.isInstance(parsedAggregation));
    }
+
+    public void testEmptyRanksXContent() throws IOException {


You mentioned earlier you cannot push this test up into AbstractPercentilesTestCase because of some subtle difference, but I cannot spot it. Do you remember what it was? Otherwise I'd give it another try to push it up.

It's super tiny: Percentiles uses percent() / percentAsString() while PercentileRanks uses percentile() / percentileAsString().

I could collapse them into a single test and then do an instanceOf or getType() and switch on that if you think it'd be cleaner. Less test code duplication, but a bit more fragile.

Ah, I see it now. What about pulling the test up and just doing the two lines of assertions that are different in their own little helper method that you override differently in both cases? I'm usually also not a fan of doing so much code acrobatics in tests, but in this case I think the gain in non-duplicated lines of code would justify it. I don't think it's super important though, thanks for pointing out the difference.

++ this cleaned up nicely. Thanks for the suggestion!
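
The helper-override suggestion above can be sketched roughly as follows. All names here (`StubA`, `StubB`, `AbstractEmptyTestCase`, `valueAsString`, `runEmptyTest`) are hypothetical and exist only for illustration; the stubs stand in for two result types whose getters differ only in name, and no test framework is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stubs: two result types whose getters differ only in name.
final class StubA {
    String percentAsString(double key) { return "NaN"; }
}

final class StubB {
    String percentileAsString(double key) { return "NaN"; }
}

// Shared base: the test body is written once here.
abstract class AbstractEmptyTestCase<T> {
    abstract T createEmptyInstance();

    // The only part that differs between subclasses.
    abstract String valueAsString(T agg, double key);

    // Returns the keys whose string value was not "NaN" (empty list means success).
    final List<Double> runEmptyTest(double... keys) {
        T agg = createEmptyInstance();
        List<Double> failures = new ArrayList<>();
        for (double key : keys) {
            if (!"NaN".equals(valueAsString(agg, key))) {
                failures.add(key);
            }
        }
        return failures;
    }
}

final class StubATest extends AbstractEmptyTestCase<StubA> {
    @Override StubA createEmptyInstance() { return new StubA(); }
    @Override String valueAsString(StubA agg, double key) { return agg.percentAsString(key); }
}

final class StubBTest extends AbstractEmptyTestCase<StubB> {
    @Override StubB createEmptyInstance() { return new StubB(); }
    @Override String valueAsString(StubB agg, double key) { return agg.percentileAsString(key); }
}
```

Compared with switching on the instance type inside one test, the override keeps the shared body free of casts, at the cost of one small method per subclass.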

polyfractal · 2018-06-07T15:29:35Z

Tests cleaned up and nicely de-duplicated, and added a note to the breaking changes doc

cbuescher

Great, thanks. LGTM now.

polyfractal · 2018-06-07T17:38:20Z

Thanks @cbuescher. The mystery failure is back, so there must be something here that I'm missing. Going to go back over the PR and see if I missed something.

cbuescher · 2018-06-07T20:37:14Z

@polyfractal does it look the same as the one mentioned above? What I find strange is that it doesn't reproduce in that case. That probably needs some investigation too.

polyfractal · 2018-06-07T20:42:29Z

Yeah, it's another one of these unprintable characters. Makes me think there's a serialization issue and something isn't being written/read correctly.

REPRODUCE WITH: ./gradlew :server:test -Dtests.seed=8E1D067ECFCE26E4 -Dtests.class=org.elasticsearch.search.aggregations.metrics.percentiles.hdr.InternalHDRPercentilesRanksTests -Dtests.method="testEmptyRanksXContent" -Dtests.security.manager=true -Dtests.locale=zh-TW -Dtests.timezone=America/Panama
15:39:09 FAILURE 0.02s J2 | InternalHDRPercentilesRanksTests.testEmptyRanksXContent <<< FAILURES!
15:39:09    > Throwable #1: java.lang.AssertionError: 
15:39:09    > Expected: "NaN"
15:39:09    >      but: was "�"
15:39:09    > 	at __randomizedtesting.SeedInfo.seed([8E1D067ECFCE26E4:8F19600AB930489D]:0)
15:39:09    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
15:39:09    > 	at org.elasticsearch.search.aggregations.metrics.percentiles.InternalPercentilesRanksTestCase.assertPercentile(InternalPercentilesRanksTestCase.java:48)
15:39:09    > 	at org.elasticsearch.search.aggregations.metrics.percentiles.AbstractPercentilesTestCase.testEmptyRanksXContent(AbstractPercentilesTestCase.java:111)

polyfractal · 2018-06-12T00:26:49Z

Aha! Figured it out... sorta.

So it appears to be a difference in JDK 8 vs 9/10. My machine was running the tests as JDK10 and passing, but if I drop to 8 it fails. Doing a bit of digging, I found JDK-8202129 (which is mostly unrelated but got me on the right track). Starting in JDK9, Locale data is derived from the Unicode Consortium's Common Locale Data Repository (CLDR), which introduced some changes.

Running the test on JDK10 with -Djava.locale.providers=COMPAT,CLDR makes the test fail, confirming the issue.

Further, I found this in the docs (although strangely it's still in the JDK10 docs too):

NaN is formatted as a string, which typically has a single character \uFFFD. This string is determined by the DecimalFormatSymbols object. This is the only value for which the prefixes and suffixes are not used.

And \uFFFD is indeed �

So that's the issue... formatting a NaN results in � under JDK8, but "NaN" in later versions. Seeing as this happens when getting the formatted value directly from the agg and not when converting to XContent, I think this has always been a lingering issue that we just didn't know about.

I'll work up a fix tomorrow... I think we just need to manually check for NaN and do the formatting ourselves, instead of relying on the formatter. Alternatively, we could have the test format correctly based on the Locale being used, but it seems better to make sure we consistently output "NaN" instead of whatever magic character the Locale decides?

Related, this may be an issue for Infinity too:

Infinity is formatted as a string, which typically has a single character \u221E, with the positive or negative prefixes and suffixes applied. The infinity string is determined by the DecimalFormatSymbols object.

Which is the ∞ character... I wonder if JDK9+ formats that to "Infinity" instead.
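
For anyone wanting to check this on their own JDK, a small sketch follows. It relies only on `java.text.DecimalFormat` and `java.text.DecimalFormatSymbols`; the class name `NaNSymbolDemo` and the `#.##` pattern are arbitrary choices for illustration. What `main` prints depends on the JDK version and on the `java.locale.providers` setting, so no expected output is given.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class NaNSymbolDemo {
    // Formats a value with the given symbols. For NaN, DecimalFormat emits
    // symbols.getNaN() without applying any prefix or suffix.
    static String format(double value, DecimalFormatSymbols symbols) {
        return new DecimalFormat("#.##", symbols).format(value);
    }

    public static void main(String[] args) {
        DecimalFormatSymbols root = DecimalFormatSymbols.getInstance(Locale.ROOT);
        // These values come from whichever locale data provider is active.
        System.out.println("NaN symbol:      " + root.getNaN());
        System.out.println("Infinity symbol: " + root.getInfinity());
        System.out.println("Formatted NaN:   " + format(Double.NaN, root));
        System.out.println("Formatted +Inf:  " + format(Double.POSITIVE_INFINITY, root));
    }
}
```

Running it once normally and once with `-Djava.locale.providers=COMPAT,CLDR` (the flag mentioned above) should show whether the two providers disagree on a given JDK.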

jasontedor · 2018-06-12T00:36:43Z

Note that we run with 9-:-Djava.locale.providers=COMPAT in the default jvm.options. I had been thinking last week about the fact that we do not run our tests with the options that we ship with in the default jvm.options file. This feels like a build infrastructure problem that we should address.

polyfractal · 2018-06-13T17:51:27Z

Jenkins, run gradle build tests

polyfractal · 2018-06-14T14:53:31Z

Alrighty, looks like we're back to a green build. @cbuescher would you mind taking a look at the most recent commit to see if you approve? ❤️

cbuescher · 2018-06-14T15:23:55Z

...in/java/org/elasticsearch/search/aggregations/metrics/InternalNumericMetricsAggregation.java

@@ -79,7 +79,12 @@ protected MultiValue(StreamInput in) throws IOException {
        public abstract double value(String name);

        public String valueAsString(String name) {
-            return format.format(value(name)).toString();
+            // Explicitly check for NaN, since it formats to "�" or "NaN" depending on JDK version


Is this only the case for certain locales? I would be surprised if some JDKs returned a weird UTF-8 character in all cases. To make this comment more readable it would probably also make sense to put in the bad UTF-8 value as an octal or hex codepoint and to clarify under which circumstances this happens.

I'm not actually sure how this behaves across Locales, but I don't think it matters for us. We seem to always initialize the Decimal DocValueFormat with Locale.ROOT, which I believe uses the JRE's default symbol table.

So for JDK8 the root locale will use JRELocaleProviderAdapter to get the symbols, which loads sun.text.resources.FormatData, and you can see the NaN symbol is \uFFFD

For JDK 9+, the root locale will use CLDRLocaleProviderAdapter, which loads sun.text.resources.cldr.FormatData. And in that resource file you can see the NaN symbol is "NaN" (Can't find a link to the code, but you can see it in your IDE).

++ to making the comment more descriptive. I'll try to distill this thread into a sane comment, and probably leave a reference to the comments here in case anyone wants to see more info.

As an aside, I really wonder why Oracle thought � would be a good default representation of "NaN"... :(

Hm, I don't like how this was implemented, looking at it. Going to move it over to the DocValueFormat itself, so that it only applies to the Decimal formatter when looking at doubles... otherwise it'll be checked against all formatters (geo, IP, etc). Harmless I think, but no need.
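
A rough sketch of what a NaN check scoped to the decimal formatter could look like is below. `DecimalValueFormat` is a hypothetical stand-in written for illustration; it is not the actual DocValueFormat.Decimal code, and the real change may be structured differently.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical stand-in for a decimal value formatter; not the real DocValueFormat.Decimal.
final class DecimalValueFormat {
    private final DecimalFormat format;

    DecimalValueFormat(String pattern) {
        this.format = new DecimalFormat(pattern, new DecimalFormatSymbols(Locale.ROOT));
    }

    String format(double value) {
        // DecimalFormat would emit the locale data's NaN symbol, which may be
        // U+FFFD or "NaN" depending on the provider, so return a fixed string.
        if (Double.isNaN(value)) {
            return String.valueOf(Double.NaN); // always "NaN"
        }
        return format.format(value);
    }
}
```

Because the check lives inside the decimal formatter, other formatter types (geo, IP, etc.) never evaluate it, which is the motivation given above for moving it.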

cbuescher · 2018-06-14T15:26:45Z

@polyfractal thanks, the last commit looks good, however I left a small comment just to clarify the circumstances that make this workaround necessary. Otherwise we might not remember why we are not relying on the simple Double.toString(Double.NaN) in this case (which I think is the intuitive thing ppl would expect). If you could clarify this for future reference, that would be great. Not sure if this requires yet another CI run, but maybe it also doesn't matter that much.

cbuescher · 2018-06-15T08:00:23Z

server/src/main/java/org/elasticsearch/search/DocValueFormat.java

+             *
+             * Since the character � isn't very useful, and makes the output change depending on JDK version,
+             * we manually check to see if the value is NaN and return the string directly.
+             */


+1
Great comment, my future self will be glad it's here ;-)

polyfractal added >bug review :Analytics/Aggregations Aggregations labels May 8, 2018

cbuescher requested changes May 9, 2018

cbuescher self-assigned this May 9, 2018

Merge remote-tracking branch 'origin/master' into consistent_missing_…values_aggs

9b3460b

colings86 added the >breaking label May 15, 2018

Centralize empty test, test xcontent roundtrip with empty occasionally

fc34656

Merge remote-tracking branch 'origin/master' into consistent_missing_…values_aggs

d6009e6

Merge remote-tracking branch 'origin/master' into consistent_missing_…values_aggs

1e6801a

cbuescher reviewed Jun 7, 2018

polyfractal added 2 commits June 7, 2018 15:24

Review cleanup: centralize tests more

cd9bac7

Review cleanup: add note to 7.0 breaking changes

433a654

cbuescher approved these changes Jun 7, 2018

Explicitly check for NaN and manually return value

25ff123

Merge remote-tracking branch 'origin/master' into consistent_missing_…values_aggs

8be43b7

cbuescher reviewed Jun 14, 2018

Move NaN check to Decimal#DocValueFormat, better comments

98d9c9b

cbuescher reviewed Jun 15, 2018

polyfractal added v7.0.0 and removed review labels Jun 18, 2018

polyfractal merged commit 1502812 into elastic:master Jun 18, 2018

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019