Add details section for dcg ranking metric #31177
Conversation
While the other two ranking evaluation metrics (precision and reciprocal rank) already provide a more detailed output for how their score is calculated, the discounted cumulative gain metric (dcg) and its normalized variant have been lacking this until now. It's not entirely clear which level of detail might be useful for debugging and understanding the final metric calculation, but this change adds a `metric_details` section to the REST output that contains some information about the evaluation details.
Pinging @elastic/es-search-aggs
    }

    @Override
    public XContentBuilder innerToXContent(XContentBuilder builder, Params params) throws IOException {
The REST output this adds to the response for each rated request is e.g.:
{
"dcg": 0.55,
"unrated_docs": 3
}
for non-normalized dcg and something like this for the normalized variant:
{
"dcg": 0.69,
"ideal_dcg": 0.60,
"normalized_dcg": 1.14,
"unrated_docs": 2
}
While this isn't super helpful for plain dcg (the metric value is already reported elsewhere), the number of unrated documents might be interesting to users or for display in a UI, and the IDCG and normalization are somewhat interesting, I believe.
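Concretely, here is a minimal sketch of how a details object could write these fields. The `innerToXContent` signature is taken from the diff above, and the field names come from the JSON examples; the class layout and variable names are hypothetical and not necessarily what this PR actually does:

```java
import java.io.IOException;

import org.elasticsearch.common.xcontent.ToXContent.Params;
import org.elasticsearch.common.xcontent.XContentBuilder;

// Hypothetical sketch: field names match the JSON examples above,
// but the actual class in this PR may be structured differently.
public class DcgDetailsSketch {
    private final double dcg;
    private final double idealDcg;   // 0 when the non-normalized variant is used
    private final int unratedDocs;

    public DcgDetailsSketch(double dcg, double idealDcg, int unratedDocs) {
        this.dcg = dcg;
        this.idealDcg = idealDcg;
        this.unratedDocs = unratedDocs;
    }

    public XContentBuilder innerToXContent(XContentBuilder builder, Params params) throws IOException {
        builder.field("dcg", dcg);
        if (idealDcg != 0) {
            // only the normalized variant reports the ideal DCG and the ratio
            builder.field("ideal_dcg", idealDcg);
            builder.field("normalized_dcg", dcg / idealDcg);
        }
        builder.field("unrated_docs", unratedDocs);
        return builder;
    }
}
```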
Thanks @cbuescher, the details are useful. I am just wondering if we should report duplicate details (`dcg` in the non-normalized version, and `normalized_dcg` in the normalized version), since they are already reported. But I will leave the decision to you as the main architect of ranking evaluation.
> I am just wondering if we should report duplicate details (`dcg` in the non-normalized version, and `normalized_dcg` in the normalized version)
I was wondering the same actually, but then things like parsing the metric details on the client side suddenly get much more complex: in order to re-create the details object we would have to somehow detect which variant we are currently parsing, and if e.g. the `dcg` value was left out here, we'd have to reach out to the metric's score field, which is parsed on another level, so that gets kind of ugly. I also don't like the redundancy very much, but this way the object stays kind of self-contained. That said, I'll give it another thought...
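To illustrate the parsing concern: with the redundant but self-contained output, a single parser covers both variants without consulting anything parsed elsewhere in the response. A rough sketch, assuming the usual `ConstructingObjectParser` pattern; the class and constant names here are made up for illustration, not the PR's actual code:

```java
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.xcontent.ConstructingObjectParser;
import org.elasticsearch.common.xcontent.XContentParser;

import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg;
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg;

// Hypothetical parser sketch; not the PR's actual code.
public class DcgDetailsParserSketch {
    final double dcg;
    final Double idealDcg;       // null for the non-normalized variant
    final Double normalizedDcg;  // null for the non-normalized variant
    final int unratedDocs;

    DcgDetailsParserSketch(double dcg, Double idealDcg, Double normalizedDcg, int unratedDocs) {
        this.dcg = dcg;
        this.idealDcg = idealDcg;
        this.normalizedDcg = normalizedDcg;
        this.unratedDocs = unratedDocs;
    }

    // Because "dcg" is always present, the parser never has to fall back to the
    // metric score that lives on another level of the response.
    private static final ConstructingObjectParser<DcgDetailsParserSketch, Void> PARSER =
        new ConstructingObjectParser<>("dcg_details", args -> new DcgDetailsParserSketch(
            (double) args[0], (Double) args[1], (Double) args[2], (int) args[3]));

    static {
        PARSER.declareDouble(constructorArg(), new ParseField("dcg"));
        PARSER.declareDouble(optionalConstructorArg(), new ParseField("ideal_dcg"));
        PARSER.declareDouble(optionalConstructorArg(), new ParseField("normalized_dcg"));
        PARSER.declareInt(constructorArg(), new ParseField("unrated_docs"));
    }

    static DcgDetailsParserSketch fromXContent(XContentParser parser) {
        return PARSER.apply(parser, null);
    }
}
```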
For reference: https://en.wikipedia.org/wiki/Discounted_cumulative_gain explains some of the calculations that appear in this metric.
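For a quick worked example of those calculations, here is a standalone sketch, assuming the exponential gain form `(2^rel - 1) / log2(rank + 1)`; the sample ratings are made up and the numbers won't match the JSON examples above:

```java
// Standalone illustration of DCG / ideal DCG / normalized DCG; not Elasticsearch code.
public class DcgCalculationExample {

    // dcg = sum over ranks of (2^rel - 1) / log2(rank + 1)
    static double dcg(int[] ratings) {
        double result = 0;
        for (int rank = 1; rank <= ratings.length; rank++) {
            result += (Math.pow(2, ratings[rank - 1]) - 1) / (Math.log(rank + 1) / Math.log(2));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] searchOrder = {3, 2, 3, 0, 1, 2};  // ratings in the order the search returned the docs
        int[] idealOrder = {3, 3, 2, 2, 1, 0};   // the same ratings sorted descending
        double dcg = dcg(searchOrder);            // ~13.85
        double idealDcg = dcg(idealOrder);        // ~14.60
        System.out.println("dcg: " + dcg);
        System.out.println("ideal_dcg: " + idealDcg);
        System.out.println("normalized_dcg: " + (dcg / idealDcg));  // ~0.95
    }
}
```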
* master:
  - Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360)
  - Rankeval: Fold template test project into main module (#31203)
  - Add QA project and fixture based test for discovery-ec2 plugin (#31107)
  - [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359)
  - REST Client: NodeSelector for node attributes (#31296)
  - LLClient: Fix assertion on windows
  - Add details section for dcg ranking metric (#31177)
  - [ML] Re-enable tests muted in #30982

* 6.x:
  - Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360)
  - [ML] Implement new rules design (#31110) (#31294)
  - Remove RestGetAllAliasesAction (#31308)
  - CCS: don't proxy requests for already connected node (#31273)
  - Rankeval: Fold template test project into main module (#31203)
  - [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359)
  - More detailed tracing when writing metadata (#31319)
  - Add details section for dcg ranking metric (#31177)
While the other two ranking evaluation metrics (precision and reciprocal rank) already provide a more detailed output for how their score is calculated, the discounted cumulative gain metric (dcg) and its normalized variant have been lacking this until now. It's not totally clear which level of detail might be useful for debugging and understanding the final metric calculation, but this change adds a `metric_details` section to the REST output that contains some information about the evaluation details (like the number of unlabeled docs, the normalization factor etc.).