-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Expected Reciprocal Rank metric #31891
Conversation
Pinging @elastic/es-search-aggs |
This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric as descriped in: Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009). Expected reciprocal rank for graded relevance. Proceeding of the 18th ACM Conference on Information and Knowledge Management. https://doi.org/10.1145/1645953.1646033 ERR is an extension of the classical reciprocal rank to the graded relevance case and assumes a cascade browsing model. It quantifies the usefulness of a document at rank `i` conditioned on the degree of relevance of the items at ranks less than `i`. ERR seems to be gain traction as an alternative to (n)DCG, so it seems like a good metric to support. Also ERR seems to be the default optimization metric used for training in RankLib, a widely used learning to rank library. Relates to elastic#29653
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cbuescher! Nice implementation and tests. Just very small editing comments.
I was wondering if we also want to add documentation on this metric to rank-eval.asciidoc
?
import static org.elasticsearch.index.rankeval.EvaluationMetric.joinHitsWithRatings; | ||
|
||
/** | ||
* Implemention of the Expected Reciprocal Rank metric described in:<p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implementation?
|
||
private final double two_pow_maxRelevance; | ||
|
||
public static final String NAME = "err"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"err" doesn't sound like "error"? Have you though of using a diff name here and in code? It is fine to keep this name as well if you think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the problem. On the other hand, the metric is abreviated like this in the paper itself and in most places citing it. I'm on the fence here. So far we have:
- "precision" -> precision at K
- "dcg" -> discounted cumulative gain
- "mean_reciprocal_rank" -> well... mean reciprocal rank
I kind of like the shorter names, they are only ids in the end. But I can get this might be a source of confusion. I would go with the full "expected_reciprocal_rank" then, a bit long but one is not supposed to type it all the time. Will give it some additional thought though before changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to go with "expected_reciprocal_rank" now, its long but only four chars more than "mean_reciprocal_rank", so what the heck...
for (int i = 0; i < relevanceRatings.length; i++) { | ||
rated.add(new RatedDocument("index", Integer.toString(i), relevanceRatings[i])); | ||
hits[i] = new SearchHit(i, Integer.toString(i), new Text("type"), Collections.emptyMap()); | ||
hits[i].shard(new SearchShardTarget("testnode", new Index("index", "uuid"), 0, null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if SearchShardTarget
is the same for all hits, would it be reasonable to create it outside of for
loop? here and in the following test?
This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric as descriped in: Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009). Expected reciprocal rank for graded relevance. Proceeding of the 18th ACM Conference on Information and Knowledge Management. https://doi.org/10.1145/1645953.1646033 ERR is an extension of the classical reciprocal rank to the graded relevance case and assumes a cascade browsing model. It quantifies the usefulness of a document at rank `i` conditioned on the degree of relevance of the items at ranks less than `i`. ERR seems to be gain traction as an alternative to (n)DCG, so it seems like a good metric to support. Also ERR seems to be the default optimization metric used for training in RankLib, a widely used learning to rank library. Relates to #29653
* master: [TEST] Mute SlackMessageTests.testTemplateRender Docs: Explain closing the high level client [ML] Re-enable memory limit integration tests (#31328) [test] disable packaging tests for suse boxes Add nio transport to security plugin (#31942) XContentTests : Insert random fields at random positions (#30867) Force execution of fetch tasks (#31974) Fix unreachable error condition in AmazonS3Fixture (#32005) Tests: Fix SearchFieldsIT.testDocValueFields (#31995) Add Expected Reciprocal Rank metric (#31891) [ML] Get ForecastRequestStats doc in RestoreModelSnapshotIT (#31973) SQL: Add support for single parameter text manipulating functions (#31874) [ML] Ensure immutability of MlMetadata (#31957) Tests: Mute SearchFieldsIT.testDocValueFields() muted tests due to #31940 Work around reported problem in eclipse (#31960) Move build integration tests out of :buildSrc project (#31961) Tests: Remove use of joda time in some tests (#31922) [Test] Reactive 3rd party tests on CI (#31919) SQL: Support for escape sequences (#31884) SQL: HAVING clause should accept only aggregates (#31872) Docs: fix typo in datehistogram (#31972) Switch url repository rest tests to new style requests (#31944) Switch reindex tests to new style requests (#31941) Docs: Added note about cloud service to installation and getting started [DOCS] Removes alternative docker pull example (#31934) Add Snapshots Status API to High Level Rest Client (#31515) ingest: date_index_name processor template resolution (#31841) Test: fix null failure in watcher test (#31968) Switch test framework to new style requests (#31939) Switch low level rest tests to new style Requests (#31938) Switch high level rest tests to new style requests (#31937) [ML] Mute test failing due to Java 11 date time format parsing bug (#31899) [TEST] Mute SlackMessageTests.testTemplateRender Fix assertIngestDocument wrongfully passing (#31913) Remove unused reference to filePermissionsCache (#31923) rolling upgrade should use a replica to prevent relocations while running a scroll HLREST: Bundle the x-pack protocol project (#31904) Increase logging level for testStressMaybeFlush Added lenient flag for synonym token filter (#31484) [X-Pack] Beats centralized management: security role + licensing (#30520) HLRest: Move xPackInfo() to xPack().info() (#31905) Docs: add security delete role to api call table (#31907) [test] port archive distribution packaging tests (#31314) Watcher: Slack message empty text (#31596) [ML] Mute failing DetectionRulesIT.testCondition() test Fix broken NaN check in MovingFunctions#stdDev() (#31888) Date: Add DateFormatters class that uses java.time (#31856) [ML] Switch native QA tests to a 3 node cluster (#31757) Change trappy float comparison (#31889) Fix building AD URL from domain name (#31849) Add opaque_id to audit logging (#31878) re-enable backcompat tests add support for is_write_index in put-alias body parsing (#31674) Improve release notes script (#31833) [DOCS] Fix broken link in painless example Handle missing values in painless (#30975) Remove the ability to index or query context suggestions without context (#31007) Ingest: Enable Templated Fieldnames in Rename (#31690) [Docs] Fix typo in the Rollup API Quick Reference (#31855) Ingest: Add ignore_missing option to RemoveProc (#31693) Add template config for Beat state to X-Pack Monitoring (#31809) Watcher: Add ssl.trust email account setting (#31684) Remove link to oss-MSI (#31844) Painless: Restructure Definition/Whitelist (#31879) HLREST: Add x-pack-info API (#31870)
* 6.x: Force execution of fetch tasks (#31974) [TEST] Mute SlackMessageTests.testTemplateRender Docs: Explain closing the high level client [test] disable java packaging tests for suse XContentTests : Insert random fields at random positions (#30867) Add Get Snapshots High Level REST API (#31980) Fix unreachable error condition in AmazonS3Fixture (#32005) [6.x][ML] Ensure immutability of MlMetadata (#31994) Add Expected Reciprocal Rank metric (#31891) SQL: Add support for single parameter text manipulating functions (#31874) muted tests due to #31940 Work around reported problem in eclipse (#31960) Move build integration tests out of :buildSrc project (#31961) [Test] Reactive 3rd party tests on CI (#31919) Fix assertIngestDocument wrongfully passing (#31913) (#31951) SQL: Support for escape sequences (#31884) SQL: HAVING clause should accept only aggregates (#31872) Docs: fix typo in datehistogram (#31972) Switch url repository rest tests to new style requests (#31944) Switch reindex tests to new style requests (#31941) Switch test framework to new style requests (#31939) Docs: Added note about cloud service to installation and getting started [DOCS] Removes alternative docker pull example (#31934) ingest: date_index_name processor template resolution (#31841) Test: fix null failure in watcher test (#31968) Watcher: Slack message empty text (#31596) Switch low level rest tests to new style Requests (#31938) Switch high level rest tests to new style requests (#31937) HLREST: Bundle the x-pack protocol project (#31904) [ML] Mute test failing due to Java 11 date time format parsing bug (#31899) Increase logging level for testStressMaybeFlush rolling upgrade should use a replica to prevent relocations while running a scroll [test] port archive distribution packaging tests (#31314) HLRest: Move xPackInfo() to xPack().info() (#31905) Increase logging level for :qa:rolling-upgrade Backport: Add template config for Beat state to X-Pack Monitoring (#31809) (#31893) Fix building AD URL from domain name (#31849) Fix broken NaN check in MovingFunctions#stdDev() (#31888) Change trappy float comparison (#31889) Add opaque_id to audit logging (#31878) add support for is_write_index in put-alias body parsing (#31674) Ingest: Enable Templated Fieldnames in Rename (#31690) (#31896) Ingest: Add ignore_missing option to RemoveProc (#31693) (#31892) [Docs] Fix typo in the Rollup API Quick Reference (#31855) Watcher: Add ssl.trust email account setting (#31684) [PkiRealm] Invalidate cache on role mappings change (#31510) [Security] Check auth scheme case insensitively (#31490) HLREST: Add x-pack-info API (#31870) Remove link to oss-MSI (#31844) Painless: Restructure Definition/Whitelist (#31879)
This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric
as descriped in:
Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009).
Expected reciprocal rank for graded relevance.
Proceeding of the 18th ACM Conference on Information and Knowledge Management.
https://doi.org/10.1145/1645953.1646033
ERR is an extension of the classical reciprocal rank to the graded relevance
case and assumes a cascade browsing model. It quantifies the usefulness of a
document at rank
i
conditioned on the degree of relevance of the items at ranksless than
i
. ERR seems to be gain traction as an alternative to (n)DCG, so itseems like a good metric to support. Also ERR seems to be the default optimization
metric used for training in RankLib, a widely used learning to rank library.
Relates to #29653