Add support to optimize for NDCG at a given truncation level #3425

metpavel · 2020-09-30T23:10:37Z

In order to correctly optimize for NDCG@_k_, one should exclude pairs containing both documents beyond the top-_k_ (as they don't affect NDCG@_k_ when swapped).
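Here is a minimal sketch of the pair-selection rule described above. It assumes the query's documents are already sorted by the current model score and that the truncation level plays the role of _k_. `TruncatedPairs` is a hypothetical name and is not a LightGBM function. Its loop bounds mirror the ones in the `rank_objective.hpp` diff quoted later in this thread. The real objective computes lambdas inside the loop rather than collecting the pairs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of LightGBM): list the position pairs (i, j),
// i < j, of one query's score-sorted documents that can still change NDCG@k,
// where k = truncation_level and positions are 0-based ranks.
std::vector<std::pair<int, int>> TruncatedPairs(int cnt, int truncation_level) {
  assert(cnt >= 0 && truncation_level > 0);
  std::vector<std::pair<int, int>> pairs;
  // The outer index never leaves the top-k, while the inner index may, so
  // every kept pair has at least one document ranked above the cutoff.
  // Pairs with both documents at positions >= k are never generated.
  for (int i = 0; i < cnt - 1 && i < truncation_level; ++i) {
    for (int j = i + 1; j < cnt; ++j) {
      pairs.emplace_back(i, j);
    }
  }
  return pairs;
}
```

For example, with 5 documents and a truncation level of 2, the function keeps 7 of the 10 pairs. The dropped pairs (2, 3), (2, 4) and (3, 4) lie entirely below the cutoff. With a level of at least `cnt - 1`, all pairs are kept.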

ghost · 2020-09-30T23:10:52Z

All CLA requirements met.

guolinke · 2020-09-30T23:19:32Z

Any benchmarks for the change?

metpavel · 2020-10-01T22:49:06Z

> Any benchmarks for the change?

For the MSLR-WEB30K dataset (fold 1), using the default train.conf, performance on the validation set:

| Optimized for | ndcg@1 | ndcg@3 | ndcg@5 | Elapsed time at iteration 100 (s) |
|---|---|---|---|---|
| NDCG@FullRank | 0.498897 | 0.484631 | 0.487863 | 208.163323 |
| NDCG@20 | 0.502574 | 0.486615 | 0.489424 | 124.763394 |
| NDCG@10 | 0.504389 | 0.487059 | 0.48924 | 93.361508 |
| NDCG@5 | 0.498769 | 0.483403 | 0.48611 | 96.900527 |
| NDCG@3 | 0.503184 | 0.483097 | 0.485419 | 79.423862 |
| NDCG@1 | 0.502715 | 0.47437 | 0.47624 | 71.433821 |
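A few lines of arithmetic turn the timings above into speed-ups relative to the full-rank baseline. The snippet reuses only the numbers reported in this comment. `Run` and `SpeedupsVsFullRank` are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One benchmark row: the optimized objective and its wall-clock time
// (seconds) to finish 100 iterations, as reported above.
struct Run {
  std::string objective;
  double seconds;
};

// Speed-up of each truncated run relative to the full-rank baseline.
std::vector<double> SpeedupsVsFullRank(const std::vector<Run>& runs,
                                       double full_rank_seconds) {
  assert(full_rank_seconds > 0.0);
  std::vector<double> speedups;
  speedups.reserve(runs.size());
  for (const Run& r : runs) speedups.push_back(full_rank_seconds / r.seconds);
  return speedups;
}

// Timings copied from the benchmark above.
const std::vector<Run> kTruncatedRuns = {
    {"NDCG@20", 124.763394}, {"NDCG@10", 93.361508}, {"NDCG@5", 96.900527},
    {"NDCG@3", 79.423862},   {"NDCG@1", 71.433821},
};
const double kFullRankSeconds = 208.163323;
```

The resulting speed-ups are roughly 1.67x for NDCG@20, 2.23x for NDCG@10, 2.15x for NDCG@5, 2.62x for NDCG@3, and 2.91x for NDCG@1. Each setting appears to have been timed once, so the NDCG@10 versus NDCG@5 ordering may simply reflect run-to-run variation.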

guolinke · 2020-10-09T07:28:31Z

I think this PR is a new feature rather than a fix.

guolinke · 2020-10-09T07:29:52Z

@metpavel Is the result of "full rank" the same as before?

guolinke · 2020-10-09T07:33:05Z

src/objective/rank_objective.hpp

-        const data_size_t low = sorted_idx[j];
+    // start accumulating lambdas by pairs that contain at least one document above truncation level
+    for (data_size_t i = 0; i < cnt - 1 && i < truncation_level_; ++i) {
+      for (data_size_t j = i + 1; j < cnt; ++j) {


Shouldn't this start from 0? If not, why?

Update:
I see. Never mind.


guolinke · 2020-10-09T07:51:21Z

src/objective/rank_objective.hpp

+        const double high_label_gain = label_gain_[high_label];
+        const double high_discount = DCGCalculator::GetDiscount(high_rank);
+
+        const data_size_t low_rank = label[sorted_idx[i]] > label[sorted_idx[j]] ? j : i;


Can we remove the additional branching and get both high_rank and low_rank with a single `if`?

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

remove the additional branching: get high_rank and low_rank by one "if".
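The single-branch selection suggested in the review above can be sketched as follows. This is an illustration under stated assumptions, not the merged code. `PairDeltaDcg` and `Discount` are hypothetical. The discount is assumed to be the standard 1 / log2(2 + rank) for 0-based ranks, and `label_gain` is assumed to map an integer relevance label to its gain. Only the names `high_rank`, `low_rank` and `label_gain` come from the diff.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed position discount 1 / log2(2 + rank) for a 0-based rank
// (standard DCG form; not checked against DCGCalculator::GetDiscount).
double Discount(int rank) { return 1.0 / std::log2(2.0 + rank); }

// Hypothetical helper: for the documents at sorted positions i and j, pick
// the more relevant one as "high" with a single branch, then return the
// absolute DCG change caused by swapping them (before any normalization
// by the ideal DCG). Equal labels give a change of 0.
double PairDeltaDcg(const std::vector<int>& label,
                    const std::vector<int>& sorted_idx,
                    const std::vector<double>& label_gain, int i, int j) {
  assert(i != j);
  int high_rank, low_rank;
  if (label[sorted_idx[i]] > label[sorted_idx[j]]) {
    high_rank = i;
    low_rank = j;
  } else {
    high_rank = j;
    low_rank = i;
  }
  const double high_gain = label_gain[label[sorted_idx[high_rank]]];
  const double low_gain = label_gain[label[sorted_idx[low_rank]]];
  // A swap exchanges the discounts applied to the two gains.
  return std::fabs((high_gain - low_gain) *
                   (Discount(high_rank) - Discount(low_rank)));
}
```

Because the branch assigns both ranks at once, the result does not depend on the order in which `i` and `j` are passed.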

metpavel · 2020-10-09T11:40:13Z

> @metpavel Is the result of "full rank" the same as before?

@guolinke, yes

guolinke · 2020-10-09T12:23:24Z

From the results, it seems we need a slightly larger truncation level, e.g. k + m, to optimize NDCG@k?

metpavel · 2020-10-09T16:30:50Z

> From the results, it seems we need a slightly larger truncation level, e.g. k + m, to optimize NDCG@k?

@guolinke, perhaps one may want to treat m as a tunable hyper-parameter that balances closer alignment with the desired cutoff k (smaller m) against more pairs to train on (larger m).
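The trade-off can be made concrete with a closed form for the number of training pairs per query under a truncation level of k + m. The sketch assumes the pair rule from the diff: the outer position is below the level, and the inner position is anywhere after it. `TruncatedPairCount` is a hypothetical helper, and the query size of 120 in the example is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of position pairs (i, j), i < j, kept when only
// pairs whose upper document lies in the top `level` positions are used.
// Equals sum over i < min(level, cnt - 1) of (cnt - 1 - i).
int64_t TruncatedPairCount(int64_t cnt, int64_t level) {
  assert(cnt >= 0 && level > 0);
  const int64_t top = level < cnt - 1 ? level : (cnt > 0 ? cnt - 1 : 0);
  // Row i contributes cnt - 1 - i pairs; sum that arithmetic series.
  return top * (cnt - 1) - top * (top - 1) / 2;
}
```

Consider a query with 120 candidates and k = 5. Setting m = 3 raises the pair count from 585 to 924, compared with 7140 pairs when all positions are used. Once the level reaches `cnt - 1`, every pair is kept.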

guolinke · 2020-10-09T23:15:45Z

@metpavel Yes, maybe you can add this description into the document of truncation level?

metpavel · 2020-10-10T21:18:47Z

> @metpavel Yes, maybe you can add this description into the document of truncation level?

@guolinke, do you mean here: /blob/master/docs/Parameters.rst?

guolinke · 2020-10-11T01:13:04Z

@metpavel we usually write the parameter docs in config.h and then run helpers/parameter_generator.py to auto-update the .rst file.

guolinke · 2020-10-18T07:02:36Z

BTW, can you also merge master to ensure CI passes?

add description to lambdarank_truncation_level parameter

guolinke · 2020-10-22T13:27:36Z

@jameslamb, can you help fix the R tests? Some ranking score constraints need updating because the ranking objective has changed.

update expected NDCG value for a test, as it was affected by the underlying change in the algorithm

update NDCG@3 reference value

jameslamb · 2020-10-23T04:46:35Z

> @jameslamb, can you help fix the R tests? Some ranking score constraints need updating because the ranking objective has changed.

ok sure, I can help

jameslamb · 2020-10-23T05:09:50Z

@metpavel I just created a pull request into this branch on your fork. If you merge it, it should fix the R tests.

metpavel#1

jameslamb · 2020-10-23T05:13:01Z

src/objective/rank_objective.hpp

        // lambda is negative, so use minus to accumulate
        sum_lambdas -= 2 * p_lambda;
      }
-      // update
-      lambdas[high] += static_cast<score_t>(high_sum_lambda);
-      hessians[high] += static_cast<score_t>(high_sum_hessian);
    }
    if (norm_ && sum_lambdas > 0) {
      double norm_factor = std::log2(1 + sum_lambdas) / sum_lambdas;
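The normalization step at the end of this hunk rescales a query's gradients by log2(1 + S) / S, where S is the accumulated `sum_lambdas`. Below is a minimal sketch. It assumes the factor is then applied to every lambda and hessian of the query, since the hunk above is cut off before that point. `NormalizeQuery` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of per-query lambda normalization, assuming the factor is applied
// to every gradient (lambda) and hessian of the query. sum_lambdas is the
// accumulated total from the pair loop (2 * |lambda| per pair in the hunk).
void NormalizeQuery(std::vector<double>* lambdas, std::vector<double>* hessians,
                    double sum_lambdas, bool norm) {
  assert(lambdas->size() == hessians->size());
  if (norm && sum_lambdas > 0) {
    // log2(1 + S) / S is 1 at S = 1, tends to 1 / ln 2 as S -> 0, and
    // decays roughly like log2(S) / S for large S, damping queries whose
    // total lambda is very large.
    const double norm_factor = std::log2(1 + sum_lambdas) / sum_lambdas;
    for (std::size_t i = 0; i < lambdas->size(); ++i) {
      (*lambdas)[i] *= norm_factor;
      (*hessians)[i] *= norm_factor;
    }
  }
}
```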


/gha run-valgrind

/gha run r-valgrind

^ this comment started this run checking that these changes pass our valgrind tests: https://github.com/microsoft/LightGBM/runs/1296338580?check_suite_focus=true

this didn't show any new issues, but I'll run it again once my fixes are merged into this PR: #3425 (comment)

/gha run-valgrind

/gha run r-valgrind

if this passes, I'll come back and approve: https://github.com/microsoft/LightGBM/runs/1310275649?check_suite_focus=true

fix R learning-to-rank tests

include/LightGBM/config.h

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

guolinke · 2020-10-26T13:13:46Z

You need to re-run the parameter generator because the default value has changed.

github-actions · 2023-08-24T04:39:48Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

Add support to optimize for NDCG at a given truncation level

16955e0

In order to correctly optimize for NDCG@_k_, one should exclude pairs containing both documents beyond the top-_k_ (as they don't affect NDCG@_k_ when swapped).

metpavel requested review from btrotta, chivee and guolinke as code owners September 30, 2020 23:10

Update rank_objective.hpp

d938551

jameslamb added the fix label Oct 6, 2020

guolinke added feature and removed fix labels Oct 9, 2020

guolinke reviewed Oct 9, 2020

src/objective/rank_objective.hpp

src/objective/rank_objective.hpp (outdated)

guolinke reviewed Oct 9, 2020

metpavel and others added 2 commits October 9, 2020 14:25

Apply suggestions from code review

b2736e6

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

Update rank_objective.hpp

9e3cc2b

remove the additional branching: get high_rank and low_rank by one "if".

metpavel added 2 commits October 20, 2020 22:43

Update config.h

c7890dc

add description to lambdarank_truncation_level parameter

Update Parameters.rst

7f36817

metpavel requested review from jameslamb, Laurae2 and StrikerRUS as code owners October 21, 2020 23:41

Merge branch 'master' into patch-1

4d7183b

Update test_sklearn.py

c15423f

update expected NDCG value for a test, as it was affected by the underlying change in the algorithm

metpavel requested review from henry0312 and wxchan as code owners October 22, 2020 13:39

Update test_sklearn.py

8e98bd4

update NDCG@3 reference value

fix R learning-to-rank tests

a06485a

jameslamb mentioned this pull request Oct 23, 2020

fix R learning-to-rank tests metpavel/LightGBM#1

Merged

jameslamb reviewed Oct 23, 2020

metpavel added 3 commits October 26, 2020 00:27

Merge pull request #1 from jameslamb/fix/r-ltr-tests

c51fb3a

fix R learning-to-rank tests

Merge branch 'master' into patch-1

7bfc652

Update rank_objective.hpp

af2fe37

guolinke reviewed Oct 26, 2020

include/LightGBM/config.h (outdated)

guolinke mentioned this pull request Oct 26, 2020

3.1.0 release #3484

Merged

Update include/LightGBM/config.h

b6bd92f

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

Update Parameters.rst

e73d3db

guolinke approved these changes Oct 26, 2020

guolinke merged commit ba0a1f8 into microsoft:master Oct 27, 2020

metpavel deleted the patch-1 branch October 27, 2020 18:29

jameslamb mentioned this pull request Nov 1, 2020

[R-package] learning-to-rank tests are broken on Solaris 10 and 32-bit Windows #3513

Open

guolinke mentioned this pull request Nov 13, 2020

[feature request] faster lambdarank #2701

Closed

StrikerRUS mentioned this pull request Jun 9, 2021

Consistent release version-correlated decrease of LGBMRanker performance #4349

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
