
Consistent release version-correlated decrease of LGBMRanker performance #4349

Closed
GerardBCN opened this issue Jun 7, 2021 · 6 comments

@GerardBCN

Description

First of all, thank you for your work on this amazing library; it has been extremely useful in our research. I'm opening this issue to ask for your opinion on something we have observed in my research group that may be relevant to other people here using the LGBM ranker.

It all started when a colleague of mine couldn't reproduce my results, and we noticed that we were using different versions of the lightgbm library. We ran a small experiment to compare performance (measured with a custom metric function) across release versions, and we found that performance decreases quite consistently from older to newer releases. See the plot below.

[Plot: custom-metric performance as a function of LightGBM release version]

We tried to pinpoint the possible breaking changes by tracking the commit history of the lambdarank test function, located at tests/python_package_test/test_sklearn.py. In particular, we observed several changes to the lambdarank loss function and to default parameters. Unfortunately, we are not well versed in the inner workings of lambdarank, so we can't fully grasp the relevance of those changes. What surprised us was that the plaintext performance values used in the equality tests also decrease quite consistently from old to new releases. Please see the plot below, which shows the performance written as plaintext as a function of time (commit date).

[Plot: plaintext test performance as a function of commit date]

Is there any guidance the developers could offer on choosing one version over another? Is there a reason the changes to the loss function were applied? Are there any bugs we should be aware of?

Reproducible example

The performances for the latter plot were extracted from the following commits (from old to new):

# https://github.com/microsoft/LightGBM/commit/496a07d1dbd5c3a8cf28d50f5aad84428fddf2f4#diff-711a5439fdebb728fb5859f49561c5cd1388e25276dd03c409dc63c46f2f88d2
# https://github.com/microsoft/LightGBM/commit/aee92f63ba124e1f6a3168eb2864d032567cbf9e#diff-711a5439fdebb728fb5859f49561c5cd1388e25276dd03c409dc63c46f2f88d2
# https://github.com/microsoft/LightGBM/commit/0dfda82607633132e10a693eba9666ed75585ac8#diff-711a5439fdebb728fb5859f49561c5cd1388e25276dd03c409dc63c46f2f88d2
# https://github.com/microsoft/LightGBM/commit/509c2e50c25eded99fc0997afe25ebee1b33285d#diff-98ca62132fa18e4a80cd57f16e9337fe3d72a08d5862d02eae9935bed9e43486
# https://github.com/microsoft/LightGBM/commit/ba0a1f8d38d12aeb29f1c769596308eb8b1e5874
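The plaintext values were collected by reading the test file at each of these commits. A hypothetical sketch of doing that mechanically is below, assuming a local clone of LightGBM; the regex and the file path are assumptions (older commits may use a different test layout):

```python
import re
import subprocess

# abbreviated hashes of the commits listed above
COMMITS = ["496a07d", "aee92f6", "0dfda82", "509c2e5", "ba0a1f8"]

def extract_thresholds(source):
    """Pull hard-coded comparison values like `... > 0.578` out of test code."""
    return [float(m) for m in re.findall(r">\s*(0\.\d+)", source)]

def thresholds_at(commit, path="tests/python_package_test/test_sklearn.py"):
    """Read the test file as it existed at `commit` (requires a local clone)."""
    src = subprocess.run(["git", "show", f"{commit}:{path}"],
                         capture_output=True, text=True, check=True).stdout
    return extract_thresholds(src)
```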
@GerardBCN GerardBCN changed the title Consistent release version-correlated decrease of LGBMRanker custom metric performance Consistent release version-correlated decrease of LGBMRanker performance Jun 7, 2021
@jameslamb
Collaborator

Thanks very much for using LightGBM and for the thorough write-up!

Are you able to provide a reproducible example? Without an example to try (and to rule out some theories), I think it will be very difficult to find an answer to this question.

@jameslamb
Collaborator

> to see that the plaintext performances used in the equality tests also decrease quite consistently from old to new releases

After re-reading, I understand what you mean by this. You're saying that for the lambdarank tests in LightGBM's test suite, you can see the hard-coded performance expectations being reduced.

e.g., from 509c2e5#diff-98ca62132fa18e4a80cd57f16e9337fe3d72a08d5862d02eae9935bed9e43486

[Screenshot of the diff, showing a hard-coded performance expectation being lowered in test_sklearn.py]
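For readers without the screenshot: the pattern in question is an assertion of roughly the shape below, where the fixed threshold is what gets lowered from release to release (the value here is invented, not taken from any actual commit):

```python
def clears_expectation(best_score, threshold=0.578):
    """Check whether the recorded validation ndcg@3 clears a hard-coded
    threshold. `best_score` mimics the sklearn wrapper's `best_score_`
    mapping; the threshold value is illustrative only."""
    return best_score["valid_0"]["ndcg@3"] > threshold
```

A commit that "reduces expectations" would lower `threshold` while leaving the training code unchanged.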

I think maybe @shiyu1994 or @btrotta will be able to give you the best guidance on this question.

@GerardBCN
Author

Yes, that's right. Thank you for your prompt answer!

@StrikerRUS
Collaborator

StrikerRUS commented Jun 9, 2021

@GerardBCN Thanks a lot for sharing your observations! All commits were made to fix bugs or to increase ranking performance on real data. I promise that no commit was merged with the aim of intentionally decreasing ranking performance 😃. You can see that some important changes were shown to increase the score on some "standard" benchmark ranking datasets: #2322 (comment), #2331 (comment), #3425 (comment).
Regarding the intentional score decreases in the test_sklearn.py file: this was done because improvements to the ranking algorithm in general, or particular bug fixes, do not always act positively on the one particular dataset used in our tests. I guess this is also applicable to your situation.

Thanks for providing the list of commits you've found related to ranking! Thanks to GitHub, we can easily check the corresponding pull requests and find out what the aim of those PRs was.

@no-response

no-response bot commented Jul 12, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@no-response no-response bot closed this as completed Jul 12, 2021
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023