[dask] Add a scikit-learn compatibility test #3894
Comments
Hey, I would like to be assigned this issue. I am in a software methodology class and passionate about data science.
Thanks @lmjwang! You're welcome to take it. A few things you should know:
Feel free to ask questions here if you get stuck. Before beginning, please read the items that are linked in the issue's description.
Hey @jameslamb, I'm going through test_sklearn.py and I'm curious why lgb.sklearn.LGBMRanker is not included here?
Good question! I do not actually know. @StrikerRUS, do you know? By the way, in the future, a link to code is more helpful than just copying it:
scikit-learn doesn't support learning-to-rank, so there is no point in testing LGBMRanker for "compatibility" with something that doesn't support ranking.
I copied these lines (LightGBM/tests/python_package_test/test_sklearn.py, lines 1150 to 1182 at commit a4cae37) into test_dask.py along with their dependencies, but used lgb.DaskLGBMClassifier and lgb.DaskLGBMRegressor in _tested_estimators().
But running … Does this satisfy the task, or am I missing something? @jameslamb
Seems like a good approach! If you feel like you have something that is mostly working, please open a pull request and I can give more specific help there. For the … Creating an instance of the Dask estimators requires an active Dask client. We do this with a … You can use this by passing the keyword argument …
* add test_dask.py
* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* clients
* remove ports
* safe sklearn checks
* safe sklearn checks
* fix whitespace
* fix whitespace-try 2
* fix whitespace-try 3
* isort
* isort
* sklearn_checks_to_learn

Co-authored-by: James Lamb <jaylamb20@gmail.com>
We may want to come back to this after the upstream issue dask/dask-ml#796 is resolved.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Summary
scikit-learn provides a test (sklearn.utils.estimator_checks.check_estimator) that authors of scikit-learn extensions can use to check API compatibility with the rest of the scikit-learn ecosystem. Such a test should be added for the code in python-package/lightgbm/dask.py.
Motivation
The discussion in #3883 focused heavily on the compatibility of the Dask estimators, like DaskLGBMClassifier, with the broader scikit-learn ecosystem. It was proposed in #3883 (comment) that we should have a test on the Dask module similar to the one in LightGBM/tests/python_package_test/test_sklearn.py (line 1169 at commit a4cae37).
I think several of the tests from that module should actually have Dask equivalents that test compatibility with the scikit-learn ecosystem.
References
More information on the scikit-learn API is available in "Developing scikit-learn estimators". In general, changes to lightgbm.dask that make it align more closely with that spec are welcome.
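To make the spec concrete: scikit-learn's `BaseEstimator.get_params()` discovers parameter names from the signature of `__init__`, and `sklearn.base.clone()` rebuilds an unfitted estimator by passing those parameters back to the constructor, then checks that each one was stored unchanged. The stdlib-only sketch below mimics both as an illustration, not scikit-learn's actual code. `CompliantEstimator` and `PrivateClientEstimator` are hypothetical classes, and the `client` parameter is an assumption standing in for a Dask-specific constructor argument.

```python
import copy
import inspect
from typing import Any, Dict, List


def get_param_names(cls: type) -> List[str]:
    """Read parameter names from __init__, as scikit-learn's BaseEstimator does."""
    names = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self" or param.kind == param.VAR_KEYWORD:
            continue
        if param.kind == param.VAR_POSITIONAL:
            raise RuntimeError(f"{cls.__name__} must not accept *args")
        names.append(name)
    return sorted(names)


def get_params(estimator: Any) -> Dict[str, Any]:
    # Each constructor argument must be readable as an attribute of the same name.
    return {name: getattr(estimator, name) for name in get_param_names(type(estimator))}


def simple_clone(estimator: Any) -> Any:
    """Simplified clone: rebuild an unfitted copy from the constructor parameters."""
    params = {name: copy.deepcopy(value) for name, value in get_params(estimator).items()}
    new_estimator = type(estimator)(**params)
    # Like scikit-learn's clone, require __init__ to store each parameter unchanged.
    new_params = get_params(new_estimator)
    for name, value in params.items():
        if new_params[name] is not value:
            raise RuntimeError(f"__init__ altered parameter {name!r}")
    return new_estimator


class CompliantEstimator:
    """Hypothetical estimator that follows the convention, including `client`."""

    def __init__(self, learning_rate: float = 0.1, client: Any = None):
        self.learning_rate = learning_rate
        self.client = client

    def fit(self, X: Any, y: Any) -> "CompliantEstimator":
        self.fitted_ = True  # fitted state goes in trailing-underscore attributes
        return self


class PrivateClientEstimator:
    """Hypothetical counter-example: stores `client` under a different name."""

    def __init__(self, client: Any = None):
        self._client = client


if __name__ == "__main__":
    fitted = CompliantEstimator(learning_rate=0.05).fit([[0.0]], [0.0])
    print(get_params(fitted))  # {'client': None, 'learning_rate': 0.05}
    print(hasattr(simple_clone(fitted), "fitted_"))  # False
    try:
        get_params(PrivateClientEstimator())
    except AttributeError as err:
        print("not compliant:", err)
```

In this sketch, storing the argument as `self._client` breaks the attribute lookup, which is the kind of mismatch the scikit-learn checks are meant to catch.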