Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators #3907

Closed
jameslamb opened this issue Feb 3, 2021 · 7 comments · Fixed by #3911

Comments

@jameslamb
Copy link
Collaborator

Summary

In a refactoring in #3883, I made a silly mistake and forgot to remove a keyword argument from a method in DaskLGBMRegressor.

We should have unit tests on the Dask module that check that estimators' .fit() and .predict() methods have a similar signature to their scikit-learn equivalents.

Description

The exposed keyword arguments in .fit() and .predict() in the Dask estimators are a subset of all the available keyword arguments in their scikit-learn equivalents.

For example,

# lightgbm.dask.DaskLGBMClassifier
def fit(self, X, y, sample_weight=None, **kwargs)

# lightgbm.sklearn.LGBM
def fit(self, X, y,
        sample_weight=None, init_score=None,
        eval_set=None, eval_names=None, eval_sample_weight=None,
        eval_init_score=None, eval_metric=None, early_stopping_rounds=None,
        verbose=True, feature_name='auto', categorical_feature='auto',
        callbacks=None, init_model=None)

These tests should check the following:

  • every keyword argument in the signature of the Dask estimator's method is in the sklearn estimator's method
  • every keyword argument in the signature of the Dask estimator's method is in the same position as in the sklearn estimator's method (e.g. X is the first argument after self)
  • every keyword argument in the signature of the Dask estimator's method has the same default value (or lack of default) as in the sklearn estimator's method

References

Carried over from this suggestion from @StrikerRUS #3906 (comment).

See

def test_dask_classes_and_sklearn_equivalents_have_identical_constructors_except_client_arg(classes):
for a reference on how to use inspect to do this.

@ghost
Copy link

ghost commented Feb 4, 2021

Hey! I would like to work on this. Pretty new to this, but I'm willing to put in the time

@jameslamb
Copy link
Collaborator Author

Sure, thank you very much for volunteering! We look forward to your contribution. Please ask here if you need any help.

@ghost
Copy link

ghost commented Feb 4, 2021

Cool, thanks!

@jameslamb jameslamb assigned ghost Feb 4, 2021
@ghost
Copy link

ghost commented Feb 4, 2021

I have a doubt ,
since checking for subsets (in order of course) is independent of the method (fit, predict) , and of the type of estimator (Regressor, Classifier, Ranker) , wouldn't a single test function do the job? something like

def test_dask_estimator_method_arguments_are_subset_of_sklearn_counterparts(methods):

@jameslamb
Copy link
Collaborator Author

I don't understand the question, sorry. You're welcome to open a draft pull request with a proposal when you feel you have something working that accomplishes the goal of this issue. Giving direct feedback on code might be easier than discussing this abstractly.

@ghost
Copy link

ghost commented Feb 4, 2021

ok sure

jameslamb pushed a commit that referenced this issue Feb 7, 2021
…cikit-learn estimators (#3911)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)
@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant