[dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators #3907

jameslamb · 2021-02-03T22:39:49Z

Summary

In a refactoring in #3883, I made a silly mistake and forgot to remove a keyword argument from a method in DaskLGBMRegressor.

We should have unit tests on the Dask module that check that estimators' .fit() and .predict() methods have a similar signature to their scikit-learn equivalents.

Description

The exposed keyword arguments in .fit() and .predict() in the Dask estimators are a subset of all the available keyword arguments in their scikit-learn equivalents.

For example,

# lightgbm.dask.DaskLGBMClassifier
def fit(self, X, y, sample_weight=None, **kwargs)

# lightgbm.sklearn.LGBM
def fit(self, X, y,
        sample_weight=None, init_score=None,
        eval_set=None, eval_names=None, eval_sample_weight=None,
        eval_init_score=None, eval_metric=None, early_stopping_rounds=None,
        verbose=True, feature_name='auto', categorical_feature='auto',
        callbacks=None, init_model=None)

These tests should check the following:

every keyword argument in the signature of the Dask estimator's method is in the sklearn estimator's method
every keyword argument in the signature of the Dask estimator's method is in the same position as in the sklearn estimator's method (e.g. X is the first argument after self)
every keyword argument in the signature of the Dask estimator's method has the same default value (or lack of default) as in the sklearn estimator's method
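As a rough illustration of the three checks above, here is a minimal sketch built on `inspect.signature`. The two classes are hypothetical stand-ins, not the real lightgbm estimators, and `check_signature_is_compatible` is a made-up helper name. The sketch also assumes that catch-all parameters like `**kwargs` should be skipped.

```python
import inspect


class SklearnLikeEstimator:
    """Hypothetical stand-in for a scikit-learn-style estimator."""

    def fit(self, X, y, sample_weight=None, init_score=None, eval_set=None):
        return self


class DaskLikeEstimator:
    """Hypothetical stand-in for the Dask equivalent (subset of arguments)."""

    def fit(self, X, y, sample_weight=None, **kwargs):
        return self


def check_signature_is_compatible(dask_method, sklearn_method):
    """Return a list of problems found; an empty list means all checks passed."""
    sklearn_params = inspect.signature(sklearn_method).parameters
    sklearn_names = list(sklearn_params)

    # Assumption: *args / **kwargs are catch-alls, not named arguments, so skip them.
    catch_all = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    dask_params = [
        p for p in inspect.signature(dask_method).parameters.values()
        if p.kind not in catch_all
    ]

    problems = []
    for position, param in enumerate(dask_params):
        # Check 1: the argument exists in the sklearn method.
        if param.name not in sklearn_params:
            problems.append(f"'{param.name}' is not in the sklearn signature")
            continue
        # Check 2: the argument is in the same position.
        if sklearn_names.index(param.name) != position:
            problems.append(f"'{param.name}' is in a different position")
        # Check 3: same default value (Parameter.empty means "no default").
        if sklearn_params[param.name].default != param.default:
            problems.append(f"'{param.name}' has a different default")
    return problems


problems = check_signature_is_compatible(DaskLikeEstimator.fit, SklearnLikeEstimator.fit)
assert problems == [], problems
```

In a real test, the stand-in classes would be replaced by the actual Dask and scikit-learn estimator pairs, and the helper could be called once per method name.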

References

Carried over from a suggestion by @StrikerRUS in #3906 (comment).

See

LightGBM/tests/python_package_test/test_dask.py

Line 816 in b1e000c

def test_dask_classes_and_sklearn_equivalents_have_identical_constructors_except_client_arg(classes):

for a reference on how to use inspect to do this.


ghost · 2021-02-04T05:34:05Z

Hey! I would like to work on this. I'm pretty new to this, but I'm willing to put in the time.

jameslamb · 2021-02-04T06:02:20Z

Sure, thank you very much for volunteering! We look forward to your contribution. Please ask here if you need any help.

ghost · 2021-02-04T07:11:31Z

Cool, thanks!

ghost · 2021-02-04T16:42:46Z

I have a question: since checking for subsets (in order, of course) is independent of the method (fit, predict) and of the type of estimator (Regressor, Classifier, Ranker), wouldn't a single test function do the job? Something like:

def test_dask_estimator_method_arguments_are_subset_of_sklearn_counterparts(methods):

jameslamb · 2021-02-04T16:46:40Z

I don't understand the question, sorry. You're welcome to open a draft pull request with a proposal when you feel you have something working that accomplishes the goal of this issue. Giving direct feedback on code might be easier than discussing this abstractly.

ghost · 2021-02-04T16:48:00Z

ok sure

…cikit-learn estimators (#3911) * [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes #3907)

github-actions · 2023-08-23T14:28:32Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added the good first issue and dask labels Feb 3, 2021

jameslamb assigned ghost Feb 4, 2021

jameslamb mentioned this issue Feb 5, 2021

[dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators #3911

Merged

jameslamb closed this as completed in #3911 Feb 7, 2021

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
