-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dask] fix Dask docstrings and mimic sklearn wrapper importing way #3855
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much! Overall I'm +1, just a few suggestions
elif isinstance(seq[0], (DataFrame, Series)): | ||
return concat(seq, axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with importing from compat
, but could the names be changed on import to something like pd_DataFrame
?
from .compat import DataFrame as pd_DataFrame
from .compat import Series as pd_Series
Since both pandas
and dask
have a DataFrame
class, I think just calling this DataFrame
makes the code difficult to read. I know that I personally will read this in the future and think "wait does that mean pandas or Dask DataFrame".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm totally agree! But I think we can make import aliases in compat.py
and then import like from .compat import pd_DataFrame
. Otherwise in case of identical names it will be confusing to have
from dask import DataFrame
from pandas import DataFrame
in compat.py
.
I'm going to rename only Dask imports in this PR to not overcomplicate review. pandas will be done in a follow-up PR. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes sounds good, thank you
@@ -390,12 +409,9 @@ def _predict(model, data, raw_score=False, pred_proba=False, pred_leaf=False, pr | |||
|
|||
|
|||
class _LGBMModel: | |||
def __init__(self): | |||
def _fit(self, model_factory, X, y, sample_weight=None, group=None, client=None, **kwargs): | |||
if not all((DASK_INSTALLED, PANDAS_INSTALLED, SKLEARN_INSTALLED)): | |||
raise LightGBMError('dask, pandas and scikit-learn are required for lightgbm.dask') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this check is being moved out of the constructor, then can you please put it in the _predict()
function as well?
If someone tries to load a saved DaskLGBMClassifier
from a pickle file (for example) and then use its .predict()
method, I think we also want them to get an informative error about dask
not being available. They won't get an ImportError
on pickle.load()
because of the magic of .compat
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! Will do.
if this check is being moved out of the constructor
I wish I could leave it in __init__()
, but I realized that parent's constructor in _LGBMModel
class is never called 🙁 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! didn't think about that when reviewing the MRO change in #3822
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks!
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Will be reworked after discussion in #3808.
But for now this PR makes docstrings consistent with actual method signatures and reliable for early-adopters of lightgbm.dask.
Live demo: https://lightgbm.readthedocs.io/en/dask_docs/Python-API.html#dask-api.
With fixed importing way via
compat.py
module users are able to actually import classes, read their docstrings but in case of missing Dask dependencies informative error will be raised: