[dask][docs] initial setup for Dask docs #3822

Merged
merged 5 commits into from
Jan 25, 2021
2 changes: 0 additions & 2 deletions README.md
@@ -87,8 +87,6 @@ ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning

LightGBM.NET (.NET/C#-package): https://github.com/rca22/LightGBM.Net

-Dask-LightGBM (distributed and parallel Python-package): https://github.com/dask/dask-lightgbm

Ruby gem: https://github.com/ankane/lightgbm

LightGBM4j (Java high-level binding): https://github.com/metarank/lightgbm4j
2 changes: 1 addition & 1 deletion docs/FAQ.rst
@@ -24,7 +24,7 @@ You may also ping a member of the core team according to the relevant area of ex
- `@chivee <https://github.com/chivee>`__ **Qiwei Ye** (C++ code / Python-package)
- `@btrotta <https://github.com/btrotta>`__ **Belinda Trotta** (C++ code)
- `@Laurae2 <https://github.com/Laurae2>`__ **Damien Soukhavong** (R-package)
-- `@jameslamb <https://github.com/jameslamb>`__ **James Lamb** (R-package)
+- `@jameslamb <https://github.com/jameslamb>`__ **James Lamb** (R-package / Dask-package)
- `@wxchan <https://github.com/wxchan>`__ **Wenxuan Chen** (Python-package)
- `@henry0312 <https://github.com/henry0312>`__ **Tsukasa Omoto** (Python-package)
- `@StrikerRUS <https://github.com/StrikerRUS>`__ **Nikita Titov** (Python-package)
4 changes: 1 addition & 3 deletions docs/Parallel-Learning-Guide.rst
@@ -7,7 +7,7 @@ Follow the `Quick Start <./Quick-Start.rst>`__ to know how to use LightGBM first

**List of external libraries in which LightGBM can be used in a distributed fashion**

-- `Dask-LightGBM`_ allows to create ML workflow on Dask distributed data structures.
+- `Dask API of LightGBM <./Python-API.rst#dask-api>`__ (formerly it was a separate package) allows to create ML workflow on Dask distributed data structures.

- `MMLSpark`_ integrates LightGBM into Apache Spark ecosystem.
`The following example`_ demonstrates how easy it's possible to utilize the great power of Spark.
@@ -134,8 +134,6 @@ Example

- `A simple parallel example`_

-.. _Dask-LightGBM: https://github.com/dask/dask-lightgbm

.. _MMLSpark: https://aka.ms/spark

.. _The following example: https://github.com/Azure/mmlspark/blob/master/notebooks/samples/LightGBM%20-%20Quantile%20Regression%20for%20Drug%20Discovery.ipynb
10 changes: 10 additions & 0 deletions docs/Python-API.rst
@@ -33,6 +33,16 @@ Scikit-learn API
LGBMRegressor
LGBMRanker

+Dask API
+--------
+
+.. autosummary::
+    :toctree: pythonapi/
+
+    DaskLGBMClassifier
+    DaskLGBMRegressor
+    DaskLGBMRanker
+
Callbacks
---------

2 changes: 1 addition & 1 deletion docs/conf.py
@@ -39,7 +39,7 @@

# -- mock out modules
MOCK_MODULES = ['numpy', 'scipy', 'scipy.sparse',
-'sklearn', 'matplotlib', 'pandas', 'graphviz']
+'sklearn', 'matplotlib', 'pandas', 'graphviz', 'dask', 'dask.distributed']
for mod_name in MOCK_MODULES:
sys.modules[mod_name] = Mock()
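The `MOCK_MODULES` change registers stand-ins for `dask` and `dask.distributed`, presumably so the docs build can import `lightgbm.dask` without Dask installed. A minimal sketch of the mechanism follows. It assumes the standard library's `unittest.mock.Mock` (the `Mock` used in `conf.py` is not shown in this hunk and may differ) and uses a hypothetical module name, `fake_heavy_dep`, instead of the real entries:

```python
import sys
from unittest.mock import Mock

# Hypothetical dependency that is not installed in the docs environment.
MOCK_MODULES = ['fake_heavy_dep', 'fake_heavy_dep.distributed']
for mod_name in MOCK_MODULES:
    # A stand-in registered under the module's name makes later imports
    # resolve to it instead of searching for a real package.
    sys.modules[mod_name] = Mock()

# Both imports now succeed even though no such package exists.
import fake_heavy_dep
from fake_heavy_dep.distributed import Client

mocked_ok = isinstance(fake_heavy_dep, Mock) and Client is not None
```

Both the top-level name and the dotted submodule need their own entry, which matches the diff adding `'dask'` and `'dask.distributed'` separately.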

12 changes: 11 additions & 1 deletion python-package/README.rst
@@ -183,12 +183,22 @@ Run ``python setup.py install --bit32``, if you want to use 32-bit version. All

If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__) and then just run ``python setup.py install --precompile``.


Build Wheel File
****************

You can use ``python setup.py bdist_wheel`` instead of ``python setup.py install`` to build wheel file and use it for installation later. This might be useful for systems with restricted or completely without network access.

+Install Dask-package
+''''''''''''''''''''
+
+To install all additional dependencies required for Dask-package, you can append ``[dask]`` to LightGBM package name:
+
+.. code:: sh
+
+    pip install lightgbm[dask]
+
+Or replace ``python setup.py install`` with ``pip install -e .[dask]`` if you are installing the package from source files.

Troubleshooting
---------------

5 changes: 5 additions & 0 deletions python-package/lightgbm/__init__.py
@@ -19,6 +19,10 @@
plot_tree, create_tree_digraph)
except ImportError:
pass
+try:
+    from .dask import DaskLGBMRegressor, DaskLGBMClassifier, DaskLGBMRanker
+except ImportError:
+    pass


dir_path = os.path.dirname(os.path.realpath(__file__))
@@ -30,5 +34,6 @@
__all__ = ['Dataset', 'Booster', 'CVBooster',
'train', 'cv',
'LGBMModel', 'LGBMRegressor', 'LGBMClassifier', 'LGBMRanker',
+'DaskLGBMRegressor', 'DaskLGBMClassifier', 'DaskLGBMRanker',
'print_evaluation', 'record_evaluation', 'reset_parameter', 'early_stopping',
'plot_importance', 'plot_split_value_histogram', 'plot_metric', 'plot_tree', 'create_tree_digraph']
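One general Python behaviour worth keeping in mind with this pattern: a name listed in `__all__` but never bound makes `from package import *` raise `AttributeError`. The sketch below illustrates that rule with a hypothetical module, `fake_pkg_with_optional`. It assumes the guarded import failed and left the optional name unbound, and it makes no claim about how the merged package behaves:

```python
import sys
import types

# Hypothetical package standing in for one whose optional import failed.
pkg = types.ModuleType('fake_pkg_with_optional')
pkg.Core = object                        # always defined
pkg.__all__ = ['Core', 'OptionalThing']  # 'OptionalThing' was never bound
sys.modules['fake_pkg_with_optional'] = pkg

namespace = {}
try:
    # A star-import looks up every name listed in __all__.
    exec('from fake_pkg_with_optional import *', namespace)
    star_import_error = None
except AttributeError as err:
    star_import_error = err

# Importing a bound name explicitly is unaffected.
from fake_pkg_with_optional import Core
```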
9 changes: 9 additions & 0 deletions python-package/lightgbm/compat.py
@@ -105,3 +105,12 @@ def _check_sample_weight(sample_weight, X, dtype=None):
_LGBMAssertAllFinite = None
_LGBMCheckClassificationTargets = None
_LGBMComputeSampleWeight = None

+"""dask"""
+try:
+    from dask import array
+    from dask import dataframe
+    from dask.distributed import Client
+    DASK_INSTALLED = True
+except ImportError:
+    DASK_INSTALLED = False
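The hunk above follows a common optional-dependency pattern: attempt the import once, record the outcome in a flag, and check the flag where the feature is used. A minimal standalone sketch follows. The package name `not_a_real_package_xyz` and the helper `require_optional` are hypothetical, chosen so that the import fails:

```python
# Optional-dependency flag in the style of compat.py.
# `not_a_real_package_xyz` is a hypothetical name, so the import fails.
try:
    import not_a_real_package_xyz  # noqa: F401
    OPTIONAL_INSTALLED = True
except ImportError:
    OPTIONAL_INSTALLED = False


def require_optional():
    """Raise a clear error at call time instead of failing at import time."""
    if not OPTIONAL_INSTALLED:
        raise RuntimeError('not_a_real_package_xyz is required for this feature')
```

The rest of the package stays importable, and the error surfaces only when the optional feature is requested.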
12 changes: 8 additions & 4 deletions python-package/lightgbm/dask.py
@@ -22,7 +22,8 @@
from dask import delayed
from dask.distributed import Client, default_client, get_worker, wait

-from .basic import _ConfigAliases, _LIB, _safe_call
+from .basic import _ConfigAliases, _LIB, _safe_call, LightGBMError
+from .compat import DASK_INSTALLED, PANDAS_INSTALLED, SKLEARN_INSTALLED
from .sklearn import LGBMClassifier, LGBMRegressor, LGBMRanker

logger = logging.getLogger(__name__)
@@ -384,6 +385,9 @@ def _predict(model, data, raw_score=False, pred_proba=False, pred_leaf=False, pr


class _LGBMModel:
+    def __init__(self):
+        if not all((DASK_INSTALLED, PANDAS_INSTALLED, SKLEARN_INSTALLED)):
+            raise LightGBMError('Dask, Pandas and Scikit-learn are required for this module')
Collaborator


Suggested change
-            raise LightGBMError('Dask, Pandas and Scikit-learn are required for this module')
+            raise LightGBMError('dask, pandas and scikit-learn are required for lightgbm.dask')

Instead of "this module", could you use the specific name? I think that makes the log message a little more useful standalone. It can be helpful for cases where people don't have direct access to the stack trace, which is required to understand what "this module" refers to.

For example, user code or other frameworks might write things like this:

try:
    dask_reg = DaskLGBMClassifier()
except LightGBMError as err:
    log.fatal(err)
    raise SomeOtherException("LightGBM training failed")

I also think packages should be referenced by their exact package names, not capitalized names.

Collaborator Author


Thank you! Addressed in acac78f.


def _fit(self, model_factory, X, y=None, sample_weight=None, group=None, client=None, **kwargs):
"""Docstring is inherited from the LGBMModel."""
@@ -422,7 +426,7 @@ def _copy_extra_params(source, dest):
setattr(dest, name, attributes[name])


-class DaskLGBMClassifier(_LGBMModel, LGBMClassifier):
+class DaskLGBMClassifier(LGBMClassifier, _LGBMModel):
"""Distributed version of lightgbm.LGBMClassifier."""

def fit(self, X, y=None, sample_weight=None, client=None, **kwargs):
@@ -470,7 +474,7 @@ def to_local(self):
return self._to_local(LGBMClassifier)


-class DaskLGBMRegressor(_LGBMModel, LGBMRegressor):
+class DaskLGBMRegressor(LGBMRegressor, _LGBMModel):
"""Docstring is inherited from the lightgbm.LGBMRegressor."""

def fit(self, X, y=None, sample_weight=None, client=None, **kwargs):
@@ -506,7 +510,7 @@ def to_local(self):
return self._to_local(LGBMRegressor)


-class DaskLGBMRanker(_LGBMModel, LGBMRanker):
+class DaskLGBMRanker(LGBMRanker, _LGBMModel):
"""Docstring is inherited from the lightgbm.LGBMRanker."""

def fit(self, X, y=None, sample_weight=None, init_score=None, group=None, client=None, **kwargs):
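The three hunks above swap the order of the base classes. In Python, that order determines which `__init__` is found first in the method resolution order, and an `__init__` that does not call `super().__init__()` stops the chain there. The sketch below uses hypothetical classes (`Guard`, `Base`) to show the general rule. Whether the LightGBM estimators call `super().__init__()` is not visible in this diff, so this is an illustration and not a statement about the merged behaviour:

```python
class Guard:
    """Hypothetical mixin whose __init__ performs a dependency check."""

    def __init__(self):
        self.guard_ran = True


class Base:
    """Hypothetical estimator base; it does not call super().__init__()."""

    def __init__(self, n=1):
        self.n = n


class GuardFirst(Guard, Base):
    pass


class BaseFirst(Base, Guard):
    pass


a = GuardFirst()  # Guard.__init__ runs, Base.__init__ does not
b = BaseFirst()   # Base.__init__ runs, Guard.__init__ does not

print(hasattr(a, 'guard_ran'), hasattr(a, 'n'))  # True False
print(hasattr(b, 'guard_ran'), hasattr(b, 'n'))  # False True
```

With the estimator listed first, a check placed in the mixin's `__init__` only runs if the estimator's `__init__` cooperatively calls `super().__init__()`.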
2 changes: 1 addition & 1 deletion python-package/setup.py
@@ -344,7 +344,7 @@ def run(self):
extras_require={
'dask': [
'dask[array]>=2.0.0',
-'dask[dataframe]>=2.0.0'
+'dask[dataframe]>=2.0.0',
Collaborator


oh wow, thank you!

Collaborator Author

@StrikerRUS StrikerRUS Jan 24, 2021


TBH, the first time I noticed that was on the LGTM site:
https://lgtm.com/projects/g/microsoft/LightGBM?mode=tree

'dask[distributed]>=2.0.0',
'pandas',
],
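The one-character fix in this hunk matters because Python concatenates adjacent string literals. Without the comma, two requirement strings silently merge into one list element. A short illustration using the same four entries:

```python
# Before the fix: no comma after the second entry, so Python joins the
# two adjacent string literals into a single list element.
before = [
    'dask[array]>=2.0.0',
    'dask[dataframe]>=2.0.0'
    'dask[distributed]>=2.0.0',
    'pandas',
]

# After the fix: four separate requirement strings.
after = [
    'dask[array]>=2.0.0',
    'dask[dataframe]>=2.0.0',
    'dask[distributed]>=2.0.0',
    'pandas',
]

print(len(before), len(after))  # 3 4
print(before[1])  # dask[dataframe]>=2.0.0dask[distributed]>=2.0.0
```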