
[dask] Support custom objective functions #3934

Closed
jameslamb opened this issue Feb 10, 2021 · 3 comments · Fixed by #4920

@jameslamb (Collaborator)
Summary

The Dask estimators in lightgbm.dask should support the use of a custom objective function.

Motivation

This feature would bring Dask estimators closer to parity with the sklearn estimators.

Description

I haven't thought this through much yet; this is just a placeholder issue for discussion. If you're reading this and have ideas, please comment, and the issue can be re-opened.

References

See the following excerpt from the docstring of the sklearn estimators for an explanation of how this works there:

    A custom objective function can be provided for the ``objective`` parameter.
    In this case, it should have the signature
    ``objective(y_true, y_pred) -> grad, hess`` or
    ``objective(y_true, y_pred, group) -> grad, hess``:

        y_true : array-like of shape = [n_samples]
            The target values.
        y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
            The predicted values.
        group : array-like
            Group/query data.
            Only used in the learning-to-rank task.
            sum(group) = n_samples.
            For example, if you have a 100-document dataset with ``group = [10, 20, 40, 10, 10, 10]``,
            that means that you have 6 groups, where the first 10 records are in the first group,
            records 11-30 are in the second group, records 31-70 are in the third group, etc.
        grad : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
            The value of the first order derivative (gradient) of the loss for each sample point.
        hess : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
            The value of the second order derivative (Hessian) of the loss for each sample point.

    For the binary task, y_pred is the raw margin.
    For the multi-class task, y_pred is grouped by class_id first, then by row_id:
    the prediction for the i-th row in the j-th class is at y_pred[j * num_data + i],
    and grad and hess should be laid out in the same way.
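To make that signature concrete, here is a minimal sketch of a custom objective in the form described above (illustrative only, not part of the original issue); it implements binary log-loss, treating y_pred as the raw margin:

```python
import numpy as np
import lightgbm as lgb


def logloss_objective(y_true, y_pred):
    """Binary log-loss objective with the documented signature.

    For the binary task, y_pred is the raw margin, so apply the
    sigmoid before computing the gradient and Hessian.
    """
    prob = 1.0 / (1.0 + np.exp(-y_pred))
    grad = prob - y_true        # first derivative of the loss w.r.t. the margin
    hess = prob * (1.0 - prob)  # second derivative of the loss w.r.t. the margin
    return grad, hess


# The sklearn estimators already accept a callable objective:
clf = lgb.LGBMClassifier(objective=logloss_objective)
```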

We can look at how xgboost.dask handles this for inspiration; a rough sketch of its approach is below.
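This sketch assumes xgboost's Dask interface forwards a custom objective via the ``obj`` argument of ``xgboost.dask.train``, with the native ``(preds, dtrain)`` callback signature evaluated on each worker's local data; details here are illustrative, not taken from the issue:

```python
import numpy as np
import dask.array as da
import xgboost as xgb
from distributed import Client


def squared_error_obj(preds, dtrain):
    """Least-squares objective: per-sample gradient and Hessian."""
    labels = dtrain.get_label()
    grad = preds - labels
    hess = np.ones_like(preds)
    return grad, hess


if __name__ == "__main__":
    client = Client()  # local cluster, for demonstration only
    X = da.random.random((1_000, 10), chunks=(100, 10))
    y = da.random.random((1_000,), chunks=(100,))
    dtrain = xgb.dask.DaskDMatrix(client, X, label=y)
    result = xgb.dask.train(
        client,
        {"tree_method": "hist"},
        dtrain,
        num_boost_round=10,
        obj=squared_error_obj,
    )
    booster = result["booster"]
```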

@jameslamb (Collaborator, Author)

Closing this in favor of putting it in #2302 with other feature requests. Anyone is welcome to pick up this feature! Please comment if interested and the issue can be re-opened.

@jameslamb (Collaborator, Author)

Re-opening this as I'm working on this one right now.

@jameslamb jameslamb reopened this Dec 29, 2021
@jameslamb jameslamb added the dask label Dec 29, 2021
@jameslamb jameslamb self-assigned this Dec 29, 2021
StrikerRUS added a commit that referenced this issue Jan 17, 2022
* add test for custom objective with regressor

* add test for custom binary classification objective with classifier

* isort

* got tests working for multiclass

* update docs

* train deeper model for classifier

* Apply suggestions from code review

Co-authored-by: José Morales <jmoralz92@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update multiclass tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix multiclass probabilities

* linting

Co-authored-by: José Morales <jmoralz92@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
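With that change merged, using a custom objective with the Dask estimators mirrors the sklearn API. A minimal sketch on synthetic data (the objective name here is this example's own, not from the commit):

```python
import numpy as np
import dask.array as da
from distributed import Client
from lightgbm import DaskLGBMRegressor


def squared_error(y_true, y_pred):
    """Custom least-squares objective: per-sample gradient and Hessian."""
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess


if __name__ == "__main__":
    client = Client()  # local cluster, for demonstration only
    X = da.random.random((1_000, 10), chunks=(100, 10))
    y = da.random.random((1_000,), chunks=(100,))
    reg = DaskLGBMRegressor(objective=squared_error, n_estimators=10)
    reg.fit(X, y)
    preds = reg.predict(X).compute()
```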
@github-actions (bot)
This issue has been automatically locked since there has not been any recent activity after it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues,
including a reference to this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2023