Add alt text for gcc-tips #4175

Closed · wants to merge 5 commits
7 changes: 7 additions & 0 deletions docs/gcc-Tips.rst
@@ -32,27 +32,34 @@ Some explanatory pictures:
.. image:: ./_static/images/gcc-table.png
:align: center
:target: ./_static/images/gcc-table.png
:alt: table comparing training runtime for different combinations of max depth, compiler flags, and number of threads. Faster training times are shown in green, slower times in red. For max depth 10 and 12, the fastest training was achieved with 5 threads and compiler flag dash O 2.

.. image:: ./_static/images/gcc-bars.png
:align: center
:target: ./_static/images/gcc-bars.png
:alt: picture of a simple bar chart against running time
Collaborator

To be honest, I'm seeing this image for the first time and it doesn't make much sense to me. It seems to say that training runtime decreased as max_depth was increased, which I wouldn't expect.

@StrikerRUS do you understand it?

this one: https://lightgbm.readthedocs.io/en/latest/_static/images/gcc-bars.png


.. image:: ./_static/images/gcc-chart.png
:align: center
:target: ./_static/images/gcc-chart.png
:alt: a grid of vertical bar charts comparing run time for different combinations of max depth, compiler flags, and number of threads. The charts show that for shallow trees, using more threads is always expected to provide some reduction in run time. But for deeper trees (max depth greater than 10), using a value of num threads that is too high can actually result in slower training.

.. image:: ./_static/images/gcc-comparison-1.png
:align: center
:target: ./_static/images/gcc-comparison-1.png
:alt: a horizontal bar chart comparing Light G B M performance versus compilation flags. For most settings of max depth, best performance was achieved with flags dash O 3, dash M T U N E equals native.

.. image:: ./_static/images/gcc-comparison-2.png
:align: center
:target: ./_static/images/gcc-comparison-2.png
:alt: a set of 4 vertical bar charts comparing Light G B M performance versus compilation flags. For most settings of max depth, best performance was achieved with flags dash O 3, dash M T U N E equals native.

.. image:: ./_static/images/gcc-meetup-1.png
:align: center
:target: ./_static/images/gcc-meetup-1.png
:alt: grid of line charts showing the relative speed of training for different combinations of max depth, number of threads, and compilation flags. The grid shows that for models with max depth greater than 5, compiling Light G B M with default compiler flags produces faster training time than any of the customizations explored.

.. image:: ./_static/images/gcc-meetup-2.png
:align: center
:target: ./_static/images/gcc-meetup-2.png
:alt: comparison of cumulative speed versus the slowest density of each algorithm at various depths with v-2, O-3 remaining constant almost in all cases
Collaborator

I think these stacked area plots (https://lightgbm.readthedocs.io/en/latest/_static/images/gcc-meetup-2.png) are really difficult to understand, and this one should just be removed.

@StrikerRUS what do you think?

22 changes: 11 additions & 11 deletions python-package/lightgbm/sklearn.py
@@ -76,7 +76,7 @@ def __call__(self, preds, dataset):
elif argc == 3:
grad, hess = self.func(labels, preds, dataset.get_group())
else:
raise TypeError("Self-defined objective function should have 2 or 3 arguments, got %d" % argc)
raise TypeError(f"Self-defined objective function should have 2 or 3 arguments, got {argc}")
Collaborator

Can you please remove these unrelated changes from this PR? I think they were meant for #4181.

On whichever branch you use (based on my recommendation in #4181), you can remove them with a command like git reset --soft HEAD~1 followed by a force push.

Let me know if you get stuck!
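A sketch of the suggested cleanup (illustrative only — run it in a scratch repo first; the remote and branch names at the end are placeholders):

```shell
# Throwaway repo with two commits; then drop the newest commit from the branch.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q demo && cd demo
git -c user.name=a -c user.email=a@b commit -q --allow-empty -m "wanted"
git -c user.name=a -c user.email=a@b commit -q --allow-empty -m "unrelated change"
git reset --soft HEAD~1   # branch now points at "wanted"; the dropped commit's changes stay staged locally
# on a real PR branch, publish the rewritten history with:
# git push --force origin <your-branch>
```

Note that `--soft` keeps the dropped commit's changes in the index, so you can still inspect or discard them locally; only the pushed branch loses the commit.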

"""weighted for objective"""
weight = dataset.get_weight()
if weight is not None:
@@ -88,7 +88,7 @@ def __call__(self, preds, dataset):
num_data = len(weight)
num_class = len(grad) // num_data
if num_class * num_data != len(grad):
raise ValueError("Length of grad and hess should equal to num_class * num_data")
raise ValueError(f"Length of grad and hess should equal to {num_class * num_data}")
for k in range(num_class):
for i in range(num_data):
idx = k * num_data + i
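The indexing above assumes the per-class gradients are flattened into one array of length num_class * num_data, with class k's value for row i at index k * num_data + i. A small illustrative sketch (values made up):

```python
# Flat (num_class * num_data) gradient layout, as used in the loop above.
num_data, num_class = 3, 2
grad = [10, 11, 12,   # class 0, rows 0..2
        20, 21, 22]   # class 1, rows 0..2

def grad_at(k, i):
    # Gradient of class k for data row i.
    return grad[k * num_data + i]
```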
@@ -171,7 +171,7 @@ def __call__(self, preds, dataset):
elif argc == 4:
return self.func(labels, preds, dataset.get_weight(), dataset.get_group())
else:
raise TypeError("Self-defined eval function should have 2, 3 or 4 arguments, got %d" % argc)
raise TypeError(f"Self-defined eval function should have 2, 3 or 4 arguments, got {argc}")
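For context, the wrappers above dispatch on argument count. A minimal sketch (not part of this PR) of the 2-argument objective and eval signatures; `l2_objective` and `l2_eval` are illustrative names, and plain Python lists stand in for the numpy arrays LightGBM actually passes:

```python
def l2_objective(labels, preds):
    """2-argument objective form: returns (grad, hess) for squared error."""
    grad = [p - y for p, y in zip(preds, labels)]
    hess = [1.0] * len(labels)
    return grad, hess

def l2_eval(labels, preds):
    """2-argument eval form: returns (name, value, is_higher_better)."""
    mse = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
    return "mse", mse, False
```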


# documentation templates for LGBMModel methods are shared between the classes in
@@ -718,11 +718,10 @@ def predict(self, X, raw_score=False, start_iteration=0, num_iteration=None,
if not isinstance(X, (pd_DataFrame, dt_DataTable)):
X = _LGBMCheckArray(X, accept_sparse=True, force_all_finite=False)
n_features = X.shape[1]
if self._n_features != n_features:
raise ValueError("Number of features of the model must "
"match the input. Model n_features_ is %s and "
"input n_features is %s "
% (self._n_features, n_features))
if self._n_features != n_features:
    raise ValueError("Number of features of the model must "
                     f"match the input. Model n_features_ is {self._n_features} and "
                     f"input n_features is {n_features}")
return self._Booster.predict(X, raw_score=raw_score, start_iteration=start_iteration, num_iteration=num_iteration,
pred_leaf=pred_leaf, pred_contrib=pred_contrib, **kwargs)
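As an aside, the multi-line error message above can use implicit string-literal concatenation: adjacent literals inside parentheses are joined by the parser, so no backslash continuations are needed (a general Python note; `n_expected` and `n_got` are made-up values):

```python
n_expected, n_got = 10, 8
# Adjacent literals inside the parentheses are concatenated automatically.
msg = ("Number of features of the model must "
       f"match the input. Model n_features_ is {n_expected} and "
       f"input n_features is {n_got}")
```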

@@ -919,9 +918,10 @@ def predict_proba(self, X, raw_score=False, start_iteration=0, num_iteration=None,
"""Docstring is set after definition, using a template."""
result = super().predict(X, raw_score, start_iteration, num_iteration, pred_leaf, pred_contrib, **kwargs)
if callable(self._objective) and not (raw_score or pred_leaf or pred_contrib):
_log_warning("Cannot compute class probabilities or labels "
"due to the usage of customized objective function.\n"
"Returning raw scores instead.")
_log_warning("Cannot compute class probabilities or labels "
             "due to the usage of customized objective function.\n"
             "Returning raw scores instead.")
return result
elif self._n_classes > 2 or raw_score or pred_leaf or pred_contrib:
return result
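For users who hit the warning above: with a customized objective, predict_proba returns raw scores, so any probability transformation has to be applied manually. A hedged sketch of a softmax conversion, which (to my understanding) mirrors what the built-in multiclass objective applies internally:

```python
import math

def softmax(scores):
    # Convert one row of raw multiclass scores to probabilities.
    # Subtracting the max keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```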