Optimization of row-wise histogram construction #3522
…ulti_val_dense_opt
src/io/dataset.cpp
Outdated
}
CHECK(local_offsets.size() == offsets.size());
for (size_t i = 0; i < local_offsets.size(); ++i) {
  CHECK(local_offsets[i] == offsets[i]);
What is local_offsets used for? Only for checking?
Aren't the passed-in offsets already the offsets of the dense features? Why do we need to recompute local_offsets?
This is just for checking.
src/io/dataset.cpp
Outdated
sum_dense_ratio /= static_cast<double>(most_freq_bins.size());
CHECK(local_offsets.size() == offsets.size());
for (size_t i = 0; i < local_offsets.size(); ++i) {
  CHECK(local_offsets[i] == offsets[i]);
the same as above.
Update row-wise and sep-row-wise time. The master branch uses row-wise.
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
…into multi_val_opt
633eb17 to 1e10eb1
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
This PR optimizes row-wise histogram construction. For dense multi-value bins, the original bin values are stored without offsets added, which saves memory when the maximum number of bins per feature is small. For both dense and sparse multi-value bins, we try to use single-precision floating point in the histogram buffers and 256-bit AVX instructions to speed up histogram construction.
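To make the dense case concrete, here is a minimal scalar sketch of the idea, not the code in this PR. It assumes a row-major `uint8_t` matrix of raw per-feature bins, a per-feature `offsets` array built as a prefix sum of bin counts, and a single-precision histogram buffer holding a (gradient, hessian) pair per bin. The names `MakeOffsets` and `ConstructHistogramRowWise` are hypothetical, and AVX intrinsics are left out to keep the example portable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: exclusive prefix sum of per-feature bin counts.
// offsets[j] is the first histogram slot that belongs to feature j.
std::vector<uint32_t> MakeOffsets(const std::vector<uint32_t>& num_bins_per_feature) {
  std::vector<uint32_t> offsets(num_bins_per_feature.size() + 1, 0);
  for (size_t j = 0; j < num_bins_per_feature.size(); ++j) {
    offsets[j + 1] = offsets[j] + num_bins_per_feature[j];
  }
  return offsets;
}

// Row-wise histogram construction over a dense matrix of raw bins.
// bins:  num_rows * num_features raw bin values, row-major, no offset added
//        (assumption: every feature has at most 256 bins, so uint8_t suffices).
// hist:  2 floats per global bin, laid out as [sum_gradient, sum_hessian];
//        the caller must size it to 2 * offsets.back() and zero it first.
void ConstructHistogramRowWise(const std::vector<uint8_t>& bins,
                               size_t num_rows, size_t num_features,
                               const std::vector<uint32_t>& offsets,
                               const std::vector<float>& gradients,
                               const std::vector<float>& hessians,
                               std::vector<float>* hist) {
  for (size_t i = 0; i < num_rows; ++i) {
    const uint8_t* row = bins.data() + i * num_features;
    const float g = gradients[i];
    const float h = hessians[i];
    for (size_t j = 0; j < num_features; ++j) {
      // The offset is applied here, at accumulation time, not at storage time.
      const size_t idx = 2 * (static_cast<size_t>(offsets[j]) + row[j]);
      (*hist)[idx] += g;
      (*hist)[idx + 1] += h;
    }
  }
}
```

Under these assumptions, the stored bins stay 8 bits wide whatever the total number of bins across features, whereas offset-added global indices could need a wider integer type. Keeping the gradient and hessian of a bin adjacent in a `float` buffer is one layout that lends itself to vectorized accumulation.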