Training with 96 CPU cores is slower than with 48 CPU cores #4631
Comments
Hello, as you increase the number of CPU threads, the following requirements increase:
In most cases you are very likely to hit a large multithreading overhead, especially with "only" 270k rows and few features (and potentially lower clock rates from using more threads). In this scenario, dispatching work to the threads costs significantly more CPU time, and that cost becomes large enough to outweigh the gains from parallelism. This is normal and expected behavior. Note that 100% CPU usage reported in task managers / top etc. does not mean 100% useful use of the CPU. You may want to check szilard/GBM-perf#29 (comment), as well as some of my older detailed benchmarks, https://sites.google.com/view/lauraepp/benchmarks/xgb-vs-lgb-oct-2018 (or, for instance, Laurae2/ml-perf#6 (comment) for a simple example on xgboost), for examples of multithreading scaling.
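The overhead effect described above can be illustrated in plain Python (this is an illustrative sketch, not LightGBM internals): dispatching many tiny units of work to a thread pool costs per-task coordination time, so adding workers does not necessarily reduce wall time when each task is small.

```python
# Illustrative sketch: per-task dispatch overhead on many tiny tasks.
# More workers do not imply less wall time when the work units are small.
import time
from concurrent.futures import ThreadPoolExecutor

def tiny_task(x):
    # Deliberately tiny unit of work, analogous to a small chunk of histogram work.
    return x * x

def run(n_workers, n_tasks=20_000):
    """Time summing n_tasks tiny results through a pool of n_workers threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        total = sum(pool.map(tiny_task, range(n_tasks)))
    return total, time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 4, 16):
        total, elapsed = run(workers)
        print(f"{workers:>2} workers: {elapsed:.3f}s (checksum {total})")
```

On most machines the 16-worker run is no faster (often slower) than the 1-worker run here, because dispatch cost dominates the tiny tasks, which is the same scaling pattern reported in this issue.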
Update with more data on training speed vs CPU count
Interestingly, I've just noticed that
Thanks @Laurae2 for the clear and useful explanation.
So does that mean that in this case (same dataset, same hyperparameters), 20 minutes is likely the lower bound on training time? Is there really no way to get significantly below this without changing the hyperparameters? I've also tried using a GPU, and it makes everything much, much slower on this "small" dataset, especially because of the high-cardinality categorical variables.
@JivanRoquet Thank you for using LightGBM. Could you please try the force_row_wise and force_col_wise options to see whether the same conclusion holds for both choices?
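For reference, the suggested comparison can be sketched as two parameter variants (force_row_wise and force_col_wise are real LightGBM options; the objective, class count, and thread count below are placeholders inferred from this issue, and the actual training call is left as a comment since it requires the real dataset):

```python
# Sketch of an A/B comparison between the two histogram-construction modes.
# Only one of the two force_* flags should be set at a time.
# All values besides the two flags are placeholders, not the issue author's settings.
base_params = {
    "objective": "multiclass",
    "num_class": 300,      # placeholder: ~300 target categories per the issue
    "num_threads": 48,
}

variants = {
    "row_wise": dict(base_params, force_row_wise=True),
    "col_wise": dict(base_params, force_col_wise=True),
}

for name, params in variants.items():
    # With real data, each variant would be timed, e.g.:
    #   booster = lightgbm.train(params, train_set, num_boost_round=100)
    print(name, sorted(params))
```

Timing both variants at each CPU count would show whether the row-wise/column-wise choice changes the scaling behavior.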
Hi @shiyu1994, I'm going to try this, thanks for the suggestion.
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Hello, I noticed that training time is consistently about 50% slower when using a c5.24xlarge instance on AWS (96 CPU cores, 96GB RAM) than when using a c5.12xlarge (48 CPU cores, 192GB RAM).
The model is created with the following settings:
Training is done with these parameters:
Eval set is about 15k rows.
Training dataset has about 270k rows, with about 18 categorical features (high cardinality, between 200 and 2000 unique elements each) and 2 numeric (integer) features. Target is a categorical feature with about 300 unique categories.
In each case, all cores are 100% busy for the entire duration of training.
I would have expected training time to go down as the number of cores increases. Could this be a bug, or is this behaviour normal? Is there any way to fix this with different hyperparameters or model settings?
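The dataset described above can be approximated synthetically for anyone who wants to reproduce the scaling measurements without the real data. All sizes below (row count, feature counts, cardinalities, class count) are taken from the description in this issue, and the generation scheme itself is an assumption:

```python
# Synthetic stand-in for the dataset described above: ~270k rows,
# 18 high-cardinality categorical features, 2 integer features,
# and a target with ~300 categories. Sizes come from the issue text.
import numpy as np

rng = np.random.default_rng(0)
n_rows = 270_000

cat_features = np.column_stack([
    rng.integers(0, card, size=n_rows)            # one column per categorical feature
    for card in rng.integers(200, 2001, size=18)  # cardinalities between 200 and 2000
])
num_features = rng.integers(0, 1000, size=(n_rows, 2))  # two integer features
X = np.column_stack([cat_features, num_features])
y = rng.integers(0, 300, size=n_rows)                   # ~300 target categories

print(X.shape, y.shape)  # (270000, 20) (270000,)
```

Training on this stand-in at different num_threads settings should reproduce the flat-or-worse scaling discussed in this thread, since the overhead depends mainly on dataset shape rather than on the specific values.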