
Training with 96 CPU cores is slower than with 48 CPU cores #4631

Closed
JivanRoquet opened this issue Sep 28, 2021 · 8 comments

Comments


JivanRoquet commented Sep 28, 2021

  • LightGBM version: 3.2.1
  • Using LightGBM's Python Scikit-Learn API

Hello, I've noticed that training is consistently about 50% slower on a c5.24xlarge AWS instance (96 CPU cores, 192 GB RAM) than on a c5.12xlarge (48 CPU cores, 96 GB RAM).

Model is created with the following settings:

import os

from lightgbm import LGBMClassifier

lgb_c = LGBMClassifier(
    max_depth=7,
    n_estimators=1000,
    reg_alpha=0.15,
    reg_lambda=0.15,
    num_leaves=100,
    learning_rate=0.006,
    colsample_bytree=0.8,
    min_child_samples=20,
    objective='multiclass',
    class_weight='balanced',
    importance_type='gain',
    n_jobs=os.cpu_count(),
)

Training is done with these parameters:

lgb_c.fit(
    X_train,
    y_train,
    eval_set=(X_eval, y_eval),
    eval_names=['eval'],
    eval_metric='multiclass',
    early_stopping_rounds=100,
    verbose=50
)

The eval set has about 15k rows.

The training dataset has about 270k rows, with about 18 categorical features (high cardinality, between 200 and 2,000 unique values each) and 2 numeric (integer) features. The target is a categorical feature with about 300 unique classes.

  • Typical training time with 48 cores: 20 minutes
  • Typical training time with 96 cores: 30 minutes

In each case, all cores are 100% busy for the entire duration of training.

I would have expected training time to go down as the number of cores increases. Is this a bug, or is this behaviour normal? Is there any way to fix it with different hyperparameters or model settings?


Laurae2 commented Sep 28, 2021

Hello,

As you increase the number of CPU threads, the following costs increase:

  • multithreading overhead
  • RAM bandwidth usage
  • cores competing for L1/L2/L3 caches
  • difficulty of sustaining turbo boost clocks
  • NUMA node memory coherency traffic (for multi-CPU setups)
  • CPU interconnect bandwidth usage (for multi-CPU setups)
  • and possibly more, depending on the hardware/software setup (e.g. kernel, heat, power limits)

In most cases you are very likely to hit a large multithreading overhead, especially with "only" 270k rows and few features (and potentially lower clock rates from using more threads). In this scenario, dispatching work to threads costs significantly more CPU time, and that cost becomes large enough to outweigh the gains from parallelism. This is normal and expected behavior.

Note that 100% CPU usage reported in task managers, top, etc. does not mean the CPU is actually being used effectively.

For examples of multithreading scaling, you may want to check szilard/GBM-perf#29 (comment), as well as some of my older detailed benchmarks: https://sites.google.com/view/lauraepp/benchmarks/xgb-vs-lgb-oct-2018 (or, for a simple example on xgboost, Laurae2/ml-perf#6 (comment)).
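As a rough illustration of how such scaling can be measured for a given dataset, the sketch below times the same classifier configuration at several thread counts. It is only a sketch: it assumes X_train and y_train from this issue are available, and it uses a reduced n_estimators purely to keep the timing runs short.

import time

from lightgbm import LGBMClassifier

# Illustrative sketch: time the same model at different thread counts.
# X_train / y_train are assumed to be the training data from this issue.
for n_threads in (12, 24, 48, 96):
    model = LGBMClassifier(
        max_depth=7,
        num_leaves=100,
        n_estimators=100,       # fewer boosting rounds than the real run, only for timing
        objective='multiclass',
        n_jobs=n_threads,       # number of threads LightGBM will use
    )
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    print(f"{n_threads:>3} threads: {elapsed:.1f} s")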


JivanRoquet commented Sep 28, 2021

Update with more data on training speed vs. CPU count:

CPU cores    Training time (minutes)
36           22
48           20
72           27
96           27

Interestingly, I've just noticed that:

  • 72-core machine with 48 cores used: 25 minutes
  • 48-core machine with 48 cores used: 20 minutes
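For reference, a minimal sketch (not from this thread) of the two usual ways to cap LightGBM's thread count below the machine's core count instead of using os.cpu_count(): per model via n_jobs, or globally via the standard OMP_NUM_THREADS OpenMP environment variable. The value 48 simply mirrors the comparison above.

import os

# Option 1: cap the OpenMP thread pool globally, before LightGBM initialises it.
os.environ['OMP_NUM_THREADS'] = '48'

from lightgbm import LGBMClassifier

# Option 2: cap the thread count per model; an explicit n_jobs/num_threads
# setting takes precedence over the environment variable.
lgb_c = LGBMClassifier(
    n_jobs=48,
    # ... other hyperparameters as in the original configuration
)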

@JivanRoquet

Thanks @Laurae2 for the clear and useful explanation.


JivanRoquet commented Sep 28, 2021

So does that mean that in this case (same dataset, same hyperparameters), 20 minutes is likely the lower bound on training time? Is there really no way to get significantly below this without changing the hyperparameters?

I've also tried using the GPU, and it makes everything much, much slower on this "small" dataset, especially because of the high-cardinality categorical variables.

@shiyu1994

@JivanRoquet Thank you for using LightGBM. Could you please try the force_row_wise and force_col_wise options to see if the same conclusion holds for both choices?
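For anyone trying this through the scikit-learn API, a minimal sketch: both flags are regular Booster parameters, and the wrapper passes extra keyword arguments through to the booster, so they can be set directly in the constructor. Only one of the two should be set for a given run.

from lightgbm import LGBMClassifier

# Run 1: force row-wise histogram building (multi-threading over rows).
lgb_row = LGBMClassifier(
    objective='multiclass',
    n_jobs=48,
    force_row_wise=True,
    # ... other hyperparameters as in the original configuration
)

# Run 2: force column-wise histogram building (multi-threading over features).
lgb_col = LGBMClassifier(
    objective='multiclass',
    n_jobs=48,
    force_col_wise=True,
    # ... other hyperparameters as in the original configuration
)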

@JivanRoquet

Hi @shiyu1994, I'm going to try this; thanks for the suggestion.


no-response bot commented Dec 10, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

no-response bot closed this as completed Dec 10, 2021
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023