LightGBM uses entire host's available cores as one executor's configured cpu cores #346

Closed
ywskycn opened this issue Jul 16, 2018 · 3 comments

@ywskycn
Contributor

ywskycn commented Jul 16, 2018

When calculating the number of cores for each executor, LightGBMUtils currently uses Java's Runtime API, which I think returns all CPU cores on the physical host (https://github.com/Azure/mmlspark/blob/master/src/lightgbm/src/main/scala/LightGBMUtils.scala#L127). The function comment says "this is more reliable than getting value from conf", but shouldn't we use the value from the conf here? @imatiach-msft
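For illustration, a minimal sketch of the two approaches being contrasted above. The object and method names here are made up for this example, and `conf` is a plain `Map` standing in for `SparkConf.getOption`, so the snippet stays self-contained; this is not the actual MMLSpark code.

```scala
// Sketch of the two ways to count cores per executor discussed in this issue.
// Names are illustrative, not the real LightGBMUtils API.
object CoreCountSketch {
  // Approach 1: the JVM runtime. Returns the cores visible to this JVM,
  // which on many clusters is every core on the physical host, not just
  // the cores assigned to this executor.
  def coresFromRuntime(): Int =
    Runtime.getRuntime.availableProcessors()

  // Approach 2: the Spark conf. Honors spark.executor.cores when the user
  // set it, but the key may be absent (e.g. in standalone mode).
  def coresFromConf(conf: Map[String, String]): Option[Int] =
    conf.get("spark.executor.cores").map(_.toInt)

  def main(args: Array[String]): Unit = {
    println(s"runtime sees: ${coresFromRuntime()} cores")
    println(s"conf set:     ${coresFromConf(Map("spark.executor.cores" -> "4"))}")
    println(s"conf unset:   ${coresFromConf(Map.empty)}")
  }
}
```

The tension in the thread is exactly the gap between the two: approach 1 always returns a number, but possibly the wrong one on shared hosts; approach 2 returns the configured value, but only when the conf key exists.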

@mhamilton723
Collaborator

You seem to be right to me, and the conf seems like the way to go, though presumably there was some good reason to avoid it. @imatiach-msft, what did you mean by "more reliable"?

@imatiach-msft
Contributor

@ywskycn @mhamilton723 do you mean the "spark.executor.cores" configuration property? It seems that it is not set in all cluster modes, e.g. for standalone mode:
https://spark.apache.org/docs/latest/spark-standalone.html#executors-scheduling

When spark.executor.cores is explicitly set, multiple executors from the same application may be launched on the same worker if the worker has enough cores and memory. Otherwise, each executor grabs all the cores available on the worker by default.

Maybe I misunderstood? What do you think would be the most reliable way to get it?
This is the suggested way here:
https://stackoverflow.com/questions/47399087/spark-get-number-of-cluster-cores-programmatically
Although I do see this comment:

sometimes executor cores are overprovisioned or underprovisioned, which means JVM runtime function may be inaccurate

I don't quite understand what they mean by [over/under]provisioned, however.
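One way to reconcile the two approaches, sketched here purely as an assumption (this is not necessarily what the eventual fix does): prefer `spark.executor.cores` when it is set, and fall back to the JVM runtime count only when the conf key is absent. The `ExecutorCores` object and its `resolve` method are hypothetical names, and the `Option[String]` parameter stands in for `SparkConf.getOption("spark.executor.cores")` to keep the snippet runnable without Spark.

```scala
// Hypothetical conf-first resolution with a runtime fallback.
// The fallback may overcount on shared hosts, as noted in the thread,
// but is only reached when spark.executor.cores is not set.
object ExecutorCores {
  def resolve(confValue: Option[String]): Int =
    confValue
      .map(_.toInt)                                   // user-configured cores win
      .getOrElse(Runtime.getRuntime.availableProcessors()) // fallback: JVM view

  def main(args: Array[String]): Unit = {
    println(s"configured: ${resolve(Some("8"))}")
    println(s"fallback:   ${resolve(None)}")
  }
}
```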

@imatiach-msft
Contributor

imatiach-msft commented Oct 22, 2018

@humbinal @land1725 @thusithaC @ywskycn closing this issue, as it should be resolved by #404 based on @humbinal's recommendation.
