Make MPI ring connection retry count configurable #3301

Closed
aakarshg opened this issue Aug 12, 2020 · 1 comment · Fixed by #3306

Comments

@aakarshg
Contributor

Summary

Currently the maximum number of connection retries is fixed at 20. That may be enough in normal cases, but it isn't enough when running with a large number of machines. It would be helpful if this could be made a parameter that is passed in through the config.
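To illustrate the request, here is a minimal sketch of a connection loop whose retry limit is a parameter rather than a fixed constant. It is not LightGBM's code: `connect_with_retries`, `max_retries`, `base_delay_s`, and `backoff` are hypothetical names chosen for this example, and `connect` stands in for whatever call opens the link to a peer.

```python
import time


def connect_with_retries(connect, max_retries=20, base_delay_s=0.0, backoff=1.3):
    """Call connect() until it returns True or max_retries attempts have failed.

    All names here are hypothetical; the default of 20 mirrors the limit
    mentioned in this issue. Returns the 1-based attempt that succeeded.
    """
    delay = base_delay_s
    for attempt in range(1, max_retries + 1):
        if connect():
            return attempt
        # Wait before the next attempt, growing the delay each time.
        time.sleep(delay)
        delay *= backoff
    raise ConnectionError(f"failed to connect after {max_retries} attempts")
```

With a limit like this exposed in the config, a large cluster whose peers come up slowly could raise `max_retries` without a rebuild, while small jobs keep the default.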

@jameslamb
Collaborator

Thanks for the suggestion and for using LightGBM! We'd welcome a pull request if you're interested in making one.

Otherwise we will add this to our feature backlog (#2302) and try to get to it in a coming release

aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue Aug 14, 2020
This allows for network retries, to scale well with the
number of machines, and still retains the existing functionality
for cases with smaller num_machines ( 500 )

Fixes microsoft#3301
aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue Aug 17, 2020
This allows for network retries, to scale well with the
number of machines, and still retains the existing functionality
for cases with smaller num_machines ( 500 )

Fixes microsoft#3301
aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue Aug 27, 2020
This allows for network retries, to scale well with the
number of machines, and still retains the existing functionality
for cases with smaller num_machines ( 500 )

Fixes microsoft#3301
aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue Oct 16, 2020
This allows for network retries, to scale well with the
number of machines, and still retains the existing functionality
for cases with smaller num_machines ( 500 )

Fixes microsoft#3301
StrikerRUS pushed a commit that referenced this issue Oct 17, 2020
This allows for network retries, to scale well with the
number of machines, and still retains the existing functionality
for cases with smaller num_machines ( 500 )

Fixes #3301

Co-authored-by: Aakarsh Gopi <aakarsh@vaticlabs.com>
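
For readers following along, the idea in the commit message (a retry count that grows with the number of machines but keeps the existing value for smaller clusters) can be sketched roughly as below. This is an assumption-laden illustration, not the merged change: `scaled_retry_count`, `floor`, and `machines_per_retry` are hypothetical names, and the divisor of 25 is picked only so that the floor of 20 holds up to the 500 machines mentioned in the commit message.

```python
def scaled_retry_count(num_machines, floor=20, machines_per_retry=25):
    """Return a retry limit that never drops below `floor`.

    Hypothetical formula for illustration only: with the defaults,
    any cluster of up to 500 machines keeps the original 20 retries,
    and larger clusters get proportionally more.
    """
    return max(floor, num_machines // machines_per_retry)


for n in (8, 500, 1000, 4000):
    print(n, scaled_retry_count(n))
```

The `max` with a floor is what preserves the old behavior for small `num_machines`; only the proportional term changes anything for large clusters.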