Make MPI ring connection retry count configurable #3301
Thanks for the suggestion and for using [...] Otherwise we will add this to our feature backlog (#2302) and try to get to it in a coming release.
aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue on Aug 14, 2020
This allows for network retries, to scale well with the number of machines, and still retains the existing functionality for cases with smaller num_machines ( 500 ) Fixes microsoft#3301
aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue on Aug 17, 2020
This allows for network retries, to scale well with the number of machines, and still retains the existing functionality for cases with smaller num_machines ( 500 ) Fixes microsoft#3301
aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue on Aug 27, 2020
This allows for network retries, to scale well with the number of machines, and still retains the existing functionality for cases with smaller num_machines ( 500 ) Fixes microsoft#3301
aakarshg pushed a commit to aakarshg/LightGBM that referenced this issue on Oct 16, 2020
This allows for network retries, to scale well with the number of machines, and still retains the existing functionality for cases with smaller num_machines ( 500 ) Fixes microsoft#3301
StrikerRUS pushed a commit that referenced this issue on Oct 17, 2020
This allows for network retries, to scale well with the number of machines, and still retains the existing functionality for cases with smaller num_machines ( 500 ) Fixes #3301 Co-authored-by: Aakarsh Gopi <aakarsh@vaticlabs.com>
Summary
Currently the maximum number of retries is 20. That may be enough for normal cases, but it isn't enough when running with a large number of machines. It would be helpful if this could be made a parameter that can be passed in the config.
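
To make the request concrete, here is a rough sketch of a connection loop whose retry limit comes from a config object instead of being fixed at 20. It is written in Python purely for illustration and is not LightGBM's code: `RetryConfig`, `connect_with_retry`, and the field names `max_retries`, `initial_delay_s`, and `backoff` are all hypothetical, and the delay values are placeholders.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class RetryConfig:
    # Hypothetical settings, not LightGBM's actual parameter names.
    max_retries: int = 20         # default mirrors the fixed limit described in this issue
    initial_delay_s: float = 0.2  # placeholder wait before the second attempt
    backoff: float = 1.3          # placeholder multiplier applied to the wait after each failure


def connect_with_retry(
    connect: Callable[[], T],
    cfg: RetryConfig,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `cfg.max_retries` attempts have failed."""
    delay = cfg.initial_delay_s
    last_error = None
    for attempt in range(cfg.max_retries):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            # Only wait if another attempt is still allowed.
            if attempt + 1 < cfg.max_retries:
                sleep(delay)
                delay *= cfg.backoff
    raise ConnectionError(
        f"gave up after {cfg.max_retries} attempts"
    ) from last_error
```

With something like this, a job spanning many machines could pass `RetryConfig(max_retries=100)` while small jobs keep the default of 20.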