CheckOnHostCommand: add missing timeout setting #9677
base: 4.19
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## 4.19 #9677 +/- ##
============================================
+ Coverage 15.08% 15.11% +0.02%
+ Complexity 11192 11190 -2
============================================
Files 5406 5406
Lines 473215 473214 -1
Branches 61680 58585 -3095
============================================
+ Hits 71386 71521 +135
- Misses 393880 393883 +3
+ Partials 7949 7810 -139
Flags with carried forward coverage won't be shown.
5ce9077 to eca66f8
@blueorangutan package
code lgtm
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11163
code LGTM but I haven't tested it
@blueorangutan package
@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11374
clgtm
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-11709)
Description
The new CheckOnHostCommand constructor did not set a timeout, so it fell back to the wait timeout (1800s). On a Linstor cluster this meant it took over 15 minutes for a host to be recognized as down.
With a timeout of 20s (the same value the other constructor sets), it takes 4-5 minutes for a host to be recognized as down.
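The shape of the fix can be illustrated with a minimal, self-contained sketch. This is not the actual CloudStack source: the `Command` base class, the `hostName` parameter, the `strictStorageCheck` flag and the constant names below are hypothetical stand-ins, and only the 20s and 1800s values come from the description above. The idea shown is that one constructor delegates to the other, so both end up with the same short timeout.

```java
public class TimeoutSketch {

    /** Hypothetical stand-in for a command base class carrying a wait timeout in seconds. */
    public static class Command {
        // Assumption: modelled here as a field default; 1800s is the fallback named in the description.
        public static final int FALLBACK_WAIT_SECONDS = 1800;
        private int wait = FALLBACK_WAIT_SECONDS;

        public int getWait() {
            return wait;
        }

        public void setWait(int seconds) {
            this.wait = seconds;
        }
    }

    /** Hypothetical sketch of a host check command with two constructors. */
    public static class CheckOnHostCommand extends Command {
        // 20s is the value the description says the other constructor already used.
        public static final int CHECK_WAIT_SECONDS = 20;

        private final String hostName;            // illustrative placeholder for the host
        private final boolean strictStorageCheck; // illustrative placeholder for the extra flag

        public CheckOnHostCommand(String hostName) {
            // Delegate so the timeout is set in exactly one place.
            this(hostName, false);
        }

        public CheckOnHostCommand(String hostName, boolean strictStorageCheck) {
            this.hostName = hostName;
            this.strictStorageCheck = strictStorageCheck;
            // Without this call the command would keep the 1800s fallback.
            setWait(CHECK_WAIT_SECONDS);
        }

        public String getHostName() {
            return hostName;
        }

        public boolean isStrictStorageCheck() {
            return strictStorageCheck;
        }
    }

    public static void main(String[] args) {
        CheckOnHostCommand plain = new CheckOnHostCommand("host-1");
        CheckOnHostCommand strict = new CheckOnHostCommand("host-1", true);
        System.out.println("plain wait:  " + plain.getWait());
        System.out.println("strict wait: " + strict.getWait());
    }
}
```

Delegating from one constructor to the other is only one way to keep the two paths consistent; calling the setter in each constructor separately would have the same effect.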
How Has This Been Tested?
Failover tests (force shutdown of a host) in a Linstor cluster.