Fix network exception handling and fencing flow logic. #254

Merged: 1 commit, Apr 25, 2022


@emesika commented Apr 10, 2022

This patch fixes network exception handling and fencing flow logic.
Problems in the current code:

1. Hard fencing happens too quickly. We waited for either the number of
    attempts or the grace period, whichever came first, and since the number
    of attempts is configured to 2, the effective grace period was only ~20 seconds.

2. VdsManager::isHostInGracePeriod was called periodically from both
    VdsManager::handleNetworkException and
    SshSoftFencingCommand::checkIfHostBecomeUp, which made the logic
    complex and caused it not to work as expected.
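To make problem 1 concrete, here is a minimal Java sketch of the old rule. It assumes a hypothetical OldHardFenceRule class, an intended grace period of one minute, and soft-fencing attempts spaced 10 seconds apart; none of these names or intervals come from the patch. Because the check combines the two limits with "or", the attempt limit of 2 is reached first and hard fencing starts after about 20 seconds.

```java
import java.time.Duration;

// Hypothetical model of the pre-fix rule described in problem 1. The class name,
// the one-minute grace period and the 10-second attempt spacing are assumptions.
final class OldHardFenceRule {
    private final int maxAttempts;      // configured attempt limit ("2" in the description)
    private final Duration gracePeriod; // grace period meant to precede hard fencing

    OldHardFenceRule(int maxAttempts, Duration gracePeriod) {
        this.maxAttempts = maxAttempts;
        this.gracePeriod = gracePeriod;
    }

    // Hard fencing starts as soon as EITHER limit is reached, so a small
    // attempt limit silently shortens the grace period.
    boolean shouldHardFence(int failedAttempts, Duration elapsed) {
        return failedAttempts >= maxAttempts || elapsed.compareTo(gracePeriod) >= 0;
    }

    public static void main(String[] args) {
        Duration attemptSpacing = Duration.ofSeconds(10); // assumed interval between attempts
        OldHardFenceRule rule = new OldHardFenceRule(2, Duration.ofMinutes(1));
        for (int attempts = 1; attempts <= 10; attempts++) {
            Duration elapsed = attemptSpacing.multipliedBy(attempts);
            if (rule.shouldHardFence(attempts, elapsed)) {
                // Prints "Hard fencing after 20s", well before the one-minute grace period.
                System.out.println("Hard fencing after " + elapsed.getSeconds() + "s");
                break;
            }
        }
    }
}
```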

When a host switches to the 'connecting' state, the network exception grace
period has to take the host's load into account, that is, the number of
running VMs and its SPM status. In the soft-fencing flow, however, the host
is already in the 'not-responding' status, another host has already taken the
SPM role, and all of its running VMs have been set to 'unknown' status. We
should therefore not consider the host load at all: a fixed grace period
(configurable, 1 minute) is enough to restart the vdsmd service on the host
and get it up and running.
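A minimal Java sketch of the fixed grace period in the soft-fencing flow, assuming a hypothetical SoftFenceGracePeriod class that is told when the vdsmd restart was issued. It is not the engine's implementation; it only illustrates that the decision depends on elapsed time and on whether the host is back up, not on its VM count or SPM status.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the fixed soft-fencing grace period. Class and method
// names are illustrative and not taken from the oVirt engine code base.
final class SoftFenceGracePeriod {
    private final Duration gracePeriod;   // configurable; 1 minute per the description
    private final Instant softFenceStart; // when the vdsmd restart was issued

    SoftFenceGracePeriod(Duration gracePeriod, Instant softFenceStart) {
        this.gracePeriod = gracePeriod;
        this.softFenceStart = softFenceStart;
    }

    // Host load (running VMs, SPM role) is deliberately ignored: by now the host
    // is not responding, SPM has moved and its VMs are 'unknown'.
    boolean shouldHardFence(boolean hostIsUp, Instant now) {
        if (hostIsUp) {
            return false; // soft fencing succeeded, no hard fencing needed
        }
        Duration elapsed = Duration.between(softFenceStart, now);
        return elapsed.compareTo(gracePeriod) >= 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.EPOCH;
        SoftFenceGracePeriod grace = new SoftFenceGracePeriod(Duration.ofMinutes(1), start);
        System.out.println(grace.shouldHardFence(false, start.plusSeconds(20))); // false: still waiting
        System.out.println(grace.shouldHardFence(false, start.plusSeconds(60))); // true: grace period over
        System.out.println(grace.shouldHardFence(true, start.plusSeconds(90)));  // false: host came back up
    }
}
```

In this sketch the deadline is measured once, from the moment soft fencing started, so there is a single well-defined cut-off rather than a load-dependent grace period re-evaluated from several call sites.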

The solution was tested with an SPM host running VMs (some of them HA),
with a non-SPM host running VMs, and with a regular host.

Results:

1. Both grace periods are honored: the initial one between 'connecting' and
    'non-responding', and the one between soft fencing and hard fencing.

2. The code is more readable and straightforward.

Signed-off-by: Eli Mesika <emesika@redhat.com>
Bug-Url: https://bugzilla.redhat.com/2071468


@mwperina left a comment


+1

@mwperina merged commit 292e637 into oVirt:master on Apr 25, 2022