Fix healing after failed refresh #1465

Merged
merged 18 commits into from
Jun 20, 2023

Conversation

d-uzlov
Contributor

@d-uzlov d-uzlov commented Jun 6, 2023

Description

Restart connection monitoring after a failed refresh.

Issue link

Should also at least partially fix this:

How Has This Been Tested?

  • Added unit testing to cover
  • Tested manually
  • Tested by integration testing
  • Have not tested

Types of changes

  • Bug fix
  • New functionality
  • Documentation
  • Refactoring
  • CI

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

codecov bot commented Jun 6, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@6fa2f68). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1465   +/-   ##
=======================================
  Coverage        ?   70.39%           
=======================================
  Files           ?      248           
  Lines           ?    11166           
  Branches        ?        0           
=======================================
  Hits            ?     7860           
  Misses          ?     2806           
  Partials        ?      500           


@d-uzlov d-uzlov marked this pull request as ready for review June 14, 2023 11:09
@d-uzlov d-uzlov force-pushed the 1457-restart-monitor branch 3 times, most recently from 5692592 to a38cab0 Compare June 19, 2023 05:21
Comment on lines 114 to 121
func WithRefresh(refreshClient networkservice.NetworkServiceClient) Option {
	if refreshClient == nil {
		panic("refreshClient cannot be nil")
	}
	return Option(func(c *clientOptions) {
		c.refreshClient = refreshClient
	})
}
Member

Do not add a new API just for testing purposes.

Contributor Author

Fixed

Comment on lines 222 to 223
require.Equal(t, 2, counter.Requests())
require.Equal(t, 0, counter.Closes())
Member

Could you explain the change?

Contributor Author

@d-uzlov d-uzlov Jun 20, 2023

I modified it because this test was unstable, and yesterday the CI was failing constantly.
There was a race between the test code and healing.
I concluded that this check isn't really important for the purpose of this test.
I have now reverted that change and pushed a better solution that makes the test more deterministic.


for {
eventIn, err := cev.client.Recv()
if !needToHeal {
Member

Inverting the name would probably be clearer.

Suggested change
- if !needToHeal {
+ if healthy {

Member

What if needToHeal == false and reselect == true?

Contributor Author

In the event monitoring there are three states:

  1. Context was cancelled, needToHeal == false
  2. Control plane is down, needToHeal == true, reselect == false
  3. Data plane is down, needToHeal == true, reselect == true

We could add a dedicated enum for these three states, but I think that would only make the code more complex, because the two variables are checked and changed in different places.
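
For readers following the thread, the mapping from the two flags to the three states listed above can be sketched as follows. This is only an illustration, not code from the PR: the `monitorState` type, its constants, and `classify` are hypothetical names. The sketch also assumes that `reselect` is ignored whenever `needToHeal` is false, which is one possible answer to the question about needToHeal == false with reselect == true.

```go
package main

import "fmt"

// monitorState is a hypothetical enum; it does not exist in the SDK.
type monitorState int

const (
	stateCanceled         monitorState = iota // context was cancelled, no healing
	stateControlPlaneDown                     // heal without reselect
	stateDataPlaneDown                        // heal with reselect
)

// String returns a readable name for logging.
func (s monitorState) String() string {
	switch s {
	case stateCanceled:
		return "canceled"
	case stateControlPlaneDown:
		return "control plane down"
	case stateDataPlaneDown:
		return "data plane down"
	default:
		return "unknown"
	}
}

// classify maps the two flags from the discussion onto the three states.
// Assumption: reselect only carries meaning when needToHeal is true.
func classify(needToHeal, reselect bool) monitorState {
	if !needToHeal {
		return stateCanceled
	}
	if reselect {
		return stateDataPlaneDown
	}
	return stateControlPlaneDown
}

func main() {
	fmt.Println(classify(false, false))
	fmt.Println(classify(true, false))
	fmt.Println(classify(true, true))
}
```

As the comment above notes, the actual code keeps two separate variables because they are read and updated in different places, so an enum like this would be a design alternative rather than a description of the merged change.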

Contributor Author

Regarding the name inversion suggestion: fixed.
I changed it to use canceled.

Comment on lines 115 to 127
// We need to force check the DataPlane if a down event was received from the ControlPlane
if !reselect {
deadlineCtx, deadlineCancel := context.WithDeadline(cev.chainCtx, time.Now().Add(cev.heal.livenessCheckTimeout))
if !cev.heal.livenessCheck(deadlineCtx, cev.conn) {
cev.logger.Warnf("Data plane is down")
reselect = true
}
// Otherwise - Start healing
return
deadlineCancel()
}

// Handle event. Start healing
if eventIn.GetConnections()[cev.conn.GetId()].GetState() == networkservice.State_DOWN {
var options []begin.Option
if reselect {
cev.logger.Debugf("Reconnect with reselect")
Member

Suggested change
- // We need to force check the DataPlane if a down event was received from the ControlPlane
- if !reselect {
- deadlineCtx, deadlineCancel := context.WithDeadline(cev.chainCtx, time.Now().Add(cev.heal.livenessCheckTimeout))
- if !cev.heal.livenessCheck(deadlineCtx, cev.conn) {
- cev.logger.Warnf("Data plane is down")
- reselect = true
- }
- // Otherwise - Start healing
- return
- deadlineCancel()
- }
- // Handle event. Start healing
- if eventIn.GetConnections()[cev.conn.GetId()].GetState() == networkservice.State_DOWN {
- var options []begin.Option
- if reselect {
- cev.logger.Debugf("Reconnect with reselect")
+ var options []begin.Option
+ if reselect {
+ ...
+ } else {
+ deadlineCtx, deadlineCancel := context.WithDeadline(cev.chainCtx, time.Now().Add(cev.heal.livenessCheckTimeout))
+ if !cev.heal.livenessCheck(deadlineCtx, cev.conn) {
+ cev.logger.Warnf("Data plane is down")
+ reselect = true
+ }
+ deadlineCancel()
+ }

Contributor Author

This is also old code that was only moved. I have now reverted this to make it clear that I didn't modify it.
Here reselect is not a constant flag. It is used as state that can be switched, so an if-else pattern doesn't make sense.

@denis-tingaikin
Member

@glazychev-art Could you have a look?

@denis-tingaikin denis-tingaikin merged commit a8c394e into networkservicemesh:main Jun 20, 2023
17 checks passed
nsmbot pushed a commit to networkservicemesh/cmd-map-ip-k8s that referenced this pull request Jun 20, 2023
…k@main

PR link: networkservicemesh/sdk#1465

Commit: a8c394e
Author: Danil Uzlov
Date: 2023-06-20 21:27:40 +0700
Message:
  - Fix healing after failed refresh (#1465)
* fix dialer

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* add tests for healing after refresh

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* refactor heal context for event loop

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* add heal started flag to heal event lopp

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* start monitoring after refresh error

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* remove test TestNSMGR_RefreshFailed_DataPlaneHealthy

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* fix coyright

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* fix linter

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* fix linter

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* bump ci

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* fix discover forwarder tests

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* improve heal monitor cleanup

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* remove withrefresh option, use clock in context

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* move monitorCtrlPlane function

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* use explicit returns in waitForEvents

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* fix typo

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* different fix for select forwarder tests

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

* use better variable names in heal monitor

Signed-off-by: Danil Uzlov <DanilUzlov@yandex.ru>

---------

Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/cmd-ipam-vl3 that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/sdk-kernel that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-admission-webhook-k8s that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-registry-proxy-dns that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-nse-vfio that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-nse-remote-vlan that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-nsmgr-proxy that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-nsmgr that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-cluster-info-k8s that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-nsc-init that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/sdk-k8s that referenced this pull request Jun 20, 2023
nsmbot pushed a commit to networkservicemesh/cmd-registry-memory that referenced this pull request Jun 20, 2023
@d-uzlov d-uzlov deleted the 1457-restart-monitor branch June 21, 2023 06:41