connectServer Request fails with context canceled error during refresh period #1068
Comments
@Bolodya1997 As discussed, I tried with the latest NSM images and the issue still exists. Here is the forwarder log:
|
@Bolodya1997 Please ignore the above log; it looks like that happened because of an NSM Mgr pod crash.
I will keep you posted if the issue is seen again with the latest image. |
Hey, I am running a simple setup of 1 nse and 2 nsc's. Everything seems to be working fine, but I see lots of error messages from all the NSM components, and healing is not working properly (not sure if it's related or a separate issue).
Attached logs per component: forwarder.txt
EDIT:
EDIT2: Attached logs after nse restart: nsc2-after-nse-restart.txt
ip addr for nse
ip addr for nsc 1
ip addr for nsc 2
|
@yuraxdrumz Which version of NSM are you using? |
I was running v1.1.1 for a week and then upgraded to v1.2.0 a couple of days ago |
@yuraxdrumz And you are seeing the same issue in v1.2.0 after the upgrade? |
@edwarnicke Yep. I just checked and registry, nsmgr, nsc, nsc-init and forwarder are all on latest images. Same issues. |
Started to look into this. Q1: Could you point at examples or steps that you're using to reproduce this? Q2: What do you see in terms of datapath? Are all interfaces correct? Is ping working? |
|
After going through networkservicemesh/deployments-k8s#1999, I see my use case of multiple interfaces with same address is caused by multiple nse's that overwrite each other, so that explains one issue. |
@yuraxdrumz What are the remaining issues you are seeing? |
I think that the problem is not that we get "context cancelled" errors but that the code which tries to ignore them isn't working. For example, in the log messages that I get, one of the lines in the backtrace is:
...which is https://github.com/networkservicemesh/sdk/blob/v1.5.0/pkg/registry/core/trace/nse_registry.go#L56. The guidance at https://github.com/golang/go/wiki/ErrorValueFAQ#how-should-i-change-my-error-handling-code-to-work-with-the-new-features indicates that we should be using "errors.Is()" for this purpose instead of "==", so this change might reduce the log spam. I'm just getting into NSM so I haven't set up a dev environment yet, but I should be able to take a look in the next couple of weeks unless someone else gets to it sooner. |
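To illustrate the point about "==" versus "errors.Is()", here is a minimal standalone sketch. It does not reproduce the NSM trace code; the helper names (wrappedCanceled, isCanceledByEquality, isCanceledByIs) and the error message are hypothetical, and it assumes the cancellation error reaches the check after being wrapped with %w. If the error is instead converted to another type along the way (for example a gRPC status), errors.Is() alone may not match either.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// wrappedCanceled is a hypothetical stand-in for a call chain that
// wraps context.Canceled with extra detail before returning it.
func wrappedCanceled() error {
	return fmt.Errorf("request failed: %w", context.Canceled)
}

// isCanceledByEquality only matches the bare sentinel value, so it
// misses any error that has been wrapped.
func isCanceledByEquality(err error) bool {
	return err == context.Canceled
}

// isCanceledByIs walks the wrap chain, so it still recognizes the
// sentinel after wrapping with %w.
func isCanceledByIs(err error) bool {
	return errors.Is(err, context.Canceled)
}

func main() {
	err := wrappedCanceled()
	fmt.Println("== check:       ", isCanceledByEquality(err))
	fmt.Println("errors.Is check:", isCanceledByIs(err))
}
```

In this sketch the equality check misses the wrapped error while errors.Is() finds it, which is the kind of mismatch that would let "context canceled" messages slip past a filter meant to ignore them.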
@caboteria Nice catch! Care to push a PR there? :) |
Expected Behavior
The refresh or healing should work seamlessly when endpoint and client are connected and traffic is running fine among those pods.
Current Behavior
The forwarder's connectServer.Request() fails with a context canceled error for an unknown reason. This causes the timeout handler to invoke Close on the forwarder, which tears down the connection between client and endpoint.
Failure Information (for bugs)
Steps to Reproduce
Context
Failure Logs
Forwarder logs:
NSMgr logs: