Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forwarder breaks NSMgr restore case #970

Closed
Bolodya1997 opened this issue Jun 8, 2021 · 9 comments
Closed

Forwarder breaks NSMgr restore case #970

Bolodya1997 opened this issue Jun 8, 2021 · 9 comments
Assignees
Labels
ASAP The issue that blocking SOW items or core use-cases

Comments

@Bolodya1997
Copy link

Expected Behavior

NSMgr restore case should restore the NSC -> NSE connection.

Current Behavior

If Forwarder starts restoring NSMgr earlier than NSC / local NSMgr it results in new connection IDs and +2 path length.

Failure Information (for bugs)

  1. Forwarder requests restarted NSMgr.
  2. NSMgr interpose chain element doesn't know that it was a Forwarder.
  3. NSMgr requests Forwarder.
  4. Forwarder requests NSMgr (+2 path length, new connection IDs).

Steps to Reproduce

  1. Run kernel - vxlan - kernel case.
  2. Restart remote NSMgr.
  3. Wait for 60 seconds to make heal working.
  4. Try to ping NSE from NSC.
  5. Ping fails.

Context

Packet automation cluster.

@Bolodya1997 Bolodya1997 added the ASAP The issue that blocking SOW items or core use-cases label Jun 8, 2021
@Bolodya1997 Bolodya1997 self-assigned this Jun 8, 2021
@denis-tingaikin
Copy link
Member

@Bolodya1997 Do you have any idea how to fix this?

@Bolodya1997
Copy link
Author

@Bolodya1997 Do you have any idea how to fix this

We can disable restore on Forwarder, because it actually doesn't make any sense:

NSMgr -(1)-> Forwarder -(2)-> NSMgr

If we have both [1, 2] failed, we probably need to first heal [1] and after heal [2]. Currently, Forwarder restore blocks [1] healing until [2] would be healed, though [1] healing contains [2] healing.

@denis-tingaikin
Copy link
Member

I would be suggesting to do next thing:

  1. add if branch into interpose
  2. use grpc.FromPeer to check if it is a known forwarder(just look into interpose sync map).
  3. if it is a known forwarder then all fine and do what we do now in interpose
  4. if it is an unknown forwarder then we do return the connection from the request. (it is possible only in restore race healing case)

It is also a bit ugly solution but in comparison with #971 we'll not need to modify each forwarder application.

The best solution to this problem is just to switch to a multipoint architecture.

@Bolodya1997
Copy link
Author

Bolodya1997 commented Jun 8, 2021

I would be suggesting to do next thing:

  1. add if branch into interpose
  2. use grpc.FromPeer to check if it is a known forwarder(just look into interpose sync map).
  3. if it is a known forwarder then all fine and do what we do now in interpose
  4. if it is an unknown forwarder then we do return the connection from the request. (it is possible only in restore race healing case)

It is also a bit ugly solution but in comparison with #971 we'll not need to modify each forwarder application.

The best solution to this problem is just to switch to a multipoint architecture.

Exactly this solution wouldn't work, because we cannot use grpc.FromPeer to check if it is an unknown Forwarder or NSC/local NSMgr - it will be the same, so we can just receive a Request from the unregistered Forwarder and so go to some existing another Forwarder.

Probably we can try to use path segments names to understand that the Request was originated in Forwarder? But it looks like a very dirty solution.

@denis-tingaikin
Copy link
Member

/cc @edwarnicke

@denis-tingaikin
Copy link
Member

Exactly this solution wouldn't work, because we cannot use grpc.FromPeer to check if it is an unknown Forwarder or

Could you explain why?

We'll have forwarder's URL here: https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/interpose/server.go#L39
So then we can compare the URL with the URL from grpc.FromContext.

@Bolodya1997
Copy link
Author

Bolodya1997 commented Jun 9, 2021

We'll have forwarder's URL here: https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/interpose/server.go#L39
So then we can compare the URL with the URL from grpc.FromContext.

If Forwarder originates Request, it doesn't mean that it already has reregistered itself. So in such case we wouldn't have Forwarder URL in map.

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Jun 9, 2021

If Forwarder originates Request, it doesn't mean that it already has reregistered itself. So in such case we wouldn't have Forwarder URL in map.

Sorry, I mean that we can use peer.FromContext to handle this scenario...

For example:

  1. If we got tcp scheme from the context then we do what we do currently in interpose.
  2. if we got unix scheme and our path index is not equal 3 then we got forwarder then we can do proposal Forwarder breaks NSMgr restore case #970 (comment)
  3. otherwise do what we do currently in interpose.

@Bolodya1997
Copy link
Author

Closed by #971.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ASAP The issue that blocking SOW items or core use-cases
Projects
None yet
Development

No branches or pull requests

2 participants