Integration tests are not stable after adding OPA policy for registry feature #721

Open
NikitaSkrynnik opened this issue Sep 20, 2022 · 0 comments

Description

Heal tests that restart nsmgr are not stable. When the old nsmgr is deleted, it does not unregister its forwarder from the registry. The new nsmgr (with a new spiffeID) then cannot register this forwarder again, because the entry is still registered by the old nsmgr (with the old spiffeID).

How to reproduce

  1. Run the basic NSM setup
  2. Delete one of the NSMGRs
  3. Wait for the new NSMGR to start
  4. Check the logs in the registry pod

Expected behavior

The new NSMGR registers its forwarder successfully.

Actual behavior

The new NSMGR can't register its forwarder, because it is still registered by the old, deleted nsmgr:

[TRAC] [type:registry] (2.1)    register={"name":"forwarder-vpp-6tpfn","network_service_names":["forwarder"],"network_service_labels":{"forwarder":{"labels":{"nodeName":"kind-worker","p2p":"true"}}},"url":"tcp://10.244.1.109:5001","expiration_time":{"seconds":1663668686,"nanos":268959800},"initial_registration_time":{"seconds":1663668484,"nanos":824354244}}
[INFO] [type:registry] (2.2)    AUTHORIZE spiffieIDNSEsMap: map[spiffe://example.org/ns/nsm-system/pod/nsmgr-h7h5k:[forwarder-vpp-x565x nse-kernel-56c9ffbff5-bwpvs] spiffe://example.org/ns/nsm-system/pod/nsmgr-tcx5p:[forwarder-vpp-6tpfn]]
[INFO] [type:registry] (2.3)    AUTHORIZE spiffieID: spiffe://example.org/ns/nsm-system/pod/nsmgr-tkc59
[INFO] [type:registry] (2.4)    AUTHORIZE nseName: forwarder-vpp-6tpfn
[ERRO] [type:registry] (2.5)    rpc error: code = PermissionDenied desc = no sufficient privileges;	Error returned from sdk/pkg/registry/common/authorize/authorizeNSEServer.Register;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.logError;		/build/local/sdk/pkg/registry/core/trace/common.go:38;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.(*traceNetworkServiceEndpointRegistryServer).Register;		/build/local/sdk/pkg/registry/core/trace/nse_registry.go:129;	github.com/networkservicemesh/sdk/pkg/registry/core/next.(*nextNetworkServiceEndpointRegistryServer).Register;		/build/local/sdk/pkg/registry/core/next/nse_registry_server.go:59;	github.com/networkservicemesh/sdk/pkg/registry/common/begin.(*beginNSEServer).Register.func2;		/build/local/sdk/pkg/registry/common/begin/nse_server.go:64;	github.com/edwarnicke/serialize.(*Executor).process;		/go/pkg/mod/github.com/edwarnicke/serialize@v1.0.7/serialize.go:68;	runtime.goexit;		/usr/local/go/src/runtime/asm_amd64.s:1571;	
[ERRO] [type:registry] (1.2)   Error returned from sdk/pkg/registry/common/authorize/authorizeNSEServer.Register: rpc error: code = PermissionDenied desc = no sufficient privileges
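
The denial in the log above can be reproduced with a small model of the ownership check. This is only a sketch: it is not the sdk's authorize code or the actual OPA policy. The names `ownersByID` and `allowRegister` are hypothetical, and the rule (a name may be registered only if it is unowned or owned by the caller) is an assumption inferred from the logged map and the PermissionDenied result.

```go
package main

import "fmt"

// ownersByID mirrors the shape of the spiffieIDNSEsMap value in the log:
// SPIFFE ID of the registering client -> names of the NSEs it registered.
// (Hypothetical type, not taken from the sdk.)
type ownersByID map[string][]string

// ownerOf returns the SPIFFE ID that currently owns nseName, if any.
func (o ownersByID) ownerOf(nseName string) (string, bool) {
	for id, names := range o {
		for _, n := range names {
			if n == nseName {
				return id, true
			}
		}
	}
	return "", false
}

// allowRegister is an assumed, simplified stand-in for the policy decision:
// a name may be registered if nobody owns it, or if the caller already owns it.
func (o ownersByID) allowRegister(callerID, nseName string) bool {
	owner, found := o.ownerOf(nseName)
	return !found || owner == callerID
}

func main() {
	// State copied from log line (2.2).
	owners := ownersByID{
		"spiffe://example.org/ns/nsm-system/pod/nsmgr-h7h5k": {"forwarder-vpp-x565x", "nse-kernel-56c9ffbff5-bwpvs"},
		"spiffe://example.org/ns/nsm-system/pod/nsmgr-tcx5p": {"forwarder-vpp-6tpfn"},
	}

	// The restarted nsmgr presents a new SPIFFE ID (2.3) for the same name (2.4).
	newID := "spiffe://example.org/ns/nsm-system/pod/nsmgr-tkc59"
	fmt.Println(owners.allowRegister(newID, "forwarder-vpp-6tpfn")) // false

	// The deleted nsmgr would still have been allowed to refresh the entry.
	oldID := "spiffe://example.org/ns/nsm-system/pod/nsmgr-tcx5p"
	fmt.Println(owners.allowRegister(oldID, "forwarder-vpp-6tpfn")) // true
}
```

In this model the forwarder name stays bound to nsmgr-tcx5p until something removes the entry, which is why the request from nsmgr-tkc59 is rejected.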

Possible solutions

  1. Wait until the forwarder entry in the registry expires
  2. Add a feature to the heal servers: delete the entry from the registry immediately if the connection with the endpoint is lost
  3. Disable OPA authorization for cmd-registry-k8s
  4. Add a way for managers to bypass OPA policies
  5. Add roles to pods using SPIRE. Add a special role "nsmgr" and accept all register requests in OPA policies if role == "nsmgr". (For example, we can change the spiffeID for some pods by adding special labels: https://github.com/spiffe/spire/blob/main/support/k8s/k8s-workload-registrar/mode-crd/README.md#label-based-workload-registration)
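
To compare the options, here is a rough sketch of how options 1 and 5 would change the outcome of the ownership check. Everything in it is hypothetical: the `entry` type, the `allowRegister` function, and especially the assumption that a manager role would show up as the SPIFFE ID path `/nsmgr`. The real decision would live in the OPA policy, and the resulting ID format depends on how SPIRE is configured.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// entry is a hypothetical registry record: who registered the NSE and until when.
type entry struct {
	ownerID    string
	expiration time.Time
}

// loggedExpiration is expiration_time.seconds from log line (2.1).
var loggedExpiration = time.Unix(1663668686, 0)

// isManager is a hypothetical role check for option 5. It ASSUMES managers
// get a SPIFFE ID whose path is exactly "/nsmgr".
func isManager(spiffeID string) bool {
	u, err := url.Parse(spiffeID)
	return err == nil && u.Scheme == "spiffe" && u.Path == "/nsmgr"
}

// allowRegister sketches the combined rule:
//   - option 5: callers with the manager role are always accepted;
//   - unowned names and re-registration by the owner are accepted;
//   - option 1: a foreign owner's entry blocks registration until it expires.
func allowRegister(entries map[string]entry, callerID, nseName string, now time.Time) bool {
	if isManager(callerID) {
		return true
	}
	e, found := entries[nseName]
	if !found || e.ownerID == callerID {
		return true
	}
	return !now.Before(e.expiration)
}

func main() {
	entries := map[string]entry{
		"forwarder-vpp-6tpfn": {
			ownerID:    "spiffe://example.org/ns/nsm-system/pod/nsmgr-tcx5p",
			expiration: loggedExpiration,
		},
	}
	newID := "spiffe://example.org/ns/nsm-system/pod/nsmgr-tkc59"

	// Before expiration the new nsmgr is rejected, as in the issue.
	fmt.Println(allowRegister(entries, newID, "forwarder-vpp-6tpfn", loggedExpiration.Add(-time.Minute))) // false

	// Option 1: once the stale entry has expired, registration succeeds.
	fmt.Println(allowRegister(entries, newID, "forwarder-vpp-6tpfn", loggedExpiration.Add(time.Second))) // true

	// Option 5: a caller with the (assumed) manager ID is accepted immediately.
	fmt.Println(allowRegister(entries, "spiffe://example.org/nsmgr", "forwarder-vpp-6tpfn", loggedExpiration.Add(-time.Minute))) // true
}
```

Option 1 needs no code change, but the test has to wait out the remaining lifetime of the entry. Option 5 removes the wait at the cost of trusting every workload that can obtain the manager identity.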