Deleting FederatedVirtualServices and FederatedUpstreams can take a few minutes #8670

Closed
jmunozro opened this issue Sep 11, 2023 · 5 comments
Labels
Type: Enhancement New feature or request

Comments

@jmunozro
Member

Gloo Edge Product

Enterprise

Gloo Edge Version

1.15.4

Is your feature request related to a problem? Please describe.

Deleting FederatedVirtualServices and FederatedUpstreams can take a few minutes.

I've created 100 of each, with only one federated cluster:

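# Create 100 FederatedUpstreams and 100 FederatedVirtualServices, all placed on cluster2.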
for i in $(seq 1 100)
do
kubectl apply -f - <<EOF
apiVersion: fed.gloo.solo.io/v1
kind: FederatedUpstream
metadata:
  name: my-federated-upstream-$i
  namespace: gloo-system
spec:
  placement:
    clusters:
      - cluster2
    namespaces:
      - gloo-system
  template:
    spec:
      static:
        hosts:
          - addr: solo$i.io
            port: 80
    metadata:
      name: fed-upstream
EOF
kubectl apply -f - <<EOF
apiVersion: fed.gateway.solo.io/v1
kind: FederatedVirtualService
metadata:
  name: my-federated-vs-$i
  namespace: gloo-system
spec:
  placement:
    clusters:
      - cluster2
    namespaces:
      - gloo-system
  template:
    spec:
      virtualHost:
        domains:
          - "*"
        routes:
          - matchers:
              - exact: /solo$i
            options:
              prefixRewrite: /
            routeAction:
              single:
                upstream:
                  name: httpbin-in-mesh-8000
                  namespace: gloo-system
    metadata:
      name: fed-virtualservice
EOF
done
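
As a quick sanity check before timing deletions (using the same plural resource names that kubectl delete uses later in this issue):

# Confirm all 100 of each resource exist
kubectl -n gloo-system get federatedupstreams --no-headers | wc -l
kubectl -n gloo-system get federatedvirtualservices --no-headers | wc -l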

Then, when I delete a single resource:

# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-100
federatedupstream.fed.gloo.solo.io "my-federated-upstream-100" deleted

real    1m54.614s
user    0m0.184s
sys     0m0.080s
# glooctl check
Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking VirtualHostOptions... OK
Checking RouteOptions... OK
Checking secrets... OK
Checking virtual services... OK
Checking gateways... OK
Checking proxies... OK
Checking rate limit server... OK
No problems detected.

Detected Gloo Federation!

Checking Gloo Instance cluster2-gloo-system... 
Checking deployments... OK
Checking pods... OK
Checking settings... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking virtual services... OK
Checking route tables... OK
Checking gateways... OK
Checking proxies... OK

Describe the solution you'd like

Translation times should be faster.

Describe alternatives you've considered

Setting gloo.gateway.validation.webhook.skipDeleteValidationResources[] so the webhook skips delete validation, but this is not an acceptable solution.
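
For reference, a rough sketch of what that Helm override could look like (the resource names listed here are illustrative assumptions, not a verified configuration):

# values.yaml for the Enterprise chart, where OSS gloo is a subchart
gloo:
  gateway:
    validation:
      webhook:
        skipDeleteValidationResources:  # resources for which delete validation is skipped
          - upstreams          # assumed resource name, for illustration
          - virtualservices    # assumed resource name, for illustration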

Additional Context

Apparently there is some sort of caching in place; this is a test with 1000 FederatedUpstreams:

root@mgmt:~# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-88
federatedupstream.fed.gloo.solo.io "my-federated-upstream-88" deleted
    
real    16m35.892s
user    0m0.493s
sys     0m0.122s
root@mgmt:~# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-89
federatedupstream.fed.gloo.solo.io "my-federated-upstream-89" deleted

real    0m15.927s
user    0m0.189s
sys     0m0.029s
root@mgmt:~# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-666
federatedupstream.fed.gloo.solo.io "my-federated-upstream-666" deleted

real    2m39.984s
user    0m0.186s
sys     0m0.036s
@jmunozro added the Type: Enhancement label Sep 11, 2023
@avizov

avizov commented Sep 21, 2023

I am afraid it's not only the deletion action that takes time to propagate to all edges, but any action of a global nature that must be federated from the center cluster to the edge clusters. For instance, we performed a central change to connectionConfig.commonHttpProtocolOptions.idleTimeout and it took approximately 45 minutes until the change was successfully federated from the center to all edges and all upstreams. We have 44 edges and ~180 APIs.
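
To illustrate the kind of center-side change being described, here is one of the FederatedUpstreams from the repro above with that field set (the field path is taken from the comment; its exact placement in the Upstream spec is an assumption for illustration):

apiVersion: fed.gloo.solo.io/v1
kind: FederatedUpstream
metadata:
  name: my-federated-upstream-1
  namespace: gloo-system
spec:
  placement:
    clusters:
      - cluster2
    namespaces:
      - gloo-system
  template:
    spec:
      connectionConfig:
        commonHttpProtocolOptions:
          idleTimeout: 30s   # the center-side edit that must be federated to every edge
      static:
        hosts:
          - addr: solo1.io
            port: 80
    metadata:
      name: fed-upstream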

@nfuden
Contributor

nfuden commented Oct 3, 2023

Performance improvements are also being handled in https://github.com/solo-io/gloo-mesh-enterprise/issues/12273

@timflannagan
Contributor

Looked into this a bit - OSS Gloo bumped controller-runtime to 0.15.2 in v1.15.2, so there's a decent chance we're hitting the same root cause that GP was seeing a couple of weeks back. The Gloo federation reconciler codegen uses the client.MatchingLabels list option when making client list calls, so it's possible we're running into kubernetes-sigs/controller-runtime#2522 here too.

@shahar-h

We experienced the slow FederatedUpstream (fus) and FederatedVirtualService (fvs) deletion issue multiple times today.
Looking at the deletion timeframe in the Grafana dashboard, we can see that both the fvs and fus queues are drained slowly:
[screenshot: Grafana panel showing the fvs and fus work queues draining slowly]

We can also see slow reconciliation times for both:
[screenshot: Grafana panel showing slow fvs and fus reconciliation times]

@shahar-h

Update: in our case the issue was caused by the webhook server being blocked by a network policy. Since the webhook's failurePolicy was set to Ignore, each request to the webhook server waited 10 seconds (the default timeout) until the timeout was reached, and only then was validation skipped.
After we fixed the network policy, the issue was resolved.
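
For context, the two knobs involved live on the validating webhook configuration itself. An illustrative excerpt follows; the webhook, service, and path names here are assumptions for illustration, not copied from a live Gloo install:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gloo-gateway-validation-webhook   # name assumed for illustration
webhooks:
  - name: gloo.gloo-system.svc            # name assumed for illustration
    failurePolicy: Ignore   # admit the request if the webhook errors or times out
    timeoutSeconds: 10      # Kubernetes default; each blocked call waits this long
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: gloo              # service name assumed for illustration
        namespace: gloo-system
        path: /validation       # path assumed for illustration
    rules:
      - apiGroups: ["gateway.solo.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE", "DELETE"]
        resources: ["virtualservices"]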

@SantoDE closed this as completed Dec 5, 2023