Deleting FederatedVirtualServices and FederatedUpstreams can take a few minutes #8670

Closed
jmunozro opened this issue Sep 11, 2023 · 5 comments
Labels
Type: Enhancement New feature or request

Comments

@jmunozro
Member

Gloo Edge Product

Enterprise

Gloo Edge Version

1.15.4

Is your feature request related to a problem? Please describe.

Deleting FederatedVirtualServices and FederatedUpstreams can take a few minutes.

I've created 100 of each, with only one federated cluster:

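# Create 100 FederatedUpstreams and 100 FederatedVirtualServices, all placed on cluster2.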
for i in $(seq 1 100)
do
kubectl apply -f - <<EOF
apiVersion: fed.gloo.solo.io/v1
kind: FederatedUpstream
metadata:
  name: my-federated-upstream-$i
  namespace: gloo-system
spec:
  placement:
    clusters:
      - cluster2
    namespaces:
      - gloo-system
  template:
    spec:
      static:
        hosts:
          - addr: solo$i.io
            port: 80
    metadata:
      name: fed-upstream
EOF
kubectl apply -f - <<EOF
apiVersion: fed.gateway.solo.io/v1
kind: FederatedVirtualService
metadata:
  name: my-federated-vs-$i
  namespace: gloo-system
spec:
  placement:
    clusters:
      - cluster2
    namespaces:
      - gloo-system
  template:
    spec:
      virtualHost:
        domains:
          - "*"
        routes:
          - matchers:
              - exact: /solo$i
            options:
              prefixRewrite: /
            routeAction:
              single:
                upstream:
                  name: httpbin-in-mesh-8000
                  namespace: gloo-system
    metadata:
      name: fed-virtualservice
EOF
done
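
As a quick sanity check before timing deletions (using the same plural resource names that kubectl delete uses later in this issue):

# Confirm all 100 of each resource exist
kubectl -n gloo-system get federatedupstreams --no-headers | wc -l
kubectl -n gloo-system get federatedvirtualservices --no-headers | wc -l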

Then, when I delete a single resource:

# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-100
federatedupstream.fed.gloo.solo.io "my-federated-upstream-100" deleted

real    1m54.614s
user    0m0.184s
sys     0m0.080s
# glooctl check
Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking VirtualHostOptions... OK
Checking RouteOptions... OK
Checking secrets... OK
Checking virtual services... OK
Checking gateways... OK
Checking proxies... OK
Checking rate limit server... OK
No problems detected.

Detected Gloo Federation!

Checking Gloo Instance cluster2-gloo-system... 
Checking deployments... OK
Checking pods... OK
Checking settings... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking virtual services... OK
Checking route tables... OK
Checking gateways... OK
Checking proxies... OK

Describe the solution you'd like

Translation times should be faster.

Describe alternatives you've considered

Setting gloo.gateway.validation.webhook.skipDeleteValidationResources[] so the webhook skips delete validation, but this is not an acceptable solution.
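
For reference, a rough sketch of what that Helm override could look like (the resource names listed here are illustrative assumptions, not a verified configuration):

# values.yaml for the Enterprise chart, where OSS gloo is a subchart
gloo:
  gateway:
    validation:
      webhook:
        skipDeleteValidationResources:  # resources for which delete validation is skipped
          - upstreams          # assumed resource name, for illustration
          - virtualservices    # assumed resource name, for illustration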

Additional Context

Apparently there is some sort of caching in place; this is a test with 1000 FederatedUpstreams:

root@mgmt:~# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-88
federatedupstream.fed.gloo.solo.io "my-federated-upstream-88" deleted
    
real    16m35.892s
user    0m0.493s
sys     0m0.122s
root@mgmt:~# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-89
federatedupstream.fed.gloo.solo.io "my-federated-upstream-89" deleted

real    0m15.927s
user    0m0.189s
sys     0m0.029s
root@mgmt:~# time kubectl -n gloo-system delete federatedupstreams my-federated-upstream-666
federatedupstream.fed.gloo.solo.io "my-federated-upstream-666" deleted

real    2m39.984s
user    0m0.186s
sys     0m0.036s
@jmunozro added the Type: Enhancement label Sep 11, 2023
@avizov

avizov commented Sep 21, 2023

I am afraid it's not only the deletion action that takes time to propagate to all edges, but any action of a global nature that must be federated from the center cluster to the edge clusters. For instance, we performed a central change to connectionConfig.commonHttpProtocolOptions.idleTimeout and it took approximately 45 minutes until the change was successfully federated from the center to all edges and all upstreams. We have 44 edges and ~180 APIs.
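
To illustrate the kind of center-side change being described, here is one of the FederatedUpstreams from the repro above with that field set (the field path is taken from the comment; its exact placement in the Upstream spec is an assumption for illustration):

apiVersion: fed.gloo.solo.io/v1
kind: FederatedUpstream
metadata:
  name: my-federated-upstream-1
  namespace: gloo-system
spec:
  placement:
    clusters:
      - cluster2
    namespaces:
      - gloo-system
  template:
    spec:
      connectionConfig:
        commonHttpProtocolOptions:
          idleTimeout: 30s   # the center-side edit that must be federated to every edge
      static:
        hosts:
          - addr: solo1.io
            port: 80
    metadata:
      name: fed-upstream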

@nfuden
Contributor

nfuden commented Oct 3, 2023

Performance improvements are also being handled in https://github.com/solo-io/gloo-mesh-enterprise/issues/12273

@timflannagan
Contributor

Looked into this a bit - OSS Gloo bumped controller-runtime to 0.15.2 in v1.15.2, so there's a decent chance we're hitting the same root cause that GP was seeing a couple of weeks back. The Gloo federation reconciler codegen uses the client.MatchingLabels list option when making client list calls, so it's possible we're running into kubernetes-sigs/controller-runtime#2522 here too.

@shahar-h

We experienced the slow FederatedUpstream (fus) and FederatedVirtualService (fvs) deletion issue multiple times today.
Looking at the deletion timeframe in the Grafana dashboard, we can see that both the fvs and fus queues are drained slowly:
[screenshot: Grafana panel showing the fvs and fus work queues draining slowly]

We can also see slow reconciliation times for both:
[screenshot: Grafana panel showing slow fvs and fus reconciliation times]

@shahar-h

Update: in our case the issue was caused by the webhook server being blocked by a network policy. Since the webhook's failurePolicy was set to Ignore, each request to the webhook server waited 10 seconds (the default timeout) until the timeout was reached, and only then was validation skipped.
After we fixed the network policy, the issue was resolved.
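
For context, the two knobs involved live on the validating webhook configuration itself. An illustrative excerpt follows; the webhook, service, and path names here are assumptions for illustration, not copied from a live Gloo install:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gloo-gateway-validation-webhook   # name assumed for illustration
webhooks:
  - name: gloo.gloo-system.svc            # name assumed for illustration
    failurePolicy: Ignore   # admit the request if the webhook errors or times out
    timeoutSeconds: 10      # Kubernetes default; each blocked call waits this long
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: gloo              # service name assumed for illustration
        namespace: gloo-system
        path: /validation       # path assumed for illustration
    rules:
      - apiGroups: ["gateway.solo.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE", "DELETE"]
        resources: ["virtualservices"]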

@SantoDE closed this as completed Dec 5, 2023