
Performance is pretty bad when clicking into the GitOps app in VSCode #407

Closed

onedr0p opened this issue Dec 18, 2022 · 7 comments

onedr0p commented Dec 18, 2022

Expected behaviour

Clicking on the GitOps button in VSCode should not cause my computer or VSCode to melt 😉

Actual behaviour

The first time I click into the GitOps extension in the VSCode UI, VSCode slows to a crawl or freezes while the plugin says "initializing tree view". It is not just VSCode that is impacted: Chrome and other apps are affected too, and my laptop's fans work overtime. This behavior lasts for at least a minute or two, and sometimes VSCode freezes completely.

Steps to reproduce

❯ k get ks -A | wc -l
84
❯ k get hr -A | wc -l
81
❯ k get po -A | wc -l
248
❯ k get helmrepositories -A | wc -l
26
❯ k get ocirepositories -A | wc -l
6
❯ k get gitrepositories -A | wc -l
2
  • Install Plugin
  • Click on GitOps icon in VSCode
  • Workstation performance is impacted for at least a minute.

Versions

kubectl version: 1.25.2
Flux version: 0.37.0
Git version: 2.39.0
Azure version: N/A
Extension version: v0.22.5
VSCode version: 1.74.1
Operating System (OS) and its version: macOS 13.1 (22C65) on a MacBook Pro (2.3 GHz 8-Core Intel Core i9)

kingdonb (Collaborator) commented Feb 2, 2023

Thanks for this report. We've been building out a multi-cluster test environment with lots of workloads, which should help us narrow this down in our own testing. There are some "server-side" solutions we're exploring that might make this better broadly speaking. It's a challenge to get direct access to the Kubernetes API; performance would generally be better if we did not need to fork and exec a million times to execute a million queries.

But we can't count on Kubernetes to have a valid certificate for its API, so we need to arrange for all of our requests to go through a proxy (if they're going to come out of the VSCode sandbox).
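
To illustrate the cost being described (this is not the extension's actual code), here is the per-query fork/exec pattern in TypeScript: every lookup spawns a kubectl child process, so querying N resources launches N processes. The getResourceViaCli helper is hypothetical.

```typescript
// Hypothetical sketch of the expensive pattern: one `kubectl` child process per query.
import { execFile } from 'child_process';
import { promisify } from 'util';

const run = promisify(execFile);

async function getResourceViaCli(kind: string, name: string, namespace: string) {
  // Each call pays fork + exec + kubectl startup + API TLS handshake before any data arrives.
  const { stdout } = await run('kubectl', ['get', kind, name, '-n', namespace, '-o', 'json']);
  return JSON.parse(stdout);
}
```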

kingdonb (Collaborator) commented Feb 13, 2023

We released 0.23.1 today. I don't know whether it targeted or resolved this issue specifically, but it would help to get confirmation on whether the issue is still present.

There is a long list of fixes in this version; a major refactor went into it, and the code should be much easier for us to maintain going forward 🎉 @juozasg thank you for this work!

My busiest test cluster has about 18 Kustomizations and 5 HelmReleases. At this point my largest cluster is one that hosts other clusters: it has 207 pods, but it is divided into tenant "vclusters" provisioned with the CAPI provider for vcluster, so no single cluster has 80 Kustomizations, even though all of the pods live on the host cluster because that's how vcluster works.

I'm not sure how to build a test environment that large for performance testing, but maybe we can scaffold one inside a GitHub Action... In any case @onedr0p, it would be a great help if you could test this latest version again and let us know how it holds up by comparison.

kingdonb (Collaborator) commented Feb 13, 2023

We have been evaluating a number of different approaches by which we might solve the performance issues; the most important factor for me is the number of fork and exec calls. I'd like to be able to use the Kubernetes API directly from within JS/TS, but we cannot do this yet because there are limitations in the sandbox model that we have not worked around.

One way to work around this would be a proxy server with a valid SSL certificate. Many Kubernetes setups already include such an auth proxy in their configuration, and maybe we could write our extension to target those if they were broadly available.

Failing that, we can imagine a service like Weave GitOps that sits in the cluster, exposing an API for the Flux commands (like flux reconcile) and an asynchronous status subscription channel for status information about each Flux resource. As a separate API service, it would have the advantage that the Kubernetes cluster cert (which is self-signed) would not stand in the way of using it from sandboxed code, like we'll have in the VSCode extension.

With a proper API service like this we could batch requests and avoid forking as many processes as you have resources, but as long as we are tied to kubectl, performance is going to be very bad in extreme cases.
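
As a rough illustration of the batching point above (not the extension's code), one list call per resource kind replaces one kubectl fork per object. The sketch uses @kubernetes/client-node directly for concreteness (0.x positional API assumed); the Flux group/version strings are assumptions and vary by Flux release.

```typescript
// Hypothetical sketch: batch by kind with one API list call instead of one fork per object.
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const crds = kc.makeApiClient(k8s.CustomObjectsApi);

// 84 Kustomizations become a single HTTP request rather than 84 child processes.
async function listKustomizations(): Promise<any[]> {
  const res = await crds.listClusterCustomObject(
    'kustomize.toolkit.fluxcd.io', // group
    'v1beta2',                     // version (assumed; depends on the Flux release)
    'kustomizations'               // plural
  );
  return (res.body as any).items;
}
```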

There may be some changes we could make short of this larger refactor onto a Flux API proxy, but right now those are the choices on the table. I'm creating a separate issue to track "Server-Side Connection", which would hopefully solve this.

kingdonb (Collaborator) commented:

We have gone the kubectl proxy route, which does not require an SSL cert because the proxy runs on localhost. So in the latest release, 0.25.0, we now connect directly with the cluster and register an informer for each Flux resource in the cluster. The resources are sorted by namespace, and we're aiming to make more changes in the next release so that errors float to the top.

We're aiming for full visibility: right now you can look at a HelmRelease and see that its HelmChart needs attention, but you cannot observe the chart itself (though you can reconcile it via "Reconcile with Source", which might actually be enough). We will aim to add the HelmChart to the expanded list below the HelmRelease, so you can see what events are associated with the HelmChart. It should be a nice experience.
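
A minimal sketch of the idea (not the extension's actual code): point a KubeConfig at a local kubectl proxy and register an informer for a Flux resource kind with @kubernetes/client-node (0.x-style API assumed). The proxy address, API group/version, and the logging callbacks are assumptions.

```typescript
// Hypothetical sketch: watch Flux Kustomizations through `kubectl proxy` on localhost,
// so no cluster TLS certificate is involved.
import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromClusterAndUser(
  { name: 'kubectl-proxy', server: 'http://127.0.0.1:8001', skipTLSVerify: true },
  { name: 'proxy-user' }
);

const crds = kc.makeApiClient(k8s.CustomObjectsApi);
const path = '/apis/kustomize.toolkit.fluxcd.io/v1beta2/kustomizations';
// The list function backs the informer's initial list and resyncs; the cast papers over
// the generic object typing of CustomObjectsApi.
const listFn = () =>
  crds.listClusterCustomObject('kustomize.toolkit.fluxcd.io', 'v1beta2', 'kustomizations') as any;

const informer = k8s.makeInformer(kc, path, listFn);
informer.on('add', (obj: any) => console.log('added:', obj.metadata?.name));
informer.on('update', (obj: any) => console.log('updated:', obj.metadata?.name));
informer.on('delete', (obj: any) => console.log('deleted:', obj.metadata?.name));
informer.on('error', () => setTimeout(() => informer.start(), 5000)); // naive restart on error
informer.start();
```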

In the 0.25 release we also added the client-node library and substituted it for the CLI kubectl approach wherever we can. This means we have far fewer forks and execs, but we may have at least as many threads as before. I think we're going to continue to update the design.

We talked about adopting the reconciler pattern, where there is a fixed concurrency pool: instead of registering more threads for every callback, callbacks are queued and handled by one of the workers in the pool as it becomes ready. We don't have a CRD, but we can use a record structure and manage it the same way as a CRD. It is still a bit slow on clusters with many resources, I think because we have so many threads. I'm not 100% sure how it all works; @juozasg is the brains behind the implementation now. I've been testing and advising, but he's done the work. 👏
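
For illustration of the fixed concurrency pool idea (independent of the actual implementation), here is a sketch of a bounded work queue: informer callbacks enqueue a reconcile task, and a small fixed number of workers drain the queue instead of running unbounded concurrent work. refreshTreeNode is a hypothetical placeholder.

```typescript
// Hypothetical sketch of a bounded reconcile queue.
type Task = () => Promise<void>;

class WorkQueue {
  private queue: Task[] = [];
  private active = 0;

  constructor(private readonly concurrency: number) {}

  push(task: Task): void {
    this.queue.push(task);
    this.drain();
  }

  // Start tasks until the concurrency limit is reached; each completion re-drains the queue.
  private drain(): void {
    while (this.active < this.concurrency && this.queue.length > 0) {
      const task = this.queue.shift()!;
      this.active++;
      task()
        .catch((err) => console.error('reconcile failed:', err))
        .finally(() => {
          this.active--;
          this.drain();
        });
    }
  }
}

// Usage: every informer event enqueues work instead of running it immediately, e.g.
// informer.on('update', (obj) => reconcilers.push(() => refreshTreeNode(obj)));
const reconcilers = new WorkQueue(4);
```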

kingdonb (Collaborator) commented Jul 31, 2023

If you want to disable flux check (which takes a long time and probably has more impact on startup performance than anything else now), you can install the prerelease; we have v0.25.1690810685 out now.

The new prerelease has an option for this (enable/disable Flux Check) in the extension preferences, under the GitOps Tools preferences. We're going to make performance enhancements in the prerelease branch and cut a v0.25.1 as soon as possible.

kingdonb (Collaborator) commented Sep 7, 2023

Tomorrow's release should resolve this issue permanently. We're going to add performance metrics in the following release, so we can have benchmarks and regression testing around performance.

The features going into this release will be documented here, but if you want to try it ahead of the documentation, we have published prereleases; the latest today is 🏅 I'd go so far as to call it the release candidate: we're just looking out for show-stoppers or any last straggler issues we can fix before we pull the lever tomorrow and close a whole bunch of issues.

kingdonb (Collaborator) commented Sep 8, 2023

Feel free to reopen this, or open a new issue if you have a specific case that still has performance issues.

The v0.25.1 release today should solve basically every issue. We are using a server-side connection with kubectl proxy and a controller manager for the workload and source resource nodes, so performance is much better and manual refreshing should almost never be required. We would appreciate it if you could test and let us know how you like it!

kingdonb closed this as completed Sep 8, 2023