
Add memory test doc for 1.18 #1196

Merged
merged 1 commit into GoogleContainerTools:main on May 7, 2024

Conversation

tiffanny29631
Contributor

No description provided.

@@ -0,0 +1,34 @@
# Config Sync Memory Usage Reduction v1.16 vs v1.18

Config Sync v1.18 contains a change to stop loading OpenAPI for schema validations when the Config Sync admission webhook is not enabled. Here are some test results that show the reduction in memory usage.
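The gist of the change can be shown as a minimal Go sketch; the names below (`newValidator`, `fetchOpenAPISchemas`, `webhookEnabled`) are illustrative and not the actual Config Sync code:

```go
package main

import (
	"errors"
	"fmt"
)

// fetchOpenAPISchemas stands in for the expensive discovery call that
// downloads the cluster's full OpenAPI document (hypothetical name).
func fetchOpenAPISchemas() (map[string][]byte, error) {
	return nil, errors.New("not fetched in this sketch")
}

// newValidator returns a pass-through validator unless the admission webhook
// is enabled; only then does it pay the memory cost of holding OpenAPI schemas.
func newValidator(webhookEnabled bool) (func(obj interface{}) error, error) {
	if !webhookEnabled {
		// The v1.18 behavior described above: skip the OpenAPI download entirely.
		return func(obj interface{}) error { return nil }, nil
	}
	schemas, err := fetchOpenAPISchemas()
	if err != nil {
		return nil, err
	}
	return func(obj interface{}) error {
		_ = schemas // real code would validate obj against these schemas
		return nil
	}, nil
}

func main() {
	validate, _ := newValidator(false) // webhook disabled: no OpenAPI held in memory
	fmt.Println(validate(struct{}{}))  // prints <nil>
}
```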
Contributor

Can you explain the problem first?

How does using OpenAPI increase memory usage? In what use cases?

Contributor

@nan-yu did profiling previously and loading the OpenAPI consumes a sizable fraction of the memory.
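For anyone wanting to reproduce that kind of profiling, a heap profile of the Go process is the usual starting point. A minimal sketch using the standard `runtime/pprof` package (not necessarily the setup used for the numbers in this doc):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Force a GC first so the heap profile reflects live objects only.
	runtime.GC()

	f, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write the current heap profile; inspect it later with
	// `go tool pprof heap.pprof` to see which allocations dominate.
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
}
```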


#### Test Repository

[Config Sync Quickstart](https://github.com/GoogleCloudPlatform/anthos-config-management-samples/tree/main/config-sync-quickstart)
Contributor

How many objects is this syncing? How many different resource types? How many namespaces?

These numbers also affect memory usage. So having tests with varying amounts would help show how much reduction is gained at various scales. Is the % reduction constant or dependent on other variables?

Contributor Author

Within this specific repo there are 30 objects of various kinds for the RootSync; I'll test with a larger count to see if the reduction rate changes. Not sure if the resource type would make a difference, but it's worth a try. Cluster CRDs are the primary cause of the decreased memory usage in this case, and the reduction scales with the CRD load because schema validation processes all of them.
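Since the reduction tracks the number of CRDs on the cluster, a quick way to check that count programmatically is the apiextensions clientset. A minimal sketch, assuming a standard kubeconfig (illustrative only, not part of the test tooling in this doc):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	apiextensions "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	client, err := apiextensions.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// List every CRD on the cluster; schema validation has to process all of
	// them, which is why memory usage tracks this count.
	crds, err := client.ApiextensionsV1().CustomResourceDefinitions().List(
		context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("cluster has %d CRDs\n", len(crds.Items))
}
```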

Contributor

Does memory usage scale just with the CRDs managed by the current reconciler or also with any CRDs on the cluster?

FWIW, the applier watch cache memory usage increases with any objects on the cluster that are in overlapping resource types and namespaces (it's actually even more complicated than this, because RootSyncs often use cluster-scoped watches). So I'm trying to figure out how isolated your tests are on those dimensions.
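To illustrate the scoping point: with client-go's dynamic informers, passing an empty namespace gives a cluster-scoped list/watch, so the cache holds every matching object on the cluster rather than only the managed ones. A minimal sketch (illustrative only, not the applier's actual code):

```go
package main

import (
	"log"
	"os"
	"path/filepath"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// metav1.NamespaceAll ("") yields a cluster-scoped watch: the informer
	// cache ends up holding every ConfigMap on the cluster, not just managed ones.
	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(
		client, 10*time.Minute, metav1.NamespaceAll, nil)
	gvr := schema.GroupVersionResource{Version: "v1", Resource: "configmaps"}
	informer := factory.ForResource(gvr).Informer()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	log.Printf("cached %d objects", len(informer.GetStore().List()))
}
```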

Contributor Author

The usage in this case scales with any CRDs on the cluster, not just the CRDs managed by the reconciler. With the test CRDs on the Autopilot clusters, the reconciler gets OOM-killed with the default memory request/limit.
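One way to confirm the OOM kill from the API is to check the containers' last termination state. A minimal sketch, assuming the reconcilers run in the `config-management-system` namespace (adjust for your setup):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Assumed namespace for the reconciler pods; adjust for your setup.
	pods, err := client.CoreV1().Pods("config-management-system").List(
		context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, pod := range pods.Items {
		for _, cs := range pod.Status.ContainerStatuses {
			term := cs.LastTerminationState.Terminated
			if term != nil && term.Reason == "OOMKilled" {
				fmt.Printf("%s/%s was OOMKilled\n", pod.Name, cs.Name)
			}
		}
	}
}
```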

Contributor

How many CRDs can it handle with the default resources on Autopilot?

Does the size of the CRDs matter?

Contributor Author

We could enhance the dataset with your work in #1187 later.

@tiffanny29631
Contributor Author

/retest-required

@janetkuo (Contributor) left a comment:

/lgtm


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiffanny29631
Contributor Author

/hold

@janetkuo (Contributor) commented May 7, 2024

This doc is good for now, given that it includes the scope of tests being performed. We can add more types of benchmarking tests later.

@tiffanny29631
Contributor Author

/retest-required

@google-oss-prow google-oss-prow bot merged commit 263bc73 into GoogleContainerTools:main May 7, 2024
3 checks passed
@tiffanny29631 tiffanny29631 deleted the mem-doc branch June 4, 2024 18:40