-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/dag: validate Secrets only when used #4788
Conversation
Still testing this, but putting up as a draft in case anyone wants to check out the approach. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #4788 +/- ##
==========================================
+ Coverage 76.16% 76.20% +0.04%
==========================================
Files 140 140
Lines 16914 16921 +7
==========================================
+ Hits 12882 12895 +13
+ Misses 3780 3776 -4
+ Partials 252 250 -2
|
kc.secrets[k8s.NamespacedNameOf(obj)] = obj | ||
// Secret validation status is intentionally cleared, it needs | ||
// to be re-validated after an insert. | ||
kc.secrets[k8s.NamespacedNameOf(obj)] = &Secret{Object: obj} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we end up caching more Secrets this way, but we never look at them unless they're actually referenced. I'm not sure it matters all that much memory-wise, since these are all in the controller-runtime cache already anyway, and we're just storing pointers to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah even if it stores a few more references, we do gain some information since you have the validation status cached which stops redoing some work if a secret is used in multiple resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticing a regression in Contour's resource usage after upgrading to contour-v1.24.1 (from contour-v1.22.x) . IMO, above change could be the reason ? Contour's memory usage in clusters having very large secret resources (like secrets of type helm.sh/release.v1
) keeps growing and leading to OOM due to reaching container's resources.memory.limits. contour-v1.22.x on the same cluster (with similar K8s resources profile and large secrets) didn't had this issue.
1.) heap pprof from a Contour-v1.24.1 instance shows 3GB+ in use memory towards v1.Secrets resources unmarshalling ( the above change is holding back references to secrets in informer cache, preventing the unmarshall'ed bytes from being go-GC'ed ?) :
2.) memory usage graph (before 03/06 12 pm and after):
3.) Also, the cpu usage of Contour has gone up considerably since the v1.22.x to v1.24.1 upgrade (before 03/06 12pm and after) :
Also cc @tsaarni (who's related work on filtering out unwanted Secrets from the informer cache is also relevant, IMO).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further debugging, this turned out to be a bug in a controller causing it to patch Secret resources in a tight loop(more than 20 times per second). since the cluster in question went through the upgrade of multiple components including Contour, the change in contour's resource usage was wrongly attributed to contour upgrade change.
but, IMO, contour should still filter out unrelated secrets like helm release secrets which are very large, from its secret informer cache, to prevent contour being affected with high resource usage and event processing overhead from those unrelated Secrets resources in a cluster. So, I hope we can take tsaarni's PR forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased, should be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to summarize:
- instead of saving secrets directly, we save secrets with their validation errors
- secrets can simultaneously be of different types
- secrets no longer validated when inserted to dag, which means more secret references
- secrets validated when validating resources that reference them
got := Result{Valid: valid, Err: err} | ||
|
||
assert.Equal(t, want, got) | ||
assert.Equal(t, tc.tlsSecretError, validTLSSecret(tc.secret)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice way of testing this, since the way the Secret type is constructed it means we can have a single secret used in different ways
@@ -928,7 +928,10 @@ func (s *ServiceCluster) Rebalance() { | |||
// Secret represents a K8s Secret for TLS usage as a DAG Vertex. A Secret is | |||
// a leaf in the DAG. | |||
type Secret struct { | |||
Object *v1.Secret | |||
Object *v1.Secret | |||
ValidTLSSecret *SecretValidationStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this has the wrapper around the error since we need the distinction between not yet set, no error, and error
Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: Steve Kriss <krisss@vmware.com>
Changes Secret validation to happen when Secrets are used by processors, rather than as they are added to the DAG cache. Also caches the validation results on the DAG objects for efficiency. This eliminates the need for Contour to inspect Secrets that are not used by an Ingress/HTTPProxy/Gateway. It also replaces the confusing "Secret not found" errors with specific validation errors when a Secret does exist, but is not valid. This also allows Secret validation logic to be more precise, since we know what the processors are trying to use them for. Fixes projectcontour#2616. Fixes projectcontour#2908. Signed-off-by: Steve Kriss <krisss@vmware.com>
Changes Secret validation to happen when Secrets
are used by processors, rather than as they are added
to the DAG cache. Also caches the validation results
on the DAG objects for efficiency.
This eliminates the need for Contour to inspect Secrets
that are not used by an Ingress/HTTPProxy/Gateway. It
also replaces the confusing "Secret not found" errors
with specific validation errors when a Secret does exist,
but is not valid.
This also allows Secret validation logic to be more
precise, since we know what the processors are trying
to use them for.
Fixes #2616.
Fixes #2908.