Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconciler stops responding after some minutes when using the builder.OnlyMetadata watch option #1789

Closed
wallrj opened this issue Feb 2, 2022 · 1 comment · Fixed by #1790

Comments

@wallrj
Copy link
Contributor

wallrj commented Feb 2, 2022

I'm using builder.OnlyMetadata in a controller that watches Secrets, so as to avoid caching all the secret data in-memory.

The controller works initially and my Reconcile function is invoked everytime I add / delete / update Secrets in my cluster.

But after some minutes (~8-9 minutes) the controller stops responding to changes in Secrets.

I've increased the logging verbosity so that I can see client-go messages and I notice that:

For Watches without the OnlyMetadata option

There are periodic log message such as:

2022-01-31T11:51:27.415Z        LEVEL(-4)       pkg/mod/k8s.io/client-go@v0.23.2/tools/cache/reflector.go:167: Watch close - *v1.Issuer total 8 items received
2022-01-31T11:51:27.420Z        INFO    GET https://10.159.240.1:443/apis/cert-manager.io/v1/issuers?allowWatchBookmarks=true&resourceVersion=2718530&timeoutSeconds=382&watch=true 200 OK in 2 milliseconds

suggesting that the watch is regularly bein re-established.

For watches with OnlyMetadata option

There are no regular "Watch close" messages.
And at about the time the other resource watches have been closed and re-opened,
the controller stops responding to changes to those resources with the OnlyMetadata (PartialObjectMetaData) setting.

And it may be unrelated, but I notice a slight difference in the initial log messages when I use OnlyMetada

2022-02-01T16:00:06.107Z	INFO	controller.secret	Starting EventSource	{"reconciler group": "", "reconciler kind": "Secret", "source": "kind source: *v1.PartialObjectMetadata"}
2022-02-01T16:00:06.107Z	INFO	controller.secret	Starting Controller	{"reconciler group": "", "reconciler kind": "Secret"}
2022-02-01T16:00:06.108Z	LEVEL(-3)	Starting reflector *v1.PartialObjectMetadata (10h17m20.880108561s) from pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167

2022-02-01T16:00:06.108Z	LEVEL(-3)	Listing and watching *v1.PartialObjectMetadata from pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167

2022-02-01T16:00:06.110Z	INFO	GET https://127.0.0.1:45521/api/v1/namespaces/default/secrets?limit=500&resourceVersion=0 200 OK in 1 milliseconds

2022-02-01T16:00:06.120Z	INFO	GET https://127.0.0.1:45521/api/v1/namespaces/default/secrets?allowWatchBookmarks=true&resourceVersion=5682&timeout=8m33s&timeoutSeconds=513&watch=true 200 OK in 9 milliseconds

That last GET request includes both a timeout and a timeoutSeconds querystring parameter,
unlike non-only-metadata GET requests which only have the timeoutSeconds querystring parameter.

Recreating the problem

I've created a repo with a very simple kubebuilder project and controller which watches and logs changes to Secrets:

There's an envtest based test which recreates the failure within 10 minutes.

/bug

@wallrj
Copy link
Contributor Author

wallrj commented Feb 3, 2022

I've created two CLI tools that hopefully demonstrate the problem more clearly.
Both set up a metadata informer and log Secrets added to a particular namespace.
One uses the client-go API and the other uses the controller-runtime public API.

Using client-go

https://github.com/wallrj/tmp-controller-runtime-partial-object-experiment/blob/main/cmd/clientgo-metadata-informer/main.go

This one works and periodically re-established the watch.

$ go run ./cmd/clientgo-metadata-informer/ --zap-time-encoding iso8601 --zap-log-level 8 --namespace test1
...
2022-02-03T14:29:13.632Z        INFO    GET https://127.0.0.1:46255/api/v1/namespaces/test1/secrets?allowWatchBookmarks=true&resourceVersion=15256&timeout=9m10s&timeoutSeconds=550&watch=true

2022-02-03T14:29:13.632Z        INFO    Request Headers:

2022-02-03T14:29:13.632Z        INFO    add     {"name": "test1/default-token-9vlkc"}
2022-02-03T14:29:13.632Z        INFO        Accept: application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json

2022-02-03T14:29:13.632Z        INFO    add     {"name": "test1/foo-6546"}
2022-02-03T14:29:13.632Z        INFO        User-Agent: clientgo-metadata-informer/v0.0.0 (linux/amd64) kubernetes/$Format

2022-02-03T14:29:13.634Z        INFO    Response Status: 200 OK in 1 milliseconds

2022-02-03T14:29:13.634Z        INFO    Response Headers:

2022-02-03T14:29:13.634Z        INFO        Date: Thu, 03 Feb 2022 14:29:13 GMT

2022-02-03T14:29:13.634Z        INFO        Cache-Control: no-cache, private

2022-02-03T14:29:13.634Z        INFO        Content-Type: application/vnd.kubernetes.protobuf;stream=watch

2022-02-03T14:29:13.719Z        LEVEL(-4)       caches populated



2022-02-03T14:30:10.438Z        INFO    add     {"name": "test1/foo-17292"}
2022-02-03T14:31:10.631Z        INFO    add     {"name": "test1/foo-26758"}
2022-02-03T14:32:10.725Z        INFO    add     {"name": "test1/foo-605"}
2022-02-03T14:33:10.823Z        INFO    add     {"name": "test1/foo-31889"}
2022-02-03T14:34:10.960Z        INFO    add     {"name": "test1/foo-29962"}
2022-02-03T14:35:11.083Z        INFO    add     {"name": "test1/foo-25531"}
2022-02-03T14:36:11.201Z        INFO    add     {"name": "test1/foo-6940"}
2022-02-03T14:37:11.325Z        INFO    add     {"name": "test1/foo-18719"}
2022-02-03T14:38:11.511Z        INFO    add     {"name": "test1/foo-24683"}
2022-02-03T14:38:23.634Z        LEVEL(-4)       pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Watch close - *v1.PartialObjectMetadata total 19 items received

2022-02-03T14:38:23.635Z        INFO    GET https://127.0.0.1:46255/api/v1/namespaces/test1/secrets?allowWatchBookmarks=true&resourceVersion=16177&timeout=7m54s&timeoutSeconds=474&watch=true

2022-02-03T14:38:23.635Z        INFO    Request Headers:

2022-02-03T14:38:23.635Z        INFO        Accept: application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json

2022-02-03T14:38:23.635Z        INFO        User-Agent: clientgo-metadata-informer/v0.0.0 (linux/amd64) kubernetes/$Format

2022-02-03T14:38:23.644Z        INFO    Response Status: 200 OK in 9 milliseconds

2022-02-03T14:38:23.644Z        INFO    Response Headers:

2022-02-03T14:38:23.644Z        INFO        Cache-Control: no-cache, private

2022-02-03T14:38:23.644Z        INFO        Content-Type: application/vnd.kubernetes.protobuf;stream=watch

2022-02-03T14:38:23.644Z        INFO        Date: Thu, 03 Feb 2022 14:38:23 GMT

2022-02-03T14:39:11.629Z        INFO    add     {"name": "test1/foo-952"}
2022-02-03T14:40:11.733Z        INFO    add     {"name": "test1/foo-25093"}

Using controller-runtime

https://github.com/wallrj/tmp-controller-runtime-partial-object-experiment/blob/main/cmd/ctrl-metadata-informer/main.go

This one stops responding after ~7min

$ go run ./cmd/ctrl-metadata-informer --zap-log-level 8 --zap-time-encoding iso8601 --namespace test1
...
2022-02-03T14:29:14.299Z        INFO    GET https://127.0.0.1:46255/api/v1/namespaces/test1/secrets?allowWatchBookmarks=true&resourceVersion=15256&timeout=8m14s&timeoutSeconds=494&watch=true

2022-02-03T14:29:14.299Z        INFO    Request Headers:

2022-02-03T14:29:14.299Z        INFO        Accept: application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json

2022-02-03T14:29:14.299Z        INFO    add     {"name": "test1/foo-6546"}
2022-02-03T14:29:14.299Z        INFO        User-Agent: ctrl-metadata-informer/v0.0.0 (linux/amd64) kubernetes/$Format

2022-02-03T14:29:14.299Z        INFO    add     {"name": "test1/default-token-9vlkc"}
2022-02-03T14:29:14.301Z        INFO    Response Status: 200 OK in 1 milliseconds

2022-02-03T14:29:14.301Z        INFO    Response Headers:

2022-02-03T14:29:14.301Z        INFO        Cache-Control: no-cache, private

2022-02-03T14:29:14.301Z        INFO        Content-Type: application/vnd.kubernetes.protobuf;stream=watch

2022-02-03T14:29:14.301Z        INFO        Date: Thu, 03 Feb 2022 14:29:14 GMT



2022-02-03T14:30:10.438Z        INFO    add     {"name": "test1/foo-17292"}
2022-02-03T14:31:10.631Z        INFO    add     {"name": "test1/foo-26758"}
2022-02-03T14:32:10.725Z        INFO    add     {"name": "test1/foo-605"}
2022-02-03T14:33:10.823Z        INFO    add     {"name": "test1/foo-31889"}
2022-02-03T14:34:10.960Z        INFO    add     {"name": "test1/foo-29962"}
2022-02-03T14:35:11.083Z        INFO    add     {"name": "test1/foo-25531"}
2022-02-03T14:36:11.201Z        INFO    add     {"name": "test1/foo-6940"}
2022-02-03T14:37:11.325Z        INFO    add     {"name": "test1/foo-18719"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant