🐛 apid might fail watching resources #8345

Closed
smira opened this issue Feb 22, 2024 · 0 comments · Fixed by #8358
smira commented Feb 22, 2024

This bug applies to both apid and trustd: they don't have PKI, but rather watch resources provided by machined over the COSI state socket. If the watch fails, apid and trustd stop rotating their certificates.

On watch failure, we should crash the processes so that they can restart and re-establish the watch.

: 2023/11/15 11:04:10.543784 provider.go:73: error watching for API certificates: rpc error: code = Unavailable desc = error reading from server: EOF

(The root cause of the error is not known here; it might be a gRPC bug.)

@smira smira self-assigned this Feb 23, 2024
smira added a commit to smira/talos that referenced this issue Mar 6, 2024
Fixes siderolabs#8345

Both `apid` and `trustd` services use a gRPC connection back to
`machined` to watch changes to the certificates (new certificates being
issued).

This refactors the code to follow regular conventions, so that a failure
to watch will crash the process, and they have a way to restart and
re-establish the watch.

Use the context and errgroup consistently.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 67ac693)
dsseng pushed a commit to dsseng/talos that referenced this issue Mar 7, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 6, 2024