Fix etcd backup/restore test and add guardrail for etcd-snapshot #11314
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11314      +/-   ##
==========================================
- Coverage   46.89%   42.44%   -4.45%
==========================================
  Files         179      179
  Lines       18587    18610      +23
==========================================
- Hits         8716     7899     -817
- Misses       8518     9505     +987
+ Partials     1353     1206     -147
Flags with carried forward coverage won't be shown.
Quoting the PR description: "The problem is that we are treating a 404 (URL does not exist) as Unauthorized https://github.com/k3s-io/k3s/blob/master/pkg/server/router.go#L87."
That's not what's happening here. The NotFoundHandler field is perhaps poorly named, but it is just how mux chains request handlers: if a router does not find a matching route, it hands the request off to the next handler. That section of code sets up a long chain of routers in which each router sets the previous one as its NotFound handler, and the final router is what's returned. Because etcd isn't present to install any routes that match the snapshot path, the request gets passed all the way through to the apiserver, which is what generates the Unauthorized response.
We do have other handlers that return proper errors if etcd isn't enabled, for example:
Lines 118 to 126 in b93fd98
func bootstrapHandler(runtime *config.ControlRuntime) http.Handler {
	if runtime.HTTPBootstrap {
		return bootstrap.Handler(&runtime.ControlRuntimeBootstrap)
	}
	return http.HandlerFunc(func(resp http.ResponseWriter, req *http.Request) {
		logrus.Warnf("Received HTTP bootstrap request from %s, but embedded etcd is not enabled.", req.RemoteAddr)
		util.SendError(errors.New("etcd disabled"), resp, req, http.StatusBadRequest)
	})
}
Here's where the snapshot handlers are registered:
Lines 94 to 103 in b93fd98
// registerDBHandlers registers managed-datastore-specific callbacks, and installs additional HTTP route handlers.
// Note that for etcd, controllers only run on nodes with a local apiserver, in order to provide stable external
// management of etcd cluster membership without being disrupted when a member is removed from the cluster.
func (c *Cluster) registerDBHandlers(handler http.Handler) (http.Handler, error) {
	if c.managedDB == nil {
		return handler, nil
	}
	return c.managedDB.Register(handler)
}
Lines 632 to 633 in b93fd98
// Register adds db info routes for the http request handler, and registers cluster controller callbacks
func (e *ETCD) Register(handler http.Handler) (http.Handler, error) {
Lines 696 to 697 in b93fd98
// handler wraps the handler with routes for database info
func (e *ETCD) handler(next http.Handler) http.Handler {
We do not need to worry about returning 404s for the snapshot handlers when etcd isn't enabled, since that's not what's currently happening. Nor should the CLI be checking for etcd datastore files; request validation should happen on the server side.
You could probably do a couple of different things:
- Add a default handler in router.go that matches /db/ and sends an appropriate error, counting on the fact that the etcd router is later in the chain, so requests won't ever make it to this handler if etcd is running.
- Add a default handler in managed.go that is only added if etcd isn't enabled. This is probably technically safer, since it avoids installing the error handler at all when etcd is running.
Thanks for the explanation and the suggestion. I took the second path; I also agree it is safer.
Branch updated from 17234f0 to 8bbc8a6.
Branch updated from aec8aac to 83358d5.
whitespace nit
Branch updated from 83358d5 to cd5478f.
Signed-off-by: manuelbuil <mbuil@suse.com>
/trivy
❌ Trivy scan action failed, check logs ❌
scale up trivy!
🌟 No High or Critical CVEs Found 🌟 |
Proposed Changes
This PR does two things:
1 - Fixes the e2e test that does a backup/restore from a snapshot. It was failing with:
Comparing the steps it was following against the documentation, the test was indeed wrong: the rest of the servers don't have to run --cluster-reset. After removing the step "Resets non bootstrap nodes", the e2e test works again.
2 - I must confess that I wasted an hour trying to use k3s etcd-snapshot in a k3s cluster that was not using etcd. When doing that, the error we get is super confusing:
However, there is nothing in the logs. The problem is that we are treating a 404 (URL does not exist) as Unauthorized https://github.com/k3s-io/k3s/blob/master/pkg/server/router.go#L87. We have been doing that for a long time, so it is probably risky to change. Therefore, I decided to check whether kine.sock exists and db/etcd/config does not exist in the dataDir. If that's the case, we print "K3s is not deployed with an etcd datastore".
Types of Changes
Test fix + guardrail
Verification
1 - Run the e2e test
2 - Run any k3s etcd-snapshot command without etcd as the datastore
Testing
Linked Issues
#11388
User-Facing Change
Further Comments