Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authorization refactor in preparation for fine-grained authorization #12313

Merged
merged 55 commits into from
Oct 25, 2023

Conversation

markylaing
Copy link
Contributor

This is a large refactor of authorization in LXD. The key points are:

  • There is no longer a concept of an "admin" outside of the auth package.
  • All EndpointActions must have either an AccessHandler or AllowUntrusted must be true.
  • The auth package now contains Entitlements which act or particular objects. Access handlers should ask "Does user X have entitlement Y on object Z?"

This is in preparation for https://warthogs.atlassian.net/browse/LXD-389, the remainder of which will be found in #12252

@markylaing
Copy link
Contributor Author

I think the DCO check is failing because there are too many commits. I've definitely signed them all. If it's an issue I'll figure out how to run the action locally and submit a fix upstream.

lxd/auth/driver_tls.go Outdated Show resolved Hide resolved
lxd/auth/authorization_types.go Show resolved Hide resolved
lxd/auth/authorization_objects.go Outdated Show resolved Hide resolved
lxd/auth/authorization_objects.go Outdated Show resolved Hide resolved
lxd/auth/driver_tls.go Outdated Show resolved Hide resolved
lxd/daemon.go Outdated Show resolved Hide resolved
lxd/db/operationtype/operation_type.go Outdated Show resolved Hide resolved
lxd/project/permissions.go Outdated Show resolved Hide resolved
lxd/instances.go Outdated Show resolved Hide resolved
lxd/storage_buckets.go Outdated Show resolved Hide resolved
lxd/warnings.go Show resolved Hide resolved
@markylaing markylaing force-pushed the authorization-refactor branch 3 times, most recently from 52d4a3d to 42c5f30 Compare September 27, 2023 13:38
@markylaing
Copy link
Contributor Author

@tomponline @monstermunchkin @gabrielmougard @MusicDin I've addressed the comments so ready for round 2 when you have a moment. Thanks :)

@markylaing
Copy link
Contributor Author

I've just pushed an update because my most recent changes still included some logic which caused inconsistent behaviour in in the OpenFGA driver. Tests are passing locally including RBAC tests.

@markylaing markylaing force-pushed the authorization-refactor branch 2 times, most recently from c332af2 to 7e245ba Compare September 28, 2023 14:50
@markylaing
Copy link
Contributor Author

@tomponline Regarding the problem with the events websocket I mentioned on Monday.

If you add this commit: f02bec6

Rather than fail because the user does not have sufficient privilege to view logging events, the test suite will hang indefinitely, never outputting an event. This happens because the client implementation of GetEvents does not specify an event type, so the /1.0/events endpoint returns all events to which the user has access. The CLI adds a handler to the EventListener returned by GetEvents, but since no logging events are ever returned, the handler is never run.

Really this should return a 403 Forbidden, but I'm not sure how this can be achieved with the current client implementation, especially given that we really don't want to be changing the client interface. Given that this behaviour is currently in main with TLS and RBAC, I don't think the OpenFGA driver needs to fix it. We can fix at a later date.

@markylaing
Copy link
Contributor Author

@tomponline an additional pain I have found is that lxc exec requires access to /1.0/events. Rather than querying the /1.0/operations/{uuid}/wait endpoint, the client uses an event listener to figure out when an operation has completed. This means that for a user to be able to exec into a single instance they currently need to be able to see all operation and lifecycle events. This is a fairly easy fix though, I'll point it out in the other PR in a commit message.

@tomponline
Copy link
Member

This is a fairly easy fix though, I'll point it out in the other PR in a commit message.

Can you put this up as a standalone PR so we can evaluate it independently, it sounds like a reasonable change to me though. Thanks

@tomponline
Copy link
Member

Rather than fail because the user does not have sufficient privilege to view logging events, the test suite will hang indefinitely, never outputting an event. This happens because the client implementation of GetEvents does not specify an event type, so the /1.0/events endpoint returns all events to which the user has access. The CLI adds a handler to the EventListener returned by GetEvents, but since no logging events are ever returned, the handler is never run.

Really this should return a 403 Forbidden,

Why do you think it should be a 403?

Isn't this like the other discussion we had the other day regarding filtered lists, whereby if you dont have permission on any resources you just get an empty output rather than an error? In this case, isn't a blocking event stream with no events equivalent to that?

@markylaing
Copy link
Contributor Author

Why do you think it should be a 403?

Isn't this like the other discussion we had the other day regarding filtered lists, whereby if you dont have permission on any resources you just get an empty output rather than an error? In this case, isn't a blocking event stream with no events equivalent to that?

I guess it is equivalent. Though in this case I think the reason for a hanging lxc monitor is a little more obfuscated. Happy to leave as it is for now though.

@markylaing
Copy link
Contributor Author

This is a fairly easy fix though, I'll point it out in the other PR in a commit message.

Can you put this up as a standalone PR so we can evaluate it independently, it sounds like a reasonable change to me though. Thanks

Yep will do.

@tomponline
Copy link
Member

Can we make it so if the user has access to no events it returns 403 then?

@markylaing
Copy link
Contributor Author

This is a fairly easy fix though, I'll point it out in the other PR in a commit message.

Can you put this up as a standalone PR so we can evaluate it independently, it sounds like a reasonable change to me though. Thanks

Yep will do.

#12349

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
@markylaing
Copy link
Contributor Author

Rebased.


authenticationProtocol := details.authenticationProtocol()
if authenticationProtocol != api.AuthenticationMethodTLS {
t.logger.Warn("Authentication protocol is not compatible with authorization driver", logger.Ctx{"protocol": authenticationProtocol})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should ban all requests that use an invalid authentication protocol rather than fail open?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure. Currently if you authenticate with Candid (without RBAC) or OIDC the default is to allow everything. I can change it but will need to change some tests. Any existing users using these authentication methods will find that they are unauthorized and will need to configure authorization via the unix socket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unlikely that there many (or any) users using OIDC or standalone Candid though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so they dont have a trusted client TLS cert because they authenticated through a different mechanism but the active authorizer is for TLS. I think we should keep it the same, but perhaps clarify that section there with a comment how you explained it above.

r.resourcesLock.Lock()
r.resources = resourcesMap
r.resourcesLock.Unlock()
if shared.ValueInSlice(PermissionAdmin, permissions[""]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markylaing I dont understand what this part is doing, will it always be false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The permissions map here is a map of project name to slice of permissions. However, if the user is an admin, this permission is stored in the map with the project name as an empty string. This is how it is currently done, but I can change it to something like:

type rbacPermission struct {
    admin bool
    projects map[string][]Permission
}

The permission cache defined on rbac will then be a map[string]rbacPermission where the keys are usernames.

How does that sound?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think that would be clearer. Thanks

return nil
if details.isAllProjectsRequest {
// Only admins can use the all-projects parameter.
return nil, api.StatusErrorf(http.StatusForbidden, "User is not an administrator")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "administrator" an RBAC concept? As I remember you were removing IsAdmin concept from LXD.
So I wouldn't want to see "administrator" in the error unless it was related to an RBAC concept specifically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes "admin" is an RBAC permission which grants full access. I'll update the comment to: // Only RBAC administrators can use the all-projects parameter.

return
}

logger.Errorf("Failed to prepare RBAC query: %v", err)
Copy link
Member

@tomponline tomponline Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to include in the log message what it was trying to do, i.e "Failed RBAC status check, failed preparing request: %v"

continue
}

logger.Errorf("Failed to connect to RBAC, re-trying: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like "Failed RBAC status check, failed connecting to RBAC service, retrying: %v"


// Ignore unknown projects.
}
if resp.StatusCode != 200 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use constants from http package here.

access.Projects[projectName] = v
break
}
if resp.StatusCode == 504 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use constants from http package here.

d.authorizer.StopStatusCheck()
err := d.authorizer.StopService(d.shutdownCtx)
if err != nil {
logger.Error("Failed to stop authorizer service", logger.Ctx{"error": err})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should return an error here rather than logging right? Otherwise we can get into an inconsistent state?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we also need to fallback to TLS in that case?

@@ -1803,14 +1803,17 @@ func (d *Daemon) setupRBACServer(rbacURL string, rbacKey string, rbacExpiry int6
var err error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general we should rework this function to be a generic "one of the authorization keys has changed, we need to restart/change the authorization driver" and have it handle all the driver's config keys so it can fallback to the previous active driver (and not just TLS) on error. But this can come as a separate PR.


spaceSeparatedEntityPath := strings.Replace(entityPath, "/", " / ", -1)

// Make an []any for the number of expected path arguments and set each value in the slice to a *string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not []string or []*string?

}

if !authorizer.UserHasPermission(r, projectName, "view") {
err = authorizer.CheckPermission(r.Context(), r, object, auth.EntitlementCanView)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid calling this function for each entity in entries - this could be a lot of them and slow?
Can we use the transactional checker concept we discussed?

return response.SmartError(err)
} else if err != nil {
// This is counterintuitive. We are unauthorized to get a permission checker for viewing instances because a metric type certificate
// can't view instances. However, in order to get to this point we must already have auth.EntitlementCanViewMetrics. So we can view
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you can view all metrics with auth.EntitlementCanViewMetrics why cant you also do filtering?

@@ -528,7 +544,15 @@ func certificatesPost(d *Daemon, r *http.Request) response.Response {
}

// Handle requests by non-admin users.
if !s.Authorizer.UserIsAdmin(r) {
var userCanCreateCertificates bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section could do with a comment explaining whats going on here and why its special.

trusted := d.checkTrustedClient(r) == nil && allowProjectPermission("images", "manage-images")(d, r) == response.EmptySyncResponse
projectName := request.ProjectParam(r)

var userCanCreateImages bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here explaining why this bit is special would be good.

@@ -1137,6 +1147,12 @@ func imagesPost(d *Daemon, r *http.Request) response.Response {
return fmt.Errorf("Failed syncing image between nodes: %w", err)
}

// Add the image to the authorizer.
err = s.Authorizer.AddImage(r.Context(), projectName, info.Fingerprint)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this earlier, fail if it fails, and if something goes wrong later remove the entry from the authorizer again?
This feels safer to me.

var names []string
var instances []instance.Instance
for _, inst := range c {
if inst.Project().Name != projectName {
continue
}

// Only allow changing the state of instances the user has permission for.
if !userHasPermission(auth.ObjectInstance(inst.Project().Name, inst.Name())) {
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this should be an error.

@@ -327,6 +327,12 @@ func lxcCreate(s *state.State, args db.InstanceArgs, p api.Project) (instance.In
if d.isSnapshot {
d.state.Events.SendLifecycle(d.project.Name, lifecycle.InstanceSnapshotCreated.Event(d, nil))
} else {
// Add instance to authorizer.
err = d.state.Authorizer.AddInstance(d.state.ShutdownCtx, d.project.Name, d.Name())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general we need to review and consider the ordering of when entities are added/removed from the authorizer and whether if that fails whether it should be passed back to the user.

return response.InternalError(fmt.Errorf("Unable to create authorization object for operation: %w", err))
}

err = s.Authorizer.CheckPermission(r.Context(), r, object, entitlement)
Copy link
Member

@tomponline tomponline Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a "get permission checker" to avoid having to call out to the authorizer service for each op.Resources()?

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although there's some additional conversations and changes to be made to this, in the interests of making progress I am approving it as I do not see any glaring problems with it. Thanks!

@tomponline tomponline merged commit 20efe1b into canonical:main Oct 25, 2023
25 of 26 checks passed
@markylaing markylaing mentioned this pull request Oct 25, 2023
18 tasks
@markylaing
Copy link
Contributor Author

Thanks for your review @tomponline. Glad it's finally merged 🥳

I've created an issue for the outstanding comments: #12461

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants