Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More Complete ACL Implementation #386

Merged
merged 52 commits into from
Jun 7, 2024

Conversation

kevin-montrose
Copy link
Contributor

@kevin-montrose kevin-montrose commented May 15, 2024

This sketches out a more proper ACL implementation - one with categories beyond @admin, and where individual commands can be ACL'd.

Basic idea is to use RespCommandsInfo as a mechanism to discover command ACLs, and attach a bitfield to each User to track individual commands rather than categories.

Outstanding TODOs:

  • Test rest of commands w.r.t. ACL categories
  • Implement individual command ACL'ing (ie. +get)
  • Implement sub-command ACL'ing (ie. -config|get)
  • Figure out what to do with unlisted/internal/cluster commands
  • Figure out what to do with custom extensions
    • Introduced a @garnet ACL category that all commands not in Redis are now tagged with
  • (Probably) Remove CommandCategory - it is redundant
  • Clean up command parsing, now that subcommand bytes are a "later in the pipeline'-thing
    • As part of this, probably DRY up ACL checking - it's a bit error prone to add a new command with this code so "wet"
    • SET (with EX, NX, etc. but not SETEX) are still special cased as they weren't realized as subCmds
  • Cluster subcommands are weird, but need ACL testing too
  • CustomCmd, CustomTnx, CustomObjCmd are weird and need to be ACL'able in some way too
    • Introduced a @Custom ACL category that blocks all of these, and the individual commands can also be ACL'd (that is +customcmd works)
    • Longer term we may want to allow ACL'ing the commands by their actual registered name, but IMO that should be a separate enhancement

This is explicitly not going to implement keyspace patterns. I feel like that should be a separate enhancement, as it's pretty substantial on it's own.


The biggest incidental change here is a substantial refactoring of RespCommand and the various switches over it. This is, IMO, the cleanest way to get a canonical value to test against ACLs with. As a consequence, RespCommand has a bunch more entries as most things that were previously "sub commands" (but not in the RESP sense) are now actual enum values.

There's one additional implementation detail to call out, which is how ACL descriptions are maintained. So that ACL descriptions behave more or less like you'd expect (that is +@foo +bar -bar becomes +@foo) a greedy approach is taken where, upon update, a trial removal of each token is made to see if it impacts the effective ACL set. This is pretty allocate-y, and while that could be improved I think ACL manipulation is rare and locked down enough that clarity was a bit more important. It's worth a stringent review, however. Code is in User.RationalizeACLDescription.

An attempt has been made to cover all commands, but more remain.

A number of TODOs remain, but this gives a rough shape of how it'd work
and how individual Commands or SubCommands would end up ACL'd too.
@lmaas lmaas self-assigned this May 15, 2024
As part of this, needed to actually enumerate all commands in RespCommands which
implies a lot of future cleanup in parsing and command processesing.

This also uncovered a number of different gaps in implemented RESP commands, which
were not addressed unless doing so was necessary for ACL testing.
Garnet-specific commands are still pending.
Introduces @garnet category.
Reworks WATCH so MS and OS are proper subcommands.
Reworks parsing to remove the tuple with subcommand, as it'd always be 0 now.
…ands, while the common case is consolidated into an upfront check.

Note that SET (and pseudo-variants SETEXNX, etc.) is a special case, as it lacks subcommands but has multiple command types.
That could be cleaned up later, but has performance implications - so leaving it for future work.
…CL categories, replace slow temp code with better code, add tests to catch future additions; this involves reordering RespCommand yet again
@kevin-montrose kevin-montrose marked this pull request as ready for review May 21, 2024 16:42
@kevin-montrose
Copy link
Contributor Author

kevin-montrose commented May 21, 2024

There's still some more to do here (more testing, subcommands, some cleanup) , but I'm going on vacation for a week and this is pretty dang close to how it's going to actually look when done. So taking it off draft, and marking ready for review (if not merging).

/cc @mgravell @NickCraver - this is relevant to your interests.

@kevin-montrose kevin-montrose changed the title [SKETCH] More Complete ACL Implementation More Complete ACL Implementation May 21, 2024
Copy link
Contributor

@lmaas lmaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only made it half-way through the PR, but this implementation looks very promising! Your efforts in addressing the open to-do items in the ACL implementation are greatly appreciated. I have included some comments below.

libs/cluster/Session/ClusterSession.cs Outdated Show resolved Hide resolved
libs/cluster/Session/ClusterSession.cs Outdated Show resolved Hide resolved
libs/server/ACL/ACLException.cs Outdated Show resolved Hide resolved
libs/server/ACL/ACLParser.cs Outdated Show resolved Hide resolved
libs/server/ACL/User.cs Outdated Show resolved Hide resolved
libs/server/ACL/User.cs Outdated Show resolved Hide resolved
libs/server/ACL/User.cs Outdated Show resolved Hide resolved
test/Garnet.test/CredentialManager.cs Outdated Show resolved Hide resolved
Copy link
Contributor

@vazois vazois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR has too many changes to vet, maybe we need to break it into smaller PRs. I think the first is good but very hard to review because the surface area is huge. I think the idea of having a bitmap to match command permissions is good. There might I couple alternatives to tradeoff between CPU cycles or memory. We need to separate general fixes from clear ACL enchantments and avoid pushing parsing inside the execution methods.

libs/cluster/Session/MigrateCommand.cs Show resolved Hide resolved
libs/cluster/Session/ClusterSession.cs Show resolved Hide resolved
libs/cluster/Session/ClusterSession.cs Show resolved Hide resolved
libs/server/Resp/RespServerSession.cs Outdated Show resolved Hide resolved
libs/cluster/Session/RespClusterBasicCommands.cs Outdated Show resolved Hide resolved
libs/server/Resp/Bitmap/BitmapCommands.cs Outdated Show resolved Hide resolved
libs/server/Resp/KeyAdminCommands.cs Outdated Show resolved Hide resolved
libs/server/Resp/Objects/HashCommands.cs Outdated Show resolved Hide resolved
libs/server/Resp/Objects/HashCommands.cs Show resolved Hide resolved
@kevin-montrose
Copy link
Contributor Author

I think this PR has too many changes to vet, maybe we need to break it into smaller PRs. I think the first is good but very hard to review because the surface area is huge. I think the idea of having a bitmap to match command permissions is good. There might I couple alternatives to tradeoff between CPU cycles or memory. We need to separate general fixes from clear ACL enchantments and avoid pushing parsing inside the execution methods.

Yeah, it's gigantic - ACLs touch literally everything. Excluding tests, it is trending smaller as things DRY up.

A bunch of random-ish fixes are somewhat unfortunate, as a big part of testing this is making sure that failed ACLs don't bork the command stream - and I encountered a number of cases of fairly "brittle" command implementations. I'm open to breaking those out into separate PRs once ACLs are in a good place, but I needed some confidence that the ACL design would work for all commands upfront.

kevin-montrose and others added 4 commits May 28, 2024 13:40
Co-authored-by: Lukas Maas <contact@lukasmaas.com>
…ssion ACL validation

address feedback; break CommandPermissionSet out into separate file
address feedback; increment correct statistic, fixing bug
address feedback; adding count is unnecessary in many places, remove that post-DRY-ing
@badrishc
Copy link
Contributor

Great work overall. Shaping up really nicely, thanks!

@kevin-montrose
Copy link
Contributor Author

Pushed up one more exploratory commit that remembers CanAuthenticate so we can avoid an interface method call on each command.


Switching to BDN now that it's in main.

main (as of fcf880a)

Method Job EnvironmentVariables Runtime Mean Error StdDev Allocated
InlinePing .NET 6 Empty .NET 6.0 2.249 us 0.0338 us 0.0316 us -
Set .NET 6 Empty .NET 6.0 23.161 us 0.2178 us 0.1931 us -
Get .NET 6 Empty .NET 6.0 13.875 us 0.0973 us 0.0910 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 1.983 us 0.0138 us 0.0122 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 18.985 us 0.0977 us 0.0914 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 12.280 us 0.1564 us 0.1463 us -

this (as of 18d7d13), now with a UseACLs option

Method Job EnvironmentVariables Runtime UseACLs Mean Error StdDev Allocated
InlinePing .NET 6 Empty .NET 6.0 False 2.378 us 0.0232 us 0.0217 us -
Set .NET 6 Empty .NET 6.0 False 23.721 us 0.2094 us 0.1959 us -
Get .NET 6 Empty .NET 6.0 False 14.102 us 0.0550 us 0.0514 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 2.033 us 0.0154 us 0.0144 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 18.883 us 0.1771 us 0.1656 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 11.194 us 0.0779 us 0.0728 us -
InlinePing .NET 6 Empty .NET 6.0 True 2.493 us 0.0105 us 0.0098 us -
Set .NET 6 Empty .NET 6.0 True 23.715 us 0.2381 us 0.1989 us -
Get .NET 6 Empty .NET 6.0 True 14.485 us 0.0909 us 0.0806 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 2.306 us 0.0254 us 0.0237 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 19.279 us 0.1247 us 0.1106 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 12.000 us 0.0741 us 0.0693 us -

full logs here: https://gist.github.com/kevin-montrose/8145ddfffa201735b50c78e7edf8f39f

Which is awful close to a wash when ACLs are disable, and maybe a 0.5us loss when enabled.

Still need to look at Badrish's proposal, so that's up next.

@badrishc
Copy link
Contributor

badrishc commented Jun 6, 2024

Still need to look at Badrish's proposal, so that's up next.

Adding a link to this thread for reference - f20925a

@badrishc
Copy link
Contributor

badrishc commented Jun 6, 2024

exploratory commit that remembers CanAuthenticate

I don't think this will work with authenticators that might time out, e.g., the Aad authenticator.

@kevin-montrose
Copy link
Contributor Author

exploratory commit that remembers CanAuthenticate

I don't think this will work with authenticators that might time out, e.g., the Aad authenticator.

IsAuthenticated is used for timeouts, CanAuthenticate (which I cached) is a constant for all current implementations so I think it'd be safe.

That said, I think your proposal here is cleaner - getting a perf run together then I'll update.

…nd testing) showed were redundant, removes the caching of CanAuthenticate, cleanup some of the other changes from prior commits
@kevin-montrose
Copy link
Contributor Author

I've pushed up a version that merges some of my tweaks and @badrishc's proposal.

The CanAuthenticate cache is gone, but other changes remain: _user will never be null (so we can elide some other checks), NoAuth commands aren't special cased anymore (they're set in the bitmap like everything else), some less likely paths are NoInlining]'d, and some hot paths are AggressiveInlining'd.

Latest BDN (as of 1ed1635)

Method Job EnvironmentVariables Runtime UseACLs Mean Error StdDev Allocated
InlinePing .NET 6 Empty .NET 6.0 False 2.229 us 0.0194 us 0.0172 us -
Set .NET 6 Empty .NET 6.0 False 23.134 us 0.1162 us 0.1030 us -
Get .NET 6 Empty .NET 6.0 False 13.650 us 0.0628 us 0.0557 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 1.890 us 0.0082 us 0.0077 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 18.742 us 0.1390 us 0.1160 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 11.005 us 0.0559 us 0.0495 us -
InlinePing .NET 6 Empty .NET 6.0 True 2.502 us 0.0201 us 0.0178 us -
Set .NET 6 Empty .NET 6.0 True 23.270 us 0.0814 us 0.0722 us -
Get .NET 6 Empty .NET 6.0 True 14.275 us 0.0563 us 0.0527 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 1.913 us 0.0154 us 0.0144 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 18.779 us 0.1348 us 0.1126 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 11.582 us 0.0520 us 0.0486 us -

Full log: https://gist.github.com/kevin-montrose/a1a7f9ef472c68fbb2b3b61c1c3f190d

Suggests a w/o ACL improvement, and about the same for perf with ACLs.

I did a sanity check with Embedded.perftest again, and while I trust it less than BDN numbers it also shows a gain of ~4K PINGs / sec (for about 6% gain, upt o ~66K total PINGs / sec; .NET 8, PGO disabled) in the no-ACL case.

libs/server/ACL/CommandPermissionSet.cs Show resolved Hide resolved
libs/server/ACL/CommandPermissionSet.cs Outdated Show resolved Hide resolved
libs/server/Resp/AdminCommands.cs Outdated Show resolved Hide resolved
libs/server/Transaction/TxnRespCommands.cs Show resolved Hide resolved
libs/server/Transaction/TxnRespCommands.cs Outdated Show resolved Hide resolved
libs/cluster/Session/ClusterCommands.cs Show resolved Hide resolved
libs/server/Resp/ACLCommands.cs Show resolved Hide resolved
@badrishc
Copy link
Contributor

badrishc commented Jun 7, 2024

Idea to avoid the interface call to authenticator in common case (not necessary for this PR):

17402f3

libs/cluster/Session/ClusterSession.cs Outdated Show resolved Hide resolved
libs/server/ACL/ACLParser.cs Show resolved Hide resolved
badrishc and others added 3 commits June 6, 2024 19:39
…h user auth invalidation, remove some optimizations that are now not actually useful after the bugfixes
@kevin-montrose
Copy link
Contributor Author

Latest perf numbers

BDN

main (0acff38)

Method Job EnvironmentVariables Runtime Mean Error StdDev Allocated
InlinePing .NET 6 Empty .NET 6.0 2.246 us 0.0275 us 0.0244 us -
Set .NET 6 Empty .NET 6.0 23.058 us 0.1166 us 0.1090 us -
Get .NET 6 Empty .NET 6.0 13.873 us 0.0944 us 0.0837 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 1.986 us 0.0089 us 0.0084 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 19.673 us 0.1043 us 0.0975 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 11.025 us 0.0482 us 0.0377 us -

aclImprovements (f5e1684)

Method Job EnvironmentVariables Runtime UseACLs Mean Error StdDev Allocated
InlinePing .NET 6 Empty .NET 6.0 False 2.323 us 0.0440 us 0.0646 us -
Set .NET 6 Empty .NET 6.0 False 23.092 us 0.1001 us 0.0887 us -
Get .NET 6 Empty .NET 6.0 False 13.588 us 0.0956 us 0.0848 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 2.184 us 0.0163 us 0.0153 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 18.372 us 0.1050 us 0.0931 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 False 11.196 us 0.0495 us 0.0439 us -
InlinePing .NET 6 Empty .NET 6.0 True 2.586 us 0.0142 us 0.0133 us -
Set .NET 6 Empty .NET 6.0 True 23.417 us 0.0952 us 0.0891 us -
Get .NET 6 Empty .NET 6.0 True 14.442 us 0.0901 us 0.0843 us -
InlinePing .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 2.102 us 0.0268 us 0.0251 us -
Set .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 19.076 us 0.1852 us 0.1641 us -
Get .NET 8 DOTNET_TieredPGO=0 .NET 8.0 True 11.330 us 0.0473 us 0.0419 us -

Which looks like around 0.2us loss.

I did grab Embedded.perftest numbers again, but they are wildly variable for me latest main - swinging by up to 30K PINGs / sec so I really don't trust them.

@kevin-montrose
Copy link
Contributor Author

kevin-montrose commented Jun 7, 2024

Idea to avoid the interface call to authenticator in common case (not necessary for this PR):

17402f3

This is interesting (as is punching the authenticator into RespServerSession as a generic argument, to de-virtualize everything) but I want push back a little bit on the idea that NoAuth is the common case.

While it certainly isn't unusual to deploy Redis (and thus I'd assume Garnet) with NoAuth and rely on network isolation for security, there are big cases where that's not true. Top of my mind is if Garnet is ever shipped as a service itself, or in any environment where defense in depth is a big thing.


Thinking out loud a bit, it seems to me a remaining perf killer is that we have to call into the IGarnetAuthenticator on every command to validate our User is still good - a poll model, if you will. Given that the common case is that the User is still good (because invalidation is rare, AD/Kerberos sessions get measured in hours, password rotations happen on the order of months, etc. etc.) it feels a push model is more appropriate.

One way to IGarnetAuthenticator into a push model, shooting from the hip:

public interface IGarnetAuthenticator
{
    /// <summary>
    /// Can authenticator authenticate
    /// </summary>
    bool CanAuthenticate { get; }

    /// <summary>
    /// Whether this authenticator can be used with the ACL
    /// </summary>
    bool HasACLSupport { get; }

    /// <summary>
    /// Authenticate the incoming username and password from AUTH command. Username is optional
    /// </summary>
    bool TryAuthenticate(ReadOnlySpan<byte> password, ReadOnlySpan<byte> username, out AuthenticationLifetime authLifetime);
}

public sealed class AuthenticationLifetime
{
    // for Authenticators that support auth but not invalidation, or we use null as a special case
    public static AuthenticationLifetime Infinite = ... ;

    // we call this in RespServerSession, it get devirtualized into a simple bool check
    public bool IsValid { get; }

    // IGarnetAuthenticators call this when a previously auth'd user (from a call to IGarnetAuthenticator.TryAuthenticate) is de-auth'd
    public void Invalidate() { ... }
}

Our hotpath logic in RespServerSession then ends up looking like:

// note _authLifetime gets initialized with some "already invalidated" value so we wouldn't need a null check
(!_authLifetime.IsValid || !_user.CanAccessCommand(cmd)) && !cmd.IsNoAuth()

which is virtual-call-less.

I'm not proposing to do this as part of this PR.

To be clear, the IGarnetAuthenticator implementation would have to remember the AuthenticationLifetimes it hands back to invalidate them - but I think that's fine, it already has to remember stuff about the user to implement IsAuthenticated.

This would keep the User and authentication separation we have, get the IGarnetAuthenticator interface out of the hotpath, and I think cover all reasonable schemes.

@PaulusParssinen
Copy link
Contributor

PaulusParssinen commented Jun 7, 2024

Just noting that in real-world workload, the .NET 8 DynamicPGO (which is disabled in our BDN microbenchmarks to reduce noise) would likely specialize the virtualized calls for common types here and we shouldn't have to do it ourselves. Whether it kicks in here is a bit harder to verify but something to keep in mind.

@badrishc
Copy link
Contributor

badrishc commented Jun 7, 2024

Thinking out loud a bit, it seems to me a remaining perf killer is that we have to call into the IGarnetAuthenticator on every command to validate our User is still good - a poll model, if you will. Given that the common case is that the User is still good (because invalidation is rare, AD/Kerberos sessions get measured in hours, password rotations happen on the order of months, etc. etc.) it feels a push model is more appropriate.

Yeah, it makes sense to consider a push-based approach to update the auth status in the rare case of it being revoked instead of having to poll the provider every single time. cc @yangmsft whose team uses the auth feature. (definitely out of scope for this PR)

@badrishc badrishc merged commit abfbff0 into microsoft:main Jun 7, 2024
23 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants