Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Adding a token getter to get service account tokens #1006

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

skattoju
Copy link

@skattoju skattoju commented Jul 2, 2024

Description

As part of #737 user provided service accounts will be used to manage cluster extensions, the token getter implemented in this PR enables this by providing an interface to fetch tokens given a service account. It also implements a cache where fetched tokens are stored saving requests to the api server for subsequent requests. Fixes #972

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@skattoju skattoju requested a review from a team as a code owner July 2, 2024 14:49
Copy link

netlify bot commented Jul 2, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 3b2949f
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/6686f15bcfe6650009a78b01
😎 Deploy Preview https://deploy-preview-1006--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@skattoju skattoju force-pushed the token_getter branch 2 times, most recently from 46d5ddc to b3d079a Compare July 2, 2024 15:06
@skattoju skattoju changed the title Token getter for service account tokens ✨ Adding a token getter to get service account tokens Jul 2, 2024
Copy link

codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.10%. Comparing base (872b7f7) to head (a902abc).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1006      +/-   ##
==========================================
+ Coverage   77.19%   78.10%   +0.90%     
==========================================
  Files          17       18       +1     
  Lines        1206     1256      +50     
==========================================
+ Hits          931      981      +50     
  Misses        193      193              
  Partials       82       82              
Flag Coverage Δ
e2e 56.54% <ø> (ø)
unit 53.82% <100.00%> (+1.91%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks really good, great work! There are a few smaller changes I'd recommend

expireTime = token.ExpirationTimestamp.Time
}

fiveMinutesAfterNow := metav1.Now().Add(5 * time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for five minutes being hard coded here?

What do you think about making this configurable via the options pattern? I think that this would make this code a bit more robust and allow for tweaking this value in various cases without having to change the underlying logic of the TokenGetter

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about that.. That should be set to expiration seconds. The options pattern is a good idea, thanks for pointing it out.

}

func (t *TokenGetter) getToken(ctx context.Context, key types.NamespacedName) (*authv1.TokenRequestStatus, error) {
req, err := t.client.ServiceAccounts(key.Namespace).CreateToken(ctx, key.Name, &authv1.TokenRequest{Spec: authv1.TokenRequestSpec{ExpirationSeconds: ptr.To[int64](3600)}}, metav1.CreateOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here, any particular reason for the hardcoded 3600 seconds for expiration? Maybe this would be another good configurable option

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also missed this, this should also be set to expiration seconds will fix 👍

func NewTokenGetter(client corev1.ServiceAccountsGetter, expirationSeconds int64) *TokenGetter {
return &TokenGetter{
client: client,
expirationSeconds: expirationSeconds,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this expirationSeconds field is unused (noticed after my other comments related to hardcoded values)

}

func (t *TokenGetter) getToken(ctx context.Context, key types.NamespacedName) (*authenticationv1.TokenRequestStatus, error) {
req, err := t.client.ServiceAccounts(key.Namespace).CreateToken(ctx, key.Name, &authenticationv1.TokenRequest{Spec: authenticationv1.TokenRequestSpec{ExpirationSeconds: ptr.To[int64](3600)}}, metav1.CreateOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment regarding hardcoded value for expiration seconds, I'd like to see this be a configurable option

(in case this seems like a duplicate my GH review UI seemed to have lost the comment so I'm potentially adding a repeat comment here)

ctest "k8s.io/client-go/testing"
)

func TestNewTokenGetter(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the steps of the test I think the logic looks good overall. One thing I will note is that most of the tests we write for operator-controller end up being table driven tests that end up sharing the same test logic where possible. I think this test in particular could be a bit more readable and maintainable utilizing the table driven approach.

I don't think it is absolutely necessary if folks feel strongly about not requiring that change, but something I would personally like to see to stay consistent with many of the other unit tests we have throughout operator-controller

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill change it to a table driven style 👍

Comment on lines 66 to 53
func (k *keyLock[K]) getLock(key K) *sync.Mutex {
k.mu.Lock()
defer k.mu.Unlock()

lock, ok := k.locks[key]
if !ok {
lock = &sync.Mutex{}
k.locks[key] = lock
}
return lock
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of locks will grow unbounded here, right? Maybe this is unnecessary complexity to start off with. WDYT about skipping the keyLock altogether. For now, maybe it's fine to get a global lock in the Get method. That would allow only one token to be requested at a time. If that becomes a problem, we can introduce key locks later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back through this a second time I was wondering the same thing. I get why the key lock was created but agree that this is probably more complex than necessary for a first pass

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok ill remove the key locks for now

expireTime = token.ExpirationTimestamp.Time
}

expirationSecondsAfterNow := metav1.Now().Add(time.Duration(t.expirationSeconds))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the expiration seconds field here feels unintuitive to me. Maybe we should add another field to the struct called something like rotationThreshold that is a time.Duration type?

Copy link
Member

@joelanford joelanford Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we make it some hardcoded fraction of expirationSeconds? Maybe if we're within 10% or something (i.e. if expiration is 100s, then we would get a new token anytime after 90s)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i guess setting that to expirationSeconds may be too much.. I will add a rotationThreshold so its configurable..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if expirationSeconds is 100s and we set rotationThreshold to 10s we we would create a new token if its been more than 90s since the token was created like you suggested 👍

}

// Get returns a token from the cache if available and not expiring, otherwise creates a new token
func (t *TokenGetter) Get(ctx context.Context, key types.NamespacedName) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not noticing this earlier in the review, but Joe's earlier comment on the tokenLocks potentially growing unbounded made me realize that the tokens themselves could potentially grow unbounded with the current implementation since we only have a Get function.

What do you think about adding a method that could be used to "prune" expired tokens from the cache?

I'm thinking this would be a method used as needed by a caller. I could imagine when we wire everything together we may want to ensure we prune any expired tokens that are older than X amount of time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that theres no way to invalidate tokens issued by the TokenRequest api 🤔 but for our purposes i guess we just remove keys i.e. namespaced names of the service accounts from the token getter cache

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm thinking that it could simply be a process that runs every X amount of time and runs the method to ensure we aren't storing expired tokens for an unreasonable amount of time and adding bloat to the memory usage

Copy link
Contributor

@everettraven everettraven Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I'm only recommending we implement the method that will delete the keys and tokens from the map that are expired and meet a certain criteria defined by the caller (like expired for more than 2 hours)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more, we would only remove entries from the cache when extensions are removed, in case there are expired tokens they would belong to service accounts of currently installed extensions. Further more the expired tokens will be replaced by fresh ones whenever an attempt is made to use the tokens so in theory tokens should not accumulate 🤔

Copy link
Member

@joelanford joelanford Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the complexity of a background goroutine? What if we write a reapExpiredTokens function and call it every time Get is called?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make it simpler. Ive added a reapExpiredTokens This also eliminates the need for Clean and TokenExists

for _, opt := range options {
opt(tokenGetter)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow expiration seconds and rotation threshold to be independently configurable, we probably need to do some sanity validation to make sure they make sense.

But I'm not sure we actually need rotation threshold to be configurable if we just make it expirationDuration * 0.1 or whatever we think is a reasonable ratio. I feel like we may be over-engineering this.

func NewTokenGetter(client corev1.ServiceAccountsGetter, options ...TokenGetterOption) *TokenGetter {
tokenGetter := &TokenGetter{
client: client,
expirationSeconds: int64(5 * time.Minute), // default token ttl
Copy link
Member

@joelanford joelanford Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change this to a time.Duration type and change the name to expirationDuration. The name ("Seconds" suffix) and value (integer number of nanoseconds in 5 minutes) lead a reader to an incorrect conclusion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that was int64 because the TokenRequest api takes an pointer to an int64.. its better as a time.Duration 👍

Comment on lines 94 to 111
func (t *TokenGetter) Clean(ctx context.Context, key types.NamespacedName) {
t.mu.RLock()
defer t.mu.RUnlock()
delete(t.tokens, key)
}

func (t *TokenGetter) TokenExists(ctx context.Context, key types.NamespacedName) bool {
t.mu.RLock()
defer t.mu.RUnlock()
_, ok := t.tokens[key]
return ok
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we envision Clean and TokenExists being used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean would be to remove entries from the cache when extensions are uninstalled or unused items need to be cleaned up for some other reason. TokenExists is debatable, as I added to make it testable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are made redundant by reapExpiredTokens so I have removed them

@skattoju skattoju force-pushed the token_getter branch 4 times, most recently from 0f3c60d to d3c041f Compare July 4, 2024 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement a Go struct for fetching and caching authentication tokens for a ServiceAccount
3 participants