Implement LRU cache for storing SVIDs in SPIRE Agent #3181

Merged: 25 commits into spiffe:main on Sep 14, 2022

Conversation

@prasadborole1 (Contributor) commented Jun 22, 2022:

This change updates the SPIRE Agent SVID cache to be an LRU cache, controlled by experimental config fields such as MaxSvidCacheSize and SVIDCacheExpiryPeriod. A sketch of the described eviction rules follows the list below.

  1. The SVID cache size limit is a soft limit: an SVID with at least one active subscriber is never removed from the cache.
  2. Least recently used SVIDs are removed from the cache only after the cache expiry period has passed, which reduces overall cache churn.
  3. The last-access timestamp of a cache entry is updated when a new subscriber is created.
  4. When a new subscriber is created and there is a cache miss, the subscriber must wait for the next SVID sync event to receive a WorkloadUpdate containing the newly minted SVID.
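
The sketch below illustrates the eviction rules described above; it is a minimal, hypothetical illustration (the type and field names are invented, and this is not SPIRE's actual cache code). The size limit is soft because entries with active subscribers are skipped, and subscriber-free entries are evicted oldest-first, but only once they have been idle longer than the expiry period.

```go
package svidcache

import (
	"sort"
	"time"
)

// cachedSVID is a stand-in for a cache entry; only the bookkeeping
// relevant to eviction is shown.
type cachedSVID struct {
	subscribers int       // active subscribers pinning this entry
	lastAccess  time.Time // updated when a new subscriber is created
}

type svidCache struct {
	maxSize      int           // corresponds to MaxSvidCacheSize
	expiryPeriod time.Duration // corresponds to SVIDCacheExpiryPeriod
	entries      map[string]*cachedSVID
}

// evictStale removes least-recently-used entries, but only when the cache is
// over its soft size limit, only for entries with no subscribers, and only
// once an entry has been idle longer than the expiry period.
func (c *svidCache) evictStale(now time.Time) {
	if len(c.entries) <= c.maxSize {
		return
	}
	type candidate struct {
		id         string
		lastAccess time.Time
	}
	var candidates []candidate
	for id, e := range c.entries {
		if e.subscribers == 0 && now.Sub(e.lastAccess) > c.expiryPeriod {
			candidates = append(candidates, candidate{id: id, lastAccess: e.lastAccess})
		}
	}
	// Evict the least recently used candidates first.
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].lastAccess.Before(candidates[j].lastAccess)
	})
	for _, cand := range candidates {
		if len(c.entries) <= c.maxSize {
			return
		}
		delete(c.entries, cand.id)
	}
}
```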

More context: #2940

Testing:

In addition to a new integration test, this was tested locally with 9k registrations per agent, validating that:

  1. SVIDs are cached correctly according to each subscriber's selector set
  2. SVIDs are removed from the cache after expiry
  3. With a workload associated with 8k entries, the CLI command fails with the expected DeadlineExceeded or ResourceExhausted error codes due to the huge response size, but the SVID cache is still constructed correctly

@azdagron (Member) previously approved these changes Jun 22, 2022:

Thanks much!

@azdagron dismissed their stale review Jun 22, 2022 18:06:

Wrong PR!

@azdagron (Member): Sorry, I accepted this PR by mistake.

Signed-off-by: Prasad Borole <prasadb@uber.com>
@rturner3 (Collaborator) left a comment:

Just a few minor comments, otherwise LGTM

Resolved review threads on:
pkg/agent/manager/cache/cache.go
pkg/agent/manager/cache/cache.go (outdated)
pkg/agent/manager/manager.go (outdated, 3 threads)
pkg/agent/manager/config.go (outdated)
pkg/agent/manager/manager_test.go (outdated)
Signed-off-by: Prasad Borole <prasadb@uber.com>
@rturner3 previously approved these changes Aug 31, 2022
bar := makeRegistrationEntry("BAR", "B")

// check empty result
assert.Equal(t, []*common.RegistrationEntry{},
Collaborator comment: assert.Empty() is also nice for this sort of thing. No need to change here though. We've iterated a lot on this PR already 😀

@MarcosDY modified the milestones: 1.4.1 → 1.4.2 (Sep 1, 2022)
@evan2645 modified the milestones: 1.4.2 → 1.4.3 (Sep 6, 2022)
m.cache.SyncSVIDsWithSubscribers()
staleEntries := m.cache.GetStaleEntries()
if len(staleEntries) > 0 {
	return m.updateSVIDs(ctx, staleEntries, m.cache)
Member comment: Will this contend with the normal sync goroutine? Could this cause two goroutines to pick up the same set of stale entries and effectively request the server sign the same SVID twice?

Author reply: Good point. There's a chance this may happen. Updated the code to put SVID fetching under a lock (see the sketch below).
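
A minimal sketch of the resulting shape, under stated assumptions: the mutex field name updateSVIDsMu and the wrapper method name syncSVIDs are illustrative, not necessarily the identifiers the PR uses. The point is that the subscriber-driven sync and the periodic sync goroutine take the same lock before computing the stale set, so they cannot both ask the server to sign the same SVID.

```go
// syncSVIDs serializes SVID fetching between the subscriber-driven sync and
// the periodic sync goroutine (hypothetical wrapper; field and method names
// are assumptions for illustration).
func (m *Manager) syncSVIDs(ctx context.Context) error {
	m.updateSVIDsMu.Lock()
	defer m.updateSVIDsMu.Unlock()

	// The second caller to acquire the lock sees an up-to-date cache and an
	// empty stale set, so no duplicate signing requests are sent.
	m.cache.SyncSVIDsWithSubscribers()
	staleEntries := m.cache.GetStaleEntries()
	if len(staleEntries) > 0 {
		return m.updateSVIDs(ctx, staleEntries, m.cache)
	}
	return nil
}
```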

Signed-off-by: Prasad Borole <prasadb@uber.com>
data_dir = "/opt/spire/data/server"
log_level = "DEBUG"
ca_ttl = "1h"
default_svid_ttl = "10m"
Member comment: Doesn't this configuration now need to change to use the experimental flag?

Author reply: The experimental flag is added to the agent config, not the server config.

Member reply: Whoops. Wires crossed :) Thanks.

@azdagron (Member): Thanks, @prasadborole1! I think we're ready to get this in so we can get some operational experience with it and iron out any remaining issues.

@azdagron merged commit 6689c36 into spiffe:main on Sep 14, 2022
stevend-uber pushed a commit to stevend-uber/spire that referenced this pull request Oct 16, 2023
Signed-off-by: Prasad Borole <prasadb@uber.com>
Co-authored-by: Ryan Turner <turner@uber.com>