
uses atomics for read-only account cache entry index #32518

Merged 1 commit into solana-labs:master on Aug 7, 2023

Conversation

@behzadnouri (Contributor)

Problem

Improve loads on read-only account cache.

Summary of Changes

  • The read-only account cache now uses an atomic for each entry's index.
  • ReadOnlyAccountsCache::load no longer needs a mutable entry.

codecov bot commented Jul 17, 2023

Codecov Report

Merging #32518 (57ff271) into master (6eea38d) will decrease coverage by 0.1%.
Report is 2 commits behind head on master.
The diff coverage is 95.0%.

@@            Coverage Diff            @@
##           master   #32518     +/-   ##
=========================================
- Coverage    82.0%    82.0%   -0.1%     
=========================================
  Files         785      785             
  Lines      211192   211204     +12     
=========================================
  Hits       173200   173200             
- Misses      37992    38004     +12     

@jeffwashington (Contributor)

@behzadnouri I'm happy with this change. It does allow us to get rid of the write lock. Along with the idea of only updating the lru once per interval instead of every time, we can eliminate the contention on the lru queue as well. This avoids us having to modify IndexList.
Thoughts?

@behzadnouri (Contributor, Author)

@behzadnouri I'm happy with this change. It does allow us to get rid of the write lock. Along with the idea of only updating the lru once per interval instead of every time, we can eliminate the contention on the lru queue as well. This avoids us having to modify IndexList. Thoughts?

"updating the lru once per interval instead of every time"
sounds a bit too heuristic to me.

What is blocking adding the new move_to_last API to the IndexList crate?

@jeffwashington (Contributor)

What is blocking adding the new move_to_last API to the IndexList crate?

It's externally owned and not on a schedule we control. I have not heard anything on my draft PR yet. I am unclear on their goals or requirements for getting this in.

In the interim, I have a PR up for discussion that vendors the subset of the code we need and adds move_to_last.

@jeffwashington (Contributor)

"updating the lru once per interval instead of every time"
sounds a bit too heuristic to me.

I'm personally comfortable with lrus of caches being heuristic. Maybe this heuristic can help us come up with a better idea or heuristic. The system has to be correct and tolerant whether an account is in the read only cache or not. We'd know that a heuristic is bad if the cache hit rate is low. Otherwise, load will be faster and cache hit rate should be similarly high.

Hardcoding the owner being vote account is a bridge too far for me, even though that knocks out the biggest offender ;-)

We could also always update the timestamp, but then when we look at the oldest items in lru, we could get rid of the ones at the end that have the oldest timestamp. So, even if we didn't update lru list, we'd still effectively get rid of the oldest items, while only searching a small set of entries. That would get our heuristic to an even closer match to reality.

@jeffwashington (Contributor)

We could also always update the timestamp, but then when we look at the oldest items in lru, we could get rid of the ones at the end that have the oldest timestamp. So, even if we didn't update lru list, we'd still effectively get rid of the oldest items, while only searching a small set of entries. That would get our heuristic to an even closer match to reality.

Actually, at eviction time (which is already rare and could also be done in the bg) all we have to do is search backwards in lru for all entries with timestamp relative to oldest lru item < lru update time (100ms, 10ms, whatever). There will be very few items to scan. Then, we can exactly hit the oldest items for eviction with minimal effort/time. But all this is overkill in practice, I think.
Anything that hasn't been accessed in that long is old enough to be arbitrarily thrown away anyway once it makes its way to the end of the lru queue.
The most performant thing to do is to not run code. So, not updating the lru will always be faster than any kind of update.

Using atomics for entry indices allows load to use self.cache.get
instead of get_mut which reduces lock contention on the respective
dash-map shard.
@jeffwashington (Contributor) left a comment
lgtm. ty. Good building block to get rid of write lock.

@behzadnouri behzadnouri merged commit b9a2030 into solana-labs:master Aug 7, 2023
4 checks passed
@behzadnouri behzadnouri deleted the read-only-cache-atomic branch August 7, 2023 17:47
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Aug 8, 2023
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Aug 8, 2023
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Aug 9, 2023