-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: implement cache abstraction and unit test #506
Changes from 10 commits
91c2b76
48c3a5e
1c4c288
d07ef28
d696f75
e194c16
8bf11a3
5a5f34b
e935d8f
d2b985c
a7152c2
6eac72a
77dce5b
b5a27ad
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
package cache | ||
|
||
import ( | ||
"container/list" | ||
) | ||
|
||
// Node represents a node eligible for caching. | ||
type Node interface { | ||
GetKey() []byte | ||
} | ||
|
||
// Cache is an in-memory structure to persist nodes for quick access. | ||
// Please see lruCache for more details about why we need a custom | ||
// cache implementation. | ||
type Cache interface { | ||
// Adds node to cache. If full and had to remove the oldest element, | ||
// returns the oldest, otherwise nil. | ||
// CONTRACT: node can never be nil. Otherwise, cache panics. | ||
Add(node Node) Node | ||
|
||
// Returns Node for the key, if exists. nil otherwise. | ||
Get(key []byte) Node | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What the value stored is nil? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should never be possible for the value to be nil. Currently, I added the Given the contract, it is impossible to have a nil value There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha. |
||
|
||
// Has returns true if node with key exists in cache, false otherwise. | ||
Has(key []byte) bool | ||
|
||
// Remove removes node with key from cache. The removed node is returned. | ||
// if not in cache, return nil. | ||
Remove(key []byte) Node | ||
|
||
// Len returns the cache length. | ||
Len() int | ||
} | ||
|
||
// lruCache is an LRU cache implementation. | ||
// The motivation for using a custom cache implementation is to | ||
// allow for a custom max policy. | ||
// | ||
// Currently, the cache maximum is implemented in terms of the | ||
// number of nodes which is not intuitive to configure. | ||
// Instead, we are planning to add a byte maximum. | ||
// The alternative implementations do not allow for | ||
// customization and the ability to estimate the byte | ||
// size of the cache. | ||
type lruCache struct { | ||
dict map[string]*list.Element // FastNode cache. | ||
maxElementCount int // FastNode the maximum number of nodes in the cache. | ||
ll *list.List // LRU queue of cache elements. Used for deletion. | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would prefer if we use https://github.com/hashicorp/golang-lru or https://github.com/hashicorp/golang-lru/tree/master/simplelru ? We are already using it in various places. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've looked into these options and, unfortunately, they do not allow for enough customization necessary for future work. I'm planning to further improve the custom cache by adding the ability to have 2 kinds of limits:
I have this work started here that I'm hoping to move to this repository: osmosis-labs#40 More background about why I'm introducing this cache and its limits can be found here: osmosis-labs/osmosis#1163 In short, we found it not intuitive configuring the limits in terms of nodes so we would like to provide the ability to do so in bytes only for fast cache There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe let's add a short comment in the code, before someone else will come and propose to remove this and reuse other library. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @p0mvn how about https://dgraph.io/blog/post/introducing-ristretto-high-perf-go-cache/ It supports setting byte high watermarks and that blogpost talks about their journey in figuring out some LRUs and caches. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @odeke-em thanks for the suggestion. I've investigated using ristretto by swapping it for fast nodes cache but have run into issues. The major one is a ~17% increase in time/op. I suspect that is due to ristretto having overhead from concurrency guarantees which we might not really need. Also, some randomized tests started failing for reasons I could not explain. The current state of the attempt, summary, and benchmarks can be found here: #508 |
||
|
||
var _ Cache = (*lruCache)(nil) | ||
|
||
func New(maxElementCount int) Cache { | ||
return &lruCache{ | ||
dict: make(map[string]*list.Element), | ||
maxElementCount: maxElementCount, | ||
ll: list.New(), | ||
} | ||
} | ||
|
||
func (c *lruCache) Add(node Node) Node { | ||
if e, exists := c.dict[string(node.GetKey())]; exists { | ||
c.ll.MoveToFront(e) | ||
old := e.Value | ||
e.Value = node | ||
return old.(Node) | ||
} | ||
|
||
elem := c.ll.PushFront(node) | ||
c.dict[string(node.GetKey())] = elem | ||
|
||
if c.ll.Len() > c.maxElementCount { | ||
oldest := c.ll.Back() | ||
|
||
return c.remove(oldest) | ||
} | ||
return nil | ||
} | ||
|
||
func (nc *lruCache) Get(key []byte) Node { | ||
if ele, hit := nc.dict[string(key)]; hit { | ||
nc.ll.MoveToFront(ele) | ||
return ele.Value.(Node) | ||
} | ||
return nil | ||
} | ||
|
||
func (c *lruCache) Has(key []byte) bool { | ||
_, exists := c.dict[string(key)] | ||
return exists | ||
} | ||
|
||
func (nc *lruCache) Len() int { | ||
return nc.ll.Len() | ||
} | ||
|
||
func (c *lruCache) Remove(key []byte) Node { | ||
if elem, exists := c.dict[string(key)]; exists { | ||
return c.remove(elem) | ||
} | ||
return nil | ||
} | ||
|
||
func (c *lruCache) remove(e *list.Element) Node { | ||
removed := c.ll.Remove(e).(Node) | ||
delete(c.dict, string(removed.GetKey())) | ||
return removed | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
package cache_test | ||
|
||
import ( | ||
"math/rand" | ||
"testing" | ||
|
||
"github.com/cosmos/iavl/cache" | ||
) | ||
|
||
func BenchmarkAdd(b *testing.B) { | ||
b.ReportAllocs() | ||
testcases := map[string]struct { | ||
cacheMax int | ||
keySize int | ||
}{ | ||
"small - max: 10K, key size - 10b": { | ||
cacheMax: 10000, | ||
keySize: 10, | ||
}, | ||
"med - max: 100K, key size 20b": { | ||
cacheMax: 100000, | ||
keySize: 20, | ||
}, | ||
"large - max: 1M, key size 30b": { | ||
cacheMax: 1000000, | ||
keySize: 30, | ||
}, | ||
} | ||
|
||
for name, tc := range testcases { | ||
cache := cache.New(tc.cacheMax) | ||
b.Run(name, func(b *testing.B) { | ||
for i := 0; i < b.N; i++ { | ||
b.StopTimer() | ||
key := randBytes(tc.keySize) | ||
b.StartTimer() | ||
|
||
_ = cache.Add(&testNode{ | ||
key: key, | ||
}) | ||
} | ||
}) | ||
} | ||
} | ||
|
||
func BenchmarkRemove(b *testing.B) { | ||
b.ReportAllocs() | ||
|
||
cache := cache.New(1000) | ||
existentKeyMirror := [][]byte{} | ||
// Populate cache | ||
for i := 0; i < 50; i++ { | ||
key := randBytes(1000) | ||
|
||
existentKeyMirror = append(existentKeyMirror, key) | ||
|
||
cache.Add(&testNode{ | ||
key: key, | ||
}) | ||
} | ||
|
||
randSeed := 498727689 // For deterministic tests | ||
r := rand.New(rand.NewSource(int64(randSeed))) | ||
b.ResetTimer() | ||
for i := 0; i < b.N; i++ { | ||
key := existentKeyMirror[r.Intn(len(existentKeyMirror))] | ||
_ = cache.Remove(key) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when someone adds a nil Node? That'll break this entire interface. If nil nodes can be permitted, it might be much more useful to return (last Node, evicted bool)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should never be a nil Node added. Would it be sufficient to add a comment saying that the contract for this method is that
node
is never nil?From reading this discussion, I could either:
Add
methodIsNil()
method for theNode
interface and implement it for every node struct.However, if we were to go with one of the approaches, I think that this would be trading performance for safety unnecessarily at such a low database layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment will perhaps suffice. Thank you @p0mvn.