Skip to content

Modern efficient cache design with simple FIFO queue only in Golang

License

Notifications You must be signed in to change notification settings

scalalang2/golang-fifo

Repository files navigation

Go Reference Go Report Card MIT License Coverage Status

golang-fifo

This is a modern cache implementation, inspired by the following papers, provides high efficiency.

This offers state-of-the-art efficiency and scalability compared to other LRU-based cache algorithms.

Basic Usage

import "github.com/scalalang2/golang-fifo/sieve"

size := 1e5
ttl := 0 // 0 means no expiration
cache := sieve.New[string, string](size, ttl)

// set value under hello
cache.Set("hello", "world")

// get value under hello
val, ok := cache.Get("hello")
if ok {
    fmt.Printf("value: %s", val) // => "world"
}

// set more keys
for i := 0; i < 10; i++ {
    cache.Set(fmt.Sprintf("key%d", i), fmt.Sprintf("value%d", i))
}

// get number of cache entries
fmt.Printf("len: %d", cache.Len()) // => 11

// remove value under hello
removed := cache.Remove("hello")
if removed {
	fmt.Println("hello was removed")
}

Expiry

import "github.com/scalalang2/golang-fifo/sieve"

size := 1e5
ttl := time.Second * 3600 // 1 hour
cache := sieve.New[string, string](size, ttl)

// this callback will be called when the element is expired
cahe.SetOnEvict(func(key string, value string) {
    fmt.Printf("key: %s, value: %s was evicted", key, value)
})

// set value under hello
cache.Set("hello", "world")

// remove all cache entries and stop the eviction goroutine.
cache.Close()

Benchmark Result

The benchmark result were obtained using go-cache-benchmark

itemSize=500000, workloads=7500000, cacheSize=0.10%, zipf's alpha=0.99, concurrency=16

      CACHE      | HITRATE |   QPS   |  HITS   | MISSES
-----------------+---------+---------+---------+----------
  sieve          | 47.66%  | 2508361 | 3574212 | 3925788
  tinylfu        | 47.37%  | 2269542 | 3552921 | 3947079
  s3-fifo        | 47.17%  | 1651619 | 3538121 | 3961879
  slru           | 46.49%  | 2201350 | 3486476 | 4013524
  s4lru          | 46.09%  | 2484266 | 3456682 | 4043318
  two-queue      | 45.49%  | 1713502 | 3411800 | 4088200
  clock          | 37.34%  | 2370417 | 2800750 | 4699250
  lru-groupcache | 36.59%  | 2206841 | 2743894 | 4756106
  lru-hashicorp  | 36.57%  | 2055358 | 2743000 | 4757000

SIEVE delivers both high hit rates and the highest QPS(queries per seconds) compared to other LRU-based caches. Additionally, It approximately improves 30% for efficiency than a simple LRU cache.

Increasing efficiency means not only reducing cache misses, but also reducing the demand for heavy operations such as backend database access, which lowers the mean latency.

While LRU promotes accessed objects to the head of the queue, requiring a potentially slow lock acquisition, SIEVE only needs to update a single bit upon a cache hit. This update can be done with a significantly faster reader lock, leading to increased performance.

The real-world traces are also evaluated at here

Appendix

Performance : golang-fifo
goos: linux
goarch: amd64
pkg: github.com/scalalang2/golang-fifo
cpu: Intel(R) Core(TM) i5-10600KF CPU @ 4.10GHz
BenchmarkCache
BenchmarkCache/cache=sieve
BenchmarkCache/cache=sieve/t=int32
BenchmarkCache/cache=sieve/t=int32-12    2765682               393.8 ns/op           148 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    3037669               388.1 ns/op           149 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    3075998               395.0 ns/op           149 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    2924646               392.0 ns/op           148 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    2632326               409.3 ns/op           148 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    2746551               463.5 ns/op           148 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    3004071               401.0 ns/op           148 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    2398981               456.0 ns/op           149 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    2698939               422.9 ns/op           148 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int32-12    2647030               392.1 ns/op           148 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64
BenchmarkCache/cache=sieve/t=int64-12    2532614               414.1 ns/op           158 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2825973               419.3 ns/op           158 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2693790               407.1 ns/op           158 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2882792               414.7 ns/op           157 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2903197               421.7 ns/op           157 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2876046               435.7 ns/op           157 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2846494               410.4 ns/op           157 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2455807               440.1 ns/op           158 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2774462               435.1 ns/op           158 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=int64-12    2833150               433.9 ns/op           157 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string
BenchmarkCache/cache=sieve/t=string-12           2117859               546.9 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           2079752               527.1 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           2210930               530.8 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           2122942               514.4 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           2222488               553.6 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           2260266               558.6 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           2239196               567.1 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           2064308               576.8 ns/op           186 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           1882754               569.9 ns/op           185 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=string-12           1917342               574.6 ns/op           185 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite
BenchmarkCache/cache=sieve/t=composite-12        1825063               707.0 ns/op           223 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1745775               660.1 ns/op           224 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1680552               678.1 ns/op           225 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1774438               690.1 ns/op           224 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1530580               731.1 ns/op           226 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1663950               761.7 ns/op           225 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1607760               678.4 ns/op           225 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1703283               784.4 ns/op           225 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1295089               864.6 ns/op           229 B/op          4 allocs/op
BenchmarkCache/cache=sieve/t=composite-12        1552182               769.9 ns/op           226 B/op          4 allocs/op
Why LRU Cache is not good enough?
  • LRU is often implemented with a doubly linked list and a hash table, requiring two pointers per cache entry, which becomes large overhead when the object is small.
  • It promotes objects to the head of the queue upon cache hit, which performs at least six random memory accesses protected by lock, which limits the scalability.
Things to consider before adoption
  • Both S3-FIFO and SIEVE have a O(n) time complexity for cache eviction, which only occurs when all objects are hit the cache, which means that there is a perfect (100%) hit rate in the cache.
  • SIEVE is not designed to be scan-resistant. Therefore, it's currently recommended for web cache workloads, which typically follow a power-law distribution.
  • S3-FIFO filters out one-hit-wonders early, It bears some resemblance to designing scan-resistant cache eviction algorithms.
  • SIEVE scales well for read-intensive applications such as blogs and online shops, because it doesn't require to hold a writer lock on cache hit.
  • The golang-fifo library aims to provide a straightforward and efficient cache implementation, similar to hashicorp-lru and groupcache. Its goal is not to outperform highly specialized in-memory cache libraries (e.g. bigcache, freecache and etc).
Brief overview of SIEVE & S3-FIFO

Various workloads typically follows Power law distribution (e.g. Zipf's law) as shown in the following figure.

zipflaw_discovered_by_realworld

The analysis reveals that most requests are "one-hit-wonders", which means it's accessed only once. Consequently, a cache eviction strategy should quickly remove most objects after insertion.

S3-FIFO and SIEVE achieves this goal with simplicity, efficiency, and scalability using simple FIFO queue only.

s3-fifo-is-powerful-algorithm

Contribution

How to run unit test

$ go test -v ./...

How to run benchmark test

$ ./bench.sh