Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel remote cache: add eviction #6812

Merged
merged 7 commits into from
Feb 15, 2018

Conversation

BenTheElder
Copy link
Member

@BenTheElder BenTheElder commented Feb 14, 2018

  • add --remount which will attempt to remount the storage device with strictatime,lazyatime which will have the filesystem track access times for entries with lazy flushing to disk
    • this takes advantage of the fact that ext4 supports lazyatime which should have better characteristics than just enabling atime
    • We are doing this because Linux defaults to relatime which is not very useful for us since it only updates at most once every 24 hours or when the file is modified, but access time tracking via atime is simple and durable
  • add eviction based on access time

also vendors a small library for getting atime cross platform, go's stdlib leaves this as entirely system dependent even though most (all?) platforms support this.

xref: #6808

/area bazel
/area kinda-hacky 😬

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/bazel cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 14, 2018
@@ -38,6 +38,7 @@ filegroup(

go_image(
name = "image",
base = "@alpine-base//image",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distroless doesn't have any utilities, I've spun up alpine and it has mount etc.

@BenTheElder
Copy link
Member Author

I've SSHed to the cache node and verified that a privileged container can remount with these options enabled and that this does correctly track the access times.

/assign @cjwagner @ixdy @krzyzacy

// remounts with 'strictatime,lazyatime'
func remountWithLazyAtime(device, mountPoint string) error {
_, err := commandLines([]string{
"mount", "-o", "remount,strictatime,lazytime", device, mountPoint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't you use syscall.Mount and pass syscall.MS_REMOUNT | syscall.MS_STRICTATIME | syscall.MS_LAZYTIME as options?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I guess go doesn't know about lazytime yet. lxc/incus@7446f6b

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though golang.org/x/sys/unix has it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also have to parse and plumb through the type, mount is a little more magical

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this offline, shelling out smells a bit gross but syscall.Mount also appears to be a wrapper around mount(2) anyhow.

@ixdy
Copy link
Member

ixdy commented Feb 14, 2018

broader question: is this a better approach than just keeping track of access times via GETs in nursery itself?

@BenTheElder
Copy link
Member Author

BenTheElder commented Feb 14, 2018

broader question: is this a better approach than just keeping track of access times via GETs in nursery itself?

I thought about this, implementing this correctly and efficiently is tricky. We're write heavy and we don't want to block when scanning the cache for eviction. Remounting gets us efficient access time tracking for ~free.

More specifically I'm planning to do eviction similar to the kubelet, IE periodically get disk usage and when it passes some threshold(s) scan the cache and start kicking things out.
We could just do something like toss a map of keys to timestamps and mutex in nursery but then we stall while evicting. I'd really like to avoid evicting things unless we're out of disk space and I do want to evict things that are least recently used.

@BenTheElder
Copy link
Member Author

Also from offline discussion: the filesystem also gets us free persistence and concurrent write / read. Having lazyatime should help prevent this from being overly abusive to the disk.

Downsides: lazyatime is ext4 specific (thankfully lots of linux things including our local ssd use it anyhow), this also smells a bit of coupling but I've placed the remount code behind a feature flag so this seems OK.

@BenTheElder BenTheElder force-pushed the cache-hacks branch 2 times, most recently from c2e46b3 to 93eae1f Compare February 14, 2018 06:00
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 14, 2018
@BenTheElder BenTheElder force-pushed the cache-hacks branch 2 times, most recently from a0e3048 to 7f9c610 Compare February 14, 2018 06:24
@BenTheElder BenTheElder changed the title [WIP] bazel remote cache: add eviction bazel remote cache: add eviction Feb 14, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 14, 2018
@BenTheElder BenTheElder force-pushed the cache-hacks branch 3 times, most recently from 5f9a6b0 to d9d07a4 Compare February 14, 2018 07:00
@BenTheElder
Copy link
Member Author

This should be ready for review.

I've definitely left room in the API for the cache to track things internally in the future in which case we can remove the remount code.

@BenTheElder
Copy link
Member Author

I've deployed this and created / cat-ed a test file, ls -lu shows the timestamps changing 👍

@BenTheElder
Copy link
Member Author

I've spoken to @verult on storage about this: ~ next quarter local volumes should be beta (alpha currently) and then we can in theory use PV mount options instead and delete the --remount handling. There are a few other considerations that come with local volumes but we can probably sort those out then.

@BenTheElder
Copy link
Member Author

/test pull-test-infra-bazel-canary

// PathToKey converts a path on disk to a key, assuming the path is actually
// under DiskRoot() ...
func (c *Cache) PathToKey(key string) string {
return strings.TrimPrefix(key, c.diskRoot+string(os.PathSeparator))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should document that c.diskRoot cannot have a trailing slash. Specifying the root dir as the diskRoot could be a little strange too since it would need to be specified as "" instead of "/".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll probably just make the constructor handle this instead, good point though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the constructor now trims os.PathSeperator

flag.Parse()
logger := log.WithFields(log.Fields{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should pass this logger to functions like cacheHandler and monitorDiskAndEvict instead of using the default undecorated logger or adding the field component=nusery in multiple places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about creating a custom formatter with the default field instead, there are lots of places the subpackages should log things that won't be decorated. We might want to do this in prow as well.

if err != nil {
logger.WithError(err).Error("Failed to remount with lazyatime!")
}
logger.Info("Remount complete")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only log this if we successfully remount?
Also, should remount failures be fatal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A) yes, fixing
B) probably not, the cache will still work, the eviction will just be worse

@@ -142,3 +176,42 @@ func cacheHandler(cache *diskcache.Cache) http.Handler {
}
})
}

func monitorDiskAndEvict(cache *diskcache.Cache, interval time.Duration, minBlocksFree, minFilesFree float64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving this to the diskcache package and making it a method of Cache.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

blocksFree, filesFree := diskutil.GetDiskUsage(dir)
// if we are past either threshold, evict until we are not
if blocksFree < minBlocksFree || filesFree < minFilesFree {
logger.WithFields(log.Fields{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be useful to log every tick instead of only when we actually do evictions. Then we could monitor the disk usage more easily.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

// so we can pop entries until we have evicted enough
files := cache.GetEntries()
sort.Slice(files, func(i, j int) bool {
return files[i].LastAccess.After(files[j].LastAccess)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want Before not After?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Originally I was popping from the end of the slice. Thanks!

Copy link
Member

@krzyzacy krzyzacy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly lgtm, a few nit-picks

also what's the expected behavior when the eviction blows away? We probably want to get notified somehow? (though it should not happen?)

"files-free": filesFree,
})
if len(files) < 1 {
logger.Fatalf("Failed to find entries to evict!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nothing to format, just use logger.Fatal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// MonitorDiskAndEvict loops monitoring the disk, evicting cache entries
// when the disk passes either minPercentBlocksFree or minPercentFilesFree
func (c *Cache) MonitorDiskAndEvict(interval time.Duration, minPercentBlocksFree, minPercentFilesFree float64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also add a test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how would that work? I'd need to fill up a disk 😬

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can make a fake diskutil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think tick loops should be unit tested. This was originally in main.go partially due to that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, adding metrics / monitoring is coming later, and the failure modes for this are such that it's not a real concern (jobs will continue to work, but perhaps slower).

I will add regression tests if anything does go wrong here.

entry, files = files[0], files[1:]
err := c.Delete(c.PathToKey(entry.Path))
if err != nil {
logger.WithError(err).Error("Error deleting entry")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe also log path

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, done

// GetDiskUsage wraps syscall.Statfs
func GetDiskUsage(path string) (percentBlocksFree, percentFilesFree float64) {
var stat syscall.Statfs_t
syscall.Statfs(path, &stat)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, passed up and checked at call sites / logged

// NOTE: remount is a bit of a hack, unfortunately the kubernetes volumes
// don't really support this and to cleanly track entry access times we
// want to use a volume with lazyatime (and not noatime or relatime)
// so that file access times *are* recorded but are lazily flushed to the disk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also link some doc about lazyatime for future reference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/done, also so/lazyatime/strictatime,lazytime/

var minPercentBlocksFree = flag.Float64("min-percent-blocks-free", 10,
"minimum percent of blocks free on --dir's disk before evicting entries")
var minPercentFilesFree = flag.Float64("min-percent-files-free", 10,
"minimum percent of blocks free on --dir's disk before evicting entries")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/blocks/files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 😕

@krzyzacy
Copy link
Member

forgot to push?

@BenTheElder
Copy link
Member Author

forgot to push?

heh I was just pushing

var stat syscall.Statfs_t
err = syscall.Statfs(path, &stat)
if err != nil {
return 0, 0, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0, 0, err?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( fixed

@krzyzacy
Copy link
Member

might also worth adding more docs once we start to use it

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Feb 15, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder, krzyzacy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [BenTheElder,krzyzacy]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BenTheElder
Copy link
Member Author

might also worth adding more docs once we start to use it

I will write thorough docs if/when this moves out of experiment/. For now I plan to test against canary jobs and test-infra. If that goes well I'll rename, doc, cleanup, and move in a follow up. Also definitely metrics and a dashboard / alerts. See: #6808

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 15, 2018
@k8s-ci-robot k8s-ci-robot merged commit f5b517c into kubernetes:master Feb 15, 2018
@BenTheElder BenTheElder deleted the cache-hacks branch February 15, 2018 20:13
@BenTheElder
Copy link
Member Author

other than needing #6850 to fix the DefaultEntryFormatter this is WAI so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/bazel cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants