
fs: get inodes via pure go instead of find #1576

Closed
wants to merge 5 commits

Conversation

@euank (Collaborator) commented Jan 16, 2017

This also makes the count more accurate since hardlinks no longer
double-count.
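The review excerpts below only show fragments; a rough sketch of the approach (assembled from those fragments, with a hypothetical function name, so not the exact diff) looks like:

```go
package fs

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// getDirInodeUsage is a sketch (hypothetical name) of the pure-Go approach:
// walk the tree and count distinct inode numbers, so hardlinked files are
// counted once instead of once per link as with `find | wc`.
func getDirInodeUsage(dir string) (uint64, error) {
	inodes := map[uint64]struct{}{}
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return fmt.Errorf("unable to count inodes for part of dir %s: %s", dir, err)
		}
		s, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			return fmt.Errorf("unsupported fileinfo; could not convert to stat_t")
		}
		inodes[s.Ino] = struct{}{}
		return nil
	})
	return uint64(len(inodes)), err
}
```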

fs/fs.go Outdated

err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
if err != nil {
return fmt.Errorf("unable to count inodes for part of dir %s: %s", dir, err)
euank (Collaborator, Author):

This matches the previous behavior: find would exit non-zero on most types of errors you might run into here (notably permission denied), so the error path would have been hit before as well.

There might be some cases where this doesn't hold, though.

fs/fs.go Outdated

inodes := map[uint64]struct{}{}

err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
Contributor:

Won't this traverse devices?

euank (Collaborator, Author):

Good call, I'll skip em.
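(Roughly what that check could look like, in the spirit of find's -xdev; the function name and the rootStat comparison are illustrative, not necessarily the exact change that was pushed.)

```go
package fs

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// Sketch of an -xdev-style check: stat the root once, then refuse to
// descend into directories whose device differs. Illustrative only.
func getDirInodeUsageOneDev(dir string) (uint64, error) {
	var rootStat syscall.Stat_t
	if err := syscall.Stat(dir, &rootStat); err != nil {
		return 0, err
	}
	inodes := map[uint64]struct{}{}
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		s, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			return fmt.Errorf("unsupported fileinfo; could not convert to stat_t")
		}
		if info.IsDir() && s.Dev != rootStat.Dev {
			return filepath.SkipDir // stay on one filesystem, like find -xdev
		}
		inodes[s.Ino] = struct{}{}
		return nil
	})
	return uint64(len(inodes)), err
}
```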

fs/fs.go Outdated
if !ok {
return fmt.Errorf("unsupported fileinfo; could not convert to stat_t")
}
inodes[s.Ino] = struct{}{}
Contributor:

I understand that this increases accuracy, but what's the performance impact on a relatively full FS? Would be good to do a comparison to the find method too.

@euank (Collaborator, Author) commented Jan 19, 2017

Numbers are a good idea, so I gave it a quick go on a few sample directories.
I wrote a go program (test-get-inode-usage) that does nothing but call the package-level function in fs to compare with.

My methodology was to call each command 3 times to warm the cache, and then run it once more and record the output.
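That helper isn't checked in anywhere; it was a throwaway along these lines (GetDirInodeUsage matches what's referenced later in the thread, but the exact exported signature here is an assumption):

```go
// Hypothetical reconstruction of the test-get-inode-usage helper: a tiny
// main that calls cadvisor's package-level inode counter and prints the
// result. The GetDirInodeUsage name/signature is an assumption here.
package main

import (
	"fmt"
	"os"

	"github.com/google/cadvisor/fs"
)

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	count, err := fs.GetDirInodeUsage(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(count)
}
```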

When run on cadvisor itself:

$ time test-get-inode-usage
2331                                                  
test-get-inode-usage  0.01s user 0.00s system 21% cpu 0.047 total
$ time sh -c 'find -xdev -printf '.' | wc -c'
2331
sh -c 'find -xdev -printf '.' | wc -c'  0.00s user 0.00s system 0% cpu 0.010 total

When run on a 100G directory with quite a few files, recursing, and all that jazz

$ time sh -c 'find -xdev -printf '.' | wc -c'
1461051

real	0m1.713s
user	0m0.560s
sys	0m1.140s


$ time test-get-inode-usage 
1459087

real	0m5.102s
user	0m2.480s
sys	0m2.810s

On /var/lib/docker with a modest set of containers lying around

$ time sh -c 'find -xdev -printf '.' | wc -c'
3764597

real	0m3.678s
user	0m1.030s
sys	0m2.620s


$ time test-get-inode-usage 
1147504

real	0m13.063s
user	0m6.500s
sys	0m6.990s


$ time sh -c 'find . -printf "%i\n" | sort | uniq | wc -l'
1147504

real	0m12.189s
user	0m10.180s
sys	0m2.760s

It's interesting to note that for /var/lib/docker, the new code is roughly 3x more correct than the old code!

I checked what a random set of files with the same inode looked like (via find -links +2 and find -samefile); an example of the sort of duplicates is:

./overlay/fc737a3304e188bc985d07d7e5896393e1c7c97c712d83317c8107cddc5ff58b/root/tmp/openssl-1.0.2e/doc/ssl/SSL_set_verify_result.pod
./overlay/9660ad5520bb82e857bcc326fb1bb9469f0cd0b10bf7dada0c3f81bb6af5c944/root/tmp/openssl-1.0.2e/doc/ssl/SSL_set_verify_result.pod
...

@timstclair (Contributor):
The accuracy increase is great, and we should definitely push forward with this, but I'm a little concerned about the slowdown. If I'm reading the cAdvisor code correctly, it looks like we might be running a separate FSHandler per container, all scanning the same rootfs. If we only ran a single handler scanning the rootfs, that would easily overshadow any performance downside of this change (although it's still worth benchmarking). Would you be willing to pursue this optimization?

@euank (Collaborator, Author) commented Jan 31, 2017

@timstclair Mind explaining what you mean more clearly?

My reading of the code is that the fshandlers are created per-container and apply to the read/write upper layer, so the directory is unique per container and can't be so easily deduped.

Maybe I'm missing something, and there certainly are optimizations that could be made by being smarter, but it seems like a change to make in the future if we're okay with the decrease in performance of this change.

@dashpole (Collaborator) commented Feb 1, 2017

My reading of the code is that the fshandlers are created per-container and apply to the read/write upper layer, so the directory is unique per container and can't be so easily deduped.

I believe @euank is correct in that we have one FsHandler per container, and we are only counting inodes using find on the container's writeable layer.

Given that we only currently use this on the container's r/w upper layer, will we actually see much of an accuracy increase? It looks like the find command has issues counting inodes used by an overlay filesystem, but does a decent job with "regular" files and folders.

The latency increase definitely concerns me, as we believe we have some issues related to evictions using data that isn't recent enough. I am also curious how the latency would change if we were running more than one command at a time (which is the case with multiple containers).

If a couple extra days of delay on this is alright, I would like to collect some data on latency in use myself.

@euank (Collaborator, Author) commented Feb 1, 2017

@dashpole if you're willing to do the footwork to get better numbers, that'd be awesome!

@dashpole (Collaborator) commented Feb 1, 2017

I mostly just want to get numbers in the context of our e2e eviction tests.

@euank (Collaborator, Author) commented Feb 1, 2017

@dashpole @timstclair it's also an option to keep the pure Go but remove the map (and thus the memory allocation), which I think would make this only slightly slower than the find method.

I'll also point out that on slower disks (e.g. EBS or spinning rust), both find and this will be roughly equally slow. On my laptop's SSD the difference is starker than it would be in environments with worse disk I/O.

And finally, I think it's still worth doing hardlink deduplication if we can in container rootfs too. There aren't too many applications that use hardlinks at runtime, but they're out there, and it would be great to not evict applications that are doing the right thing in terms of saving inodes.

I have some quite hacky rsync scripts on my local kubernetes cluster that do use hardlinks and would probably benefit from this change.

@dashpole (Collaborator) commented Feb 1, 2017

If there aren't any downsides to removing the map, and the change isn't too hard to make, then I'll do my testing after that.

@euank (Collaborator, Author) commented Feb 1, 2017

@dashpole The downside is that hardlinks get double-counted, which is why I'd prefer to leave it if possible.

@dashpole (Collaborator) commented Feb 1, 2017

Oh, I see. I'll just run it on this, then.

@timstclair (Contributor):
My reading of the code is that the fshandlers are created per-container and apply to the read/write upper layer, so the directory is unique per container and can't be so easily deduped.

You're right, I misread. Thanks for clearing it up!

How about only using the map if Nlink > 1?
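(A sketch of that idea, not the merged code: files with a single link are counted straight into a counter, and the map is only touched for files that actually have multiple hardlinks.)

```go
package fs

import (
	"os"
	"path/filepath"
	"syscall"
)

// Sketch of the "only use the map if Nlink > 1" tweak. Most files have a
// single link, so they skip the map entirely; only multi-link files pay
// for dedup. Names are illustrative.
func countInodes(dir string) (uint64, error) {
	var count uint64
	multiLink := map[uint64]struct{}{}
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		s, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			return nil // sketch: ignore entries without a Stat_t
		}
		if s.Nlink > 1 {
			if _, seen := multiLink[s.Ino]; seen {
				return nil // another link to an inode we already counted
			}
			multiLink[s.Ino] = struct{}{}
		}
		count++
		return nil
	})
	return count, err
}
```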

@dashpole (Collaborator) commented Feb 2, 2017

I ran the inode eviction test with and without this change.
With this change, the test flaked once, out of 4 runs, and each run had a call to GetDirInodeUsage that took > 10s.
Without this change, the test did not flake in 9 runs, and had NO calls to GetDirInodeUsage that took > 1s. The highest recorded was ~700ms.

I'll run some more tests on this change to make sure that these stats are representative, but this version may pose too large a latency increase.

I spoke with @vishh, and he suggested that we could possibly optimize this in two ways:
1. parallelize walking the tree, with a limit on the max number of threads
2. combine this with GetDirDiskUsage: if we are making the effort of walking the tree, we might as well collect both disk space and inode stats at the same time

@dashpole (Collaborator) commented Feb 3, 2017

I have done a couple more runs with the change, and I haven't seen any that were as slow as the first. The highest seems to be around 2.5 seconds.
I tried it with the change timstclair suggested, and the results were about the same.

I realized something while running my tests: some of the rarely seen, extremely long times (> 10s) may be due to the limit on the number of concurrent exec calls; those calls could be waiting for a slot before they can run.
Combining the GetDirInodeUsage and GetDirDiskUsage functions may alleviate this, since we would have half the total number of calls.
The disk-usage half can be computed with info.Size().
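A sketch of what that combined walk could look like (the function name, return shape, and use of info.Size() are assumptions; info.Size() is the apparent size, not allocated blocks like du reports, so the byte numbers wouldn't match du exactly):

```go
package fs

import (
	"os"
	"path/filepath"
	"syscall"
)

// Sketch of collecting disk usage and inode count in a single traversal,
// instead of separate GetDirDiskUsage/GetDirInodeUsage passes.
func getDirUsage(dir string) (bytes uint64, inodes uint64, err error) {
	seen := map[uint64]struct{}{}
	err = filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if s, ok := info.Sys().(*syscall.Stat_t); ok && s.Nlink > 1 {
			if _, dup := seen[s.Ino]; dup {
				return nil // hardlink: already counted this inode and its size
			}
			seen[s.Ino] = struct{}{}
		}
		inodes++
		bytes += uint64(info.Size())
		return nil
	})
	return bytes, inodes, err
}
```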

I think the latency increase is still too large ATM, and I would love to try some optimizations before merging anything.

@k8s-ci-robot (Collaborator):
@euank: The following test(s) failed:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| Jenkins GCE e2e | 6a36b4c | link | @k8s-bot test this |


@euank (Collaborator, Author) commented Mar 10, 2017

I did some more fiddling with optimization here. Tim's suggestion to reduce the map's usage was a good one, and that did help some (though as @dashpole points out, it's still slower).

At this point, the slowness largely comes from Go's implementation of Walk in the stdlib, not from the code in my walkFn. Exchanging the walkFn for an empty body results in hardly any difference.

I tried parallelizing via powerwalk (https://github.com/euank/cadvisor/tree/find-inodes-2) and tweaked the parallelism a bit, but that was slower for all the parallelism values I tried (10, 20, 100, 200).
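For reference, the experiment was roughly this shape (powerwalk runs the walk function concurrently, so the shared map needs a mutex; the WalkLimit call is based on my reading of powerwalk's documented API and the exact branch code may differ):

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"syscall"

	"github.com/stretchr/powerwalk"
)

func main() {
	var (
		mu     sync.Mutex
		inodes = map[uint64]struct{}{}
	)
	// WalkLimit calls the walk function from multiple goroutines, so the
	// inode set has to be protected; that locking alone adds some overhead.
	err := powerwalk.WalkLimit(".", func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if s, ok := info.Sys().(*syscall.Stat_t); ok {
			mu.Lock()
			inodes[s.Ino] = struct{}{}
			mu.Unlock()
		}
		return nil
	}, 20) // one of the parallelism values tried: 10, 20, 100, 200
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(inodes))
}
```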

I didn't really expect it to be, since sequential access of files (the walk's access followed immediately by the fstat from the walkFn) on a (fairly unfragmented) filesystem is usually faster.

I'm happy to merge inode and dir usage into one walk operation. I'm not sure that'll end up quicker than the two current C implementations combined, though.

@timstclair (Contributor):
Take a look at golang/go#16399, in particular the solution for goimports: https://go-review.googlesource.com/c/25001/11/imports/fastwalk.go. Unfortunately that's not exported, but we could just clone it if it's sufficiently better. Another possible improvement here: https://gist.github.com/YuriyNasretdinov/a68b6a997216103b13ea0baa4204230f

@euank (Collaborator, Author) commented Mar 14, 2017

@timstclair The primary issue mentioned in the linked Go thread is that, for a find-style traversal, the caller doesn't need the result of fstat.

However, we do use the fstat result to get the inode, so it's not totally wasted for us. If we're going to merge this with the filesystem size code, then we'll actually need the full fstat.

The goimports fastwalk function doesn't need the inode, but getdents does include it, so most of fastwalk's optimizations could apply here if we want the inode but don't want to merge in the directory-size code.

If we do plan to merge inode and size accounting, then we won't want any of the optimizations linked since all of them optimize away the ability to get filesize.

Which direction do you think I should head down?

Optimizing inode counting in isolation, or unifying size/inode counting code to amortize the time that way.

@dashpole (Collaborator):
@euank I'll try to get some numbers on du latency. It sounds like that may be slower than find, since it needs fstat. If we can collect both inodes and disk usage in time comparable to du, then this obviously would be a great thing to add.

@vishh (Contributor) commented Mar 14, 2017 via email

@euank (Collaborator, Author) commented Mar 28, 2019

Merged as part of #2171; this one can be closed

euank closed this Mar 28, 2019