use as many goroutines for ForgetInode op #79

Open · wants to merge 1 commit into master from multi_forgetinode
Conversation

@kungf (Contributor) commented Mar 17, 2020

In #30, ForgetInode was changed to be handled inline in ServeOps. This may solve the memory OOM, but it makes the performance of rm very slow and can also hang other ops, so I think adding a goroutine pool to limit the maximum number of ForgetInode goroutines would avoid the OOM without affecting performance.

@kungf (Contributor, Author) commented Mar 17, 2020

Hi @stapelberg, do you have any advice?

@stapelberg (Collaborator) left a comment

Have you done benchmarks for this? Can you share the results?

I’m asking because spawning goroutines is actually not necessarily beneficial. In fact, when I did my last measurements, it was more performant to never spawn a goroutine.

@@ -21,6 +21,7 @@ import (

 	"github.com/jacobsa/fuse"
 	"github.com/jacobsa/fuse/fuseops"
+	"github.com/panjf2000/ants/v2"
Collaborator commented:

Not keen on adding a new dependency just for this change. If we decide to go forward with it, we should implement it without extra dependencies. Check out sourcegraph/gophercon-2018-liveblog#35 for inspiration.
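For what it's worth, a dependency-free cap on concurrent ForgetInode handling can be as small as a buffered-channel semaphore. The sketch below is only an illustration of that approach, not code from this PR; the names (maxForgetWorkers, forgetSem, handleForget) are made up.

```go
package sketch

import "sync"

const maxForgetWorkers = 100 // illustrative cap, not a number proposed in this PR

type server struct {
	opsInFlight sync.WaitGroup
	forgetSem   chan struct{} // buffered channel used as a counting semaphore
}

func newServer() *server {
	return &server{forgetSem: make(chan struct{}, maxForgetWorkers)}
}

// handleForget runs op on its own goroutine, but never more than
// maxForgetWorkers at a time; once the cap is reached, callers block
// until a running ForgetInode handler finishes.
func (s *server) handleForget(op func()) {
	s.forgetSem <- struct{}{} // acquire a slot
	s.opsInFlight.Add(1)
	go func() {
		defer func() {
			<-s.forgetSem // release the slot
			s.opsInFlight.Done()
		}()
		op()
	}()
}
```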

@@ -93,6 +94,12 @@ type fileSystemServer struct {
 	opsInFlight sync.WaitGroup
 }

+type opCtx struct {
+	c   *fuse.Connection
+	ctx context.Context
Collaborator commented:

Embedding context.Context in structs is an anti-pattern: https://groups.google.com/forum/#!topic/golang-nuts/xRbzq8yzKWI
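For illustration only (not code from this PR): the shape the linked thread recommends is passing the context as an explicit first argument along the call path instead of storing it in a struct such as opCtx. The names below are hypothetical.

```go
package sketch

import "context"

// op carries per-operation data but deliberately no context.Context field.
type op struct {
	name string
}

// handleOp takes the context as its first parameter, so cancellation and
// deadlines flow with the call rather than living inside a struct.
func handleOp(ctx context.Context, o *op) error {
	select {
	case <-ctx.Done():
		return ctx.Err() // the caller's cancellation is still honored
	default:
	}
	// ... handle o ...
	return nil
}
```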

@kahing (Contributor) commented Mar 17, 2020

I don't think 100000 is the right number. Either we allocate all of them at startup, which wastes 200 MB for most cases, or we don't allocate them at startup, in which case you can OOM during a storm of ForgetInode operations.

Using a very small pool may be OK; note that the implementation may have to take a lock anyway, so having that many ForgetInode goroutines running isn't useful.

@kungf (Contributor, Author) commented Mar 18, 2020

> I don't think 100000 is the right number. Either we allocate all of them at startup, which wastes 200 MB for most cases, or we don't allocate them at startup, in which case you can OOM during a storm of ForgetInode operations.

Hi @kahing, the 100000 is just the pool's capacity; goroutines are not allocated until they are needed, and a goroutine that stays idle for too long is recycled!

@kungf force-pushed the multi_forgetinode branch from 7dd5564 to 8fc09b8 on March 18, 2020 at 09:17
@kungf (Contributor, Author) commented Mar 18, 2020

> Have you done benchmarks for this? Can you share the results?

You can see from the results below (4 tasks, 1000 files) that the "File removal" rate goes up from 22 to 5175!
Before the change:

./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.9%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4530.794       4530.784       4530.788          0.004
   File stat         :     972973.342     972860.389     972935.691         53.246
   File read         :      17802.887      17802.792      17802.849          0.041
   File removal      :         22.489         22.489         22.489          0.000
   Tree creation     :        597.252        424.639        532.511         76.785
   Tree removal      :         58.258         16.345         35.345         17.333

-- finished at 03/18/2020 17:12:34 --

After the change:

Command line used: ./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.9%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4469.286       4469.280       4469.283          0.002
   File stat         :    1013572.737    1012898.941    1013286.855        284.380
   File read         :      17506.221      17506.093      17506.136          0.060
   File removal      :       5175.368       5175.358       5175.363          0.004
   Tree creation     :        579.831        540.689        565.124         17.397
   Tree removal      :        447.424        427.932        435.875          8.356

-- finished at 03/18/2020 17:01:36 --

@kungf changed the title from "use goroutine pool to concurrent forgetinode" to "improve file delete performance" on Mar 18, 2020
In jacobsa#30, ForgetInode was changed to be handled inline in ServeOps. This may solve the memory OOM, but it makes rm very slow and can also hang other ops, so I limit the maximum number of ForgetInode goroutines; this avoids the OOM without affecting performance.
@kungf force-pushed the multi_forgetinode branch from 8fc09b8 to 4b07f0d on March 18, 2020 at 09:35
@kungf changed the title from "improve file delete performance" to "use as many goroutines for ForgetInode op" on Mar 18, 2020
@stapelberg (Collaborator) commented:

Can you share the results of that same benchmark, but with a goroutine pool of only 2 goroutines please?

@kungf (Contributor, Author) commented Mar 19, 2020

2 goroutines

./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.8%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4578.542       4578.532       4578.538          0.004
   File stat         :     971056.708     970550.627     970813.018        207.029
   File read         :      18393.587      18393.405      18393.500          0.074
   File removal      :         46.243         46.243         46.243          0.000
   Tree creation     :        653.828        461.589        564.816         79.122
   Tree removal      :         71.337         27.775         45.803         18.560

100 goroutines

./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.9%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4580.740       4580.723       4580.733          0.008
   File stat         :     990979.648     990628.216     990842.974        153.729
   File read         :      18073.165      18073.068      18073.120          0.040
   File removal      :       2281.023       2281.021       2281.022          0.001
   Tree creation     :        624.400        446.060        558.440         79.864
   Tree removal      :        457.793        438.781        445.753          8.549

@riking (Contributor) commented Jul 5, 2020

What we're seeing here is that the kernel will parallelize dispatch of operations that have no logical dependencies on each other. The test is looking for throughput of removals, while @stapelberg's optimization was looking for latency of close(2).

A better solution may be to keep a goroutine pool for all operations instead of spinning them up as requests come in, and then spinning up fresh goroutines only under high load if all pool goroutines are busy.
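As an illustration of that idea (not code from this PR), a hybrid pool could keep a fixed set of resident workers and spawn an extra goroutine only when none of them can accept a task promptly. The names and the "queue full" heuristic below are assumptions made for the sketch.

```go
package sketch

import "sync"

// hybridPool keeps a fixed number of resident workers and falls back to a
// fresh goroutine only when the task queue is full, which is used here as a
// cheap proxy for "all pool goroutines are busy".
type hybridPool struct {
	tasks    chan func()
	overflow sync.WaitGroup
}

func newHybridPool(workers, queueLen int) *hybridPool {
	p := &hybridPool{tasks: make(chan func(), queueLen)}
	for i := 0; i < workers; i++ {
		go func() {
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// submit hands the task to a resident worker when possible; under high load
// it spins up a one-off goroutine instead of blocking the serve loop.
func (p *hybridPool) submit(task func()) {
	select {
	case p.tasks <- task:
	default:
		p.overflow.Add(1)
		go func() {
			defer p.overflow.Done()
			task()
		}()
	}
}
```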
