
Reduce mutex contention #2826

Closed
wants to merge 1 commit

Conversation

behlendorf
Contributor

Due to evidence of contention both the buf_hash_table and the
dbuf_hash_table sizes have been increased from 256 to 8192.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1291
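
For context, here is an illustration of the kind of constants being bumped. This is not the actual diff: BUF_LOCKS is mentioned later in this thread, while DBUF_MUTEXES is an assumed name for the dbuf-side equivalent.

/* module/zfs/arc.c (illustrative only, not the patch itself) */
#define BUF_LOCKS       8192    /* previously 256 */

/* module/zfs/dbuf.c (illustrative; constant name assumed) */
#define DBUF_MUTEXES    8192    /* previously 256 */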

@behlendorf changed the title from "Reduce dbuf_find() mutex contention" to "Reduce mutex contention" on Oct 23, 2014
@kernelOfTruth
Contributor

This might be too much for smaller servers, and/or boxes running in mirror mode with an additional zpool mounted:

e.g.

cat /proc/meminfo 
MemTotal:       32897764 kB
MemFree:         4821128 kB
MemAvailable:    6059316 kB
Buffers:            3976 kB
Cached:           762492 kB
SwapCached:            0 kB
Active:          1411936 kB
Inactive:         398904 kB
Active(anon):    1070260 kB
Inactive(anon):   119704 kB
Active(file):     341676 kB
Inactive(file):   279200 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      131264500 kB
SwapFree:       131264500 kB
Dirty:                40 kB
Writeback:             0 kB
AnonPages:       1044508 kB
Mapped:           312236 kB
Shmem:            145592 kB
Slab:           16964204 kB
SReclaimable:    1108784 kB
SUnreclaim:     15855420 kB
KernelStack:       18672 kB
PageTables:        19688 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    157582708 kB
Committed_AS:    3145688 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1445320 kB
VmallocChunk:   34358235104 kB
HardwareCorrupted:     0 kB
AnonHugePages:    249856 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      189316 kB
DirectMap2M:     4990976 kB
DirectMap1G:    28311552 kB
free -m
             total       used       free     shared    buffers     cached
Mem:         32126      27780       4346        140          3        743
-/+ buffers/cache:      27033       5093
Swap:       128187          0     128187

This is while running Btrfs as the system volume,

and the following zpools:

zpool iostat -v
                     capacity     operations    bandwidth
pool              alloc   free   read  write   read  write
----------------  -----  -----  -----  -----  -----  -----
WD30EFRX          1.66T  1.00T     17     23   354K   115K
  mirror          1.66T  1.00T     17     23   354K   115K
    wd30efrx_002      -      -      8      6   185K   171K
    wd30efrx          -      -      8      6   185K   171K
cache                 -      -      -      -      -      -
  intelSSD180      108G   285M     95      2  5.17M   259K
----------------  -----  -----  -----  -----  -----  -----
WD40EFRX          2.65T   995G    155     74   395K  8.98M
  WD40EFRX        2.65T   995G    155     74   395K  8.98M
----------------  -----  -----  -----  -----  -----  -----

If I hadn't upgraded this box to 32 GiB and weren't running with swap, it would probably be swapping a lot.

Since #2672 uses MIN_BUF_LOCKS rather than a fixed BUF_LOCKS, I don't know how this scales or how much more memory could be used in total.

the relevant code snippet in module/zfs/arc.c is:

/*
 * We want to allocate one hash lock for every 4GB of memory with a minimum
 * of MIN_BUF_LOCKS.
 */
uint64_t zfs_arc_ht_lock_shift = 32;
#define MIN_BUF_LOCKS   8192
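
For what it's worth, here is a minimal sketch of how that comment could translate into an actual lock count. It is only an illustration under the stated assumptions, not the code from #2672; arc_ht_lock_count and physmem_bytes are made-up names.

#include <stdint.h>

uint64_t zfs_arc_ht_lock_shift = 32;    /* values from the snippet above */
#define MIN_BUF_LOCKS   8192

/*
 * Illustrative sketch: scale the number of hash locks with physical
 * memory (one lock per 4 GiB), clamp to MIN_BUF_LOCKS, and round up to
 * a power of two so the count can be used as a hash mask.
 */
uint64_t
arc_ht_lock_count(uint64_t physmem_bytes)
{
        uint64_t locks = physmem_bytes >> zfs_arc_ht_lock_shift;
        uint64_t pow2 = 1;

        if (locks < MIN_BUF_LOCKS)
                locks = MIN_BUF_LOCKS;

        while (pow2 < locks)
                pow2 <<= 1;

        /* e.g. 32 GiB of RAM gives 8 locks, clamped up to 8192 */
        return (pow2);
}

Under that sizing, a box like the one above (about 32 GiB of RAM) would still sit at the 8192 floor; the lock count only grows beyond that on machines with more than 32 TiB of memory.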

@kernelOfTruth
Contributor

output of slabtop:

 Active / Total Objects (% used)    : 36854605 / 37092738 (99.4%)
 Active / Total Slabs (% used)      : 1290266 / 1290266 (100.0%)
 Active / Total Caches (% used)     : 177 / 281 (63.0%)
 Active / Total Size (% used)       : 16738871.48K / 16821289.12K (99.5%)
 Minimum / Average / Maximum Object : 0.01K / 0.45K / 16.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
5626593 5626529  99%    0.19K 267933       21   1071732K dentry
5458096 5458096 100%    0.28K 194932       28   1559456K dmu_buf_impl_t
5107456 5106844  99%    0.50K 159608       32   2553728K zio_buf_512
5043588 5043588 100%    0.83K 132726       38   4247232K dnode_t
5037184 5037184 100%    0.09K 109504       46    438016K sa_cache
3774656 3774137  99%    0.06K  58979       64    235916K kmalloc-64
3247944 3114560  95%    0.09K  77332       42    309328K kmalloc-96
1856412 1856331  99%    0.29K  68756       27    550048K arc_buf_hdr_t
476289 420624  88%    0.08K   9339       51     37356K arc_buf_t
364256 364165  99%    0.12K  11383       32     45532K kmalloc-128
265192 264397  99%   16.00K 132596        2   4243072K zio_buf_16384
209328 191395  91%    0.19K   9968       21     39872K kmalloc-192
160040 160018  99%    8.00K  40010        4   1280320K kmalloc-8192
 40560  21727  53%    2.00K   2535       16     81120K kmalloc-2048
 39304  39304 100%    0.12K   1156       34      4624K kernfs_node_cache
 34560  34560 100%    0.03K    270      128      1080K kmalloc-32
 23207  22708  97%    0.17K   1009       23      4036K vm_area_struct
 21760  21760 100%    0.06K    340       64      1360K anon_vma_chain
 20640  19724  95%    1.00K    645       32     20640K zio_buf_1024
 16821  14999  89%    1.50K    801       21     25632K zio_buf_1536
 16716  16716 100%    0.55K    597       28      9552K inode_cache
 15392  14265  92%    0.50K    481       32      7696K kmalloc-512
 14892  14892 100%    0.04K    146      102       584K l2arc_buf_hdr_t
 13504  13504 100%    0.06K    211       64       844K range_seg_cache
 11840  11840 100%    0.06K    185       64       740K iommu_iova
 11704  10844  92%    0.07K    209       56       836K anon_vma
 11517  11517 100%    0.96K    349       33     11168K btrfs_inode
 10276  10214  99%    0.57K    367       28      5872K radix_tree_node
  9175   9090  99%    0.62K    367       25      5872K proc_inode_cache
  9095   8755  96%    0.05K    107       85       428K zio_link_cache
  8992   8868  98%    2.00K    562       16     17984K zio_buf_2048
  8456   8456 100%    0.14K    302       28      1208K btrfs_extent_map
  8256   8063  97%    1.00K    258       32      8256K kmalloc-1024
  7968   7434  93%    0.25K    249       32      1992K filp
  7776   6764  86%    0.25K    243       32      1944K kmalloc-256
  7636   7636 100%    0.09K    166       46       664K btrfs_extent_state
  6348   6095  96%    2.50K    529       12     16928K zio_buf_2560
  5030   4582  91%    3.00K    503       10     16096K zio_buf_3072
  4592   4592 100%    0.07K     82       56       328K Acpi-Operand
  4473   3673  82%    0.19K    213       21       852K cred_jar
  3780   3549  93%    3.50K    420        9     13440K zio_buf_3584
  3650   3482  95%    0.62K    146       25      2336K shmem_inode_cache
  3625   3625 100%    0.27K    125       29      1000K btrfs_extent_buffer
  3159   3159 100%    0.10K     81       39       324K buffer_head
  3060   3060 100%    0.04K     30      102       120K Acpi-Namespace
  2304   2304 100%    0.25K     72       32       576K bio-0
  2295   2295 100%    0.05K     27       85       108K ftrace_event_field
  2280   2235  98%    5.00K    380        6     12160K zio_buf_5120

@kernelOfTruth
Contributor

test with 3 pools in total:

zpool list 
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
HGST5K4000  3.56T  2.47T  1.09T         -     0%    69%  1.00x  ONLINE  -
WD30EFRX    2.66T  1.66T  1.00T         -     2%    62%  1.00x  ONLINE  -
WD40EFRX    3.62T  2.65T   995G         -    30%    73%  1.00x  ONLINE  -
echo "0" > /sys/module/spl/parameters/spl_kmem_cache_reclaim

echo "16384" > /sys/module/spl/parameters/spl_kmem_cache_slab_limit

echo "5" > /sys/module/zfs/parameters/zfs_arc_shrink_shift

echo "0x100000000" > /sys/module/zfs/parameters/zfs_arc_max 
# 0x100000000 == 4 GB

cat /sys/module/zfs/parameters/zfs_arc_max
4294967296

maximum slab usage during rsync backup while all 3 zpools are imported:

cat /proc/meminfo 
MemTotal:       32897764 kB
MemFree:         1896132 kB
MemAvailable:    2956724 kB
Buffers:            1036 kB
Cached:           332304 kB
SwapCached:            0 kB
Active:           624780 kB
Inactive:         193748 kB
Active(anon):     521188 kB
Inactive(anon):    90308 kB
Active(file):     103592 kB
Inactive(file):   103440 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      131264500 kB
SwapFree:       131252272 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        485364 kB
Mapped:           169432 kB
Shmem:            126308 kB
Slab:           21374820 kB
SReclaimable:    1284724 kB
SUnreclaim:     20090096 kB
KernelStack:       20672 kB
PageTables:        14804 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    157582708 kB
Committed_AS:    2055392 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      830008 kB
VmallocChunk:   34358196208 kB
HardwareCorrupted:     0 kB
AnonHugePages:    147456 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      189316 kB
DirectMap2M:     4990976 kB
DirectMap1G:    28311552 kB

I'm not sure whether the same behavior would have shown up without #2672, but in my opinion it's better to be proactive about this issue for the (near?) future rather than running into problems later.

I hope this information is useful somehow...

@behlendorf
Contributor Author

@kernelOfTruth Thanks for pointing out #2672. Since I haven't looked carefully at those changes yet, I wasn't aware they made a similar tuning. Doing it dynamically, as was done in #2672, is clearly a better solution. It's a shame that change wasn't split from the other persistent L2ARC changes; it would have made it easier to review and merge.

As for the memory footprint of this smaller change, it's not huge but it is definitely something to be aware of. This resizes the hash table, so it's a fixed cost of about an additional 0.5 MB.
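
As a rough back-of-the-envelope for that figure (the 64-byte per-lock size, one cache line per padded mutex, is my assumption rather than something read out of the patch):

#include <stdio.h>

/*
 * Rough illustration of where an "about 0.5 MB" fixed cost could come
 * from: 8192 hash locks, each assumed to be padded out to a 64-byte
 * cache line to avoid false sharing.
 */
int
main(void)
{
        const unsigned long nlocks = 8192;      /* new lock count */
        const unsigned long lock_bytes = 64;    /* assumed padded size */

        /* 8192 * 64 / 1024 = 512 KiB */
        printf("%lu KiB per table\n", nlocks * lock_bytes / 1024);
        return (0);
}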

The goal from the original issue was to reduce contention on these locks, which should help performance for some workloads. However, until we have some hard data showing an actual reduction in lock contention, I'm in no rush to merge this. I mainly filed it so we wouldn't lose track of it.

@behlendorf
Copy link
Contributor Author

Merged as:

b31d8ea Reduce buf/dbuf mutex contention

@behlendorf closed this on Nov 14, 2014
@behlendorf deleted the issue-1291 branch on April 19, 2021
Labels
Type: Performance (performance improvement or performance problem)