-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix free memory calculation on v3.14+ #7170
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you for staying on top of this one and making sure it's really fixed.
#include <linux/vmstat.h> | ||
],[ | ||
(void) global_zone_page_state(0); | ||
],[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spaces accidentally used here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh! Copied from config/kernel-vm_node_stat.m4
. I'll fix mine once buildbots have done their thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to fix both when you refresh this. At some point we should extend to style checker to cover the m4 files.
...speaking of buildbots, style check seemingly doesn't like the "unbalanced" braces due to multiple alternate closing lines:
Would this be the preferred style?
|
...nope, stylecheck doesn't like the closing parentheses on a line by themselves (which is a shame because I'm partial to that style). Which seems to leave us with the verbose but stylecheck clean:
|
My editor really isn't happy with it either. I'd suggest we wrap the entire statement like this, it's a little more verbose but more readable. diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index 7aa221b..d830eb3 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -4700,10 +4700,17 @@ arc_free_memory(void)
return (ptob(si.freeram - si.freehigh));
#else
#ifdef ZFS_GLOBAL_NODE_PAGE_STATE
+#ifdef ZFS_GLOBAL_ZONE_PAGE_STATE
return (ptob(nr_free_pages() +
global_node_page_state(NR_INACTIVE_FILE) +
global_node_page_state(NR_INACTIVE_ANON) +
- global_node_page_state(NR_SLAB_RECLAIMABLE)));
+ global_zone_page_state(NR_SLAB_RECLAIMABLE)));
+#else
+ return (ptob(nr_free_pages() +
+ global_node_page_state(NR_INACTIVE_FILE) +
+ global_node_page_state(NR_INACTIVE_ANON) +
+ global_page_state(NR_SLAB_RECLAIMABLE)));
+#endif /* ZFS_GLOBAL_ZONE_PAGE_STATE */
#else
return (ptob(nr_free_pages() +
global_page_state(NR_INACTIVE_FILE) + |
Looks good, cleaner than my version, I'll include it next push. I had the |
I'd suggest leaving them out, personally I think it's pretty clear without them. Even more so now that it's wrapped in |
Like so?
|
599e398
to
e81e5e7
Compare
Looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch, thanks!
e81e5e7
to
570b523
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisrd Thank you for the PR!
I was looking at enum zone_stat_item
and enum node_stat_item
in the Linux kernel for various tags. I'm noticing that at some point NR_SLAB_RECLAIMABLE
moves from being part of enum zone_stat_item
to being a part of enum node_stat_item
.
In v4.12 (and prior), NR_SLAB_RECLAIMABLE
is part of enum zone_stat_item
as seen here.
In v4.13, NR_SLAB_RECLAIMABLE
was moved to enum node_stat_item
as seen here. The actual move occurred in commit torvalds/linux@385386c.
As of v4.16-rc1 NR_SLAB_RECLAIMABLE
is still part of enum node_stat_item
as seen here.
In v4.13 (and prior), the prototype for global_page_state
is static inline unsigned long global_page_state(enum zone_stat_item item)
as seen here.
In v4.14, the global_zone_page_state
rename occurs and it has a prototype of static inline unsigned long global_zone_page_state(enum zone_stat_item item)
as seen here.
In v4.12 and above, the prototype for global_node_page_state
is static inline unsigned long global_node_page_state(enum node_stat_item item)
as seen here.
Long story short, what I'm worried about (and this could be a lack of understanding on my part) is that for kernel version 4.13 and above, it appears that we should be using global_node_page_state(NR_SLAB_RECLAIMABLE)
based on the enum definitions.
A reasonable solution might be to break this up into 3 cases and use the right function/enums for each. The 3 cases are:
- v4.12 and older
- v4.13
- v4.14 and above
module/zfs/arc.c
Outdated
return (ptob(nr_free_pages() + | ||
global_node_page_state(NR_INACTIVE_FILE) + | ||
global_node_page_state(NR_INACTIVE_ANON) + | ||
global_node_page_state(NR_SLAB_RECLAIMABLE))); | ||
global_page_state(NR_SLAB_RECLAIMABLE))); | ||
#endif /* ZFS_GLOBAL_ZONE_PAGE_STATE */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please triple check this but I believe we should be instead checking for the change of NR_SLAB_RECLAIMABLE from a zone_stat_item to a node_stat_item. In fact we never need to use the global_zone_page_state
interface since by the time it was added NR_SLAB_RECLAIMABLE
was moved to the node.
#if defined(ZFS_GLOBAL_NODE_PAGE_STATE) && \
defined(ZFS_NODE_STAT_NR_SLAB_RECLAIMABLE)
/* 4.13+ API */
return (ptob(nr_free_pages() +
global_node_page_state(NR_INACTIVE_FILE) +
global_node_page_state(NR_INACTIVE_ANON) +
global_node_page_state(NR_SLAB_RECLAIMABLE)));
#elif defined(ZFS_GLOBAL_NODE_PAGE_STATE)
/* 4.9 - 4.12 API */
return (ptob(nr_free_pages() +
global_node_page_state(NR_INACTIVE_FILE) +
global_node_page_state(NR_INACTIVE_ANON) +
global_page_state(NR_SLAB_RECLAIMABLE)));
#else
/* x - 4.8 API */
return (ptob(nr_free_pages() +
global_page_state(NR_INACTIVE_FILE) +
global_page_state(NR_INACTIVE_ANON) +
global_page_state(NR_SLAB_RECLAIMABLE)));
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf That looks correct to me.
Oh wow, it's even worse than I imagined, even with my spelunking through the history. Looking at... |
I agree something like However I'm a bit stumped as to how to create config tests for these. The basic issue is, we're needing to detect where we're using the wrong enum type in a function call. Unfortunately, in standard C, one enum type isn't distinguished from another.
So at this point the only way I've found is to use But can we reasonably require If so, is it possible to have Or is there another way? |
Incidentally, I found this useful to see if there were any other instances of these things moving around in recent times (there aren't ...yet):
|
I was looking at the gcc warnings yesterday and ran into
That would force a comparison between enum values and gcc would warn if there’s a bad comparison there? |
@chrisrd @dinatale2 I don't see a good way to do this either, and we don't want to bring in any additional dependencies. How about opting for a more primitive approach. Instead of invoking gcc and keying off a compiler warning/error, we could just match the pattern in the header. Something like this.
We should check for |
This might explain a lot of lost hours of sleep with memory loaded scst systems inexplicably buying the farm when the kernel OOMs. Thank you! |
570b523
to
524d561
Compare
New patch pushed, with full auto-configure for global page state API changes. |
ba8a223
to
3f2a65c
Compare
OK, I'm officially perplexed about the buildbots... Thus far, with "7 failing, 2 pending, and 14 successful checks", all the What's the difference between the two? Some and maybe all of the
...but the make fails like:
which comes out of Any ideas? And
...no idea what that's about either. |
Same thing on my side, building using autogen.sh + configure + make works, |
configure:26815: checking whether enum node_stat_item contains NR_SLAB_RECLAIMABLE This should give you a clue |
Thanks - I'll look after those comments. What do you think of current version vs the "pure autoconf" style for Worthwhile changing to the pure version? |
Yes, please do. I like the pure version much better and since you've already done the work let's include it when you refresh this. |
eb70713
to
757658a
Compare
Refreshed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for tackling this bit of insanity!
Is anyone looking after a patch stack for any forthcoming zfs-0.7.7? This definitely needs to go in... |
@chrisrd yes, and I've already adding this PR to the project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the effort on this PR!
@chrisrd if you can rebase this one last time on master to resolve the new |
Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e40 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Signed-off-by: Chris Dunlop <chris@onthe.net.au>
757658a
to
9c2fe75
Compare
Done! |
Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e40 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes openzfs#7170
Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e40 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes openzfs#7170
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() openzfs#7179 openzfs#7211 - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
Yeah, I pulled the 0.7.7 diff and the script IS there, but not executable - currently using scriptage in our build system to chmod+x it after the diff application. |
Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e40 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes openzfs#7170
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e40 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes openzfs#7170
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Fix "--enable-code-coverage" debug build openzfs#6674 - Update codecov.yml openzfs#6669 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e40 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes #7170
So would this bug cause zfs_arc_max to be ignored and grow larger? I'm seeing this happen on my raspberry pi 3B+… |
Description
Fix free memory calculation on v3.14+
Motivation and Context
arc_free_memory()
has been broken since an API change in Linux v3.14:2016-07-28 599d0c95 mm, vmscan: move LRU lists to node
That moved some of
global_page_state()
intoglobal_node_page_state()
. The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued usingglobal_page_state()
where we should have been usingglobal_node_page_state()
, thus indexing into the wrong array viaNR_SLAB_RECLAIMABLE
et al.This was nearly fixed in ZoL by:
2017-08-24 2209e40 Linux 4.8+ compatibility fix for vm stats
...which put in place the config infrastructure to detect the API change and fix an instance of
NR_FILE_PAGES
inarc_evictable_memory()
. Unfortunately the free memory calculation at that time was done by SPL freemem, and the API fix wasn't applied to the SPL.As a result our free memory calculation remained broken until it was again nearly fixed by:
2017-09-16 787acae Linux 3.14 compat: IO acct, global_page_state, etc
2017-09-19 661907e Linux 4.14 compat: IO acct, global_page_state, etc (#6655)
These two patches copy SPL freemem to
arc_free_memory()
and do theglobal_node_page_state()
split, almost correctly...Unfortunately they assume
NR_SLAB_RECLAIMABLE
moved toglobal_node_page_state()
along withNR_INACTIVE_FILE
andNR_INACTIVE_ANON
, butNR_SLAB_RECLAIMABLE
actually stayed inglobal_page_state()
, thus leaving our free memory calculation broken for 19 months on Linux v3.14+ - until now!Additionally, Linux v4.14 (torvalds/linux@c41f012a) changes
global_page_state()
toglobal_zone_page_state()
, so fix that as well.Note:
This patch should be added to any forthcoming zfs-0.7.7 patch stack - is anyone going to look after that?
How Has This Been Tested?
A previous version of this patch (w/o 4.14 API change) has been running for a week on a heavily loaded 0.7.6 / v4.9.76 system w/ 1500-2000 mounted filesystems/snapshots, and has enormously improved the sanity of the system, including removing extreme loads associated with otherwise inexplicable
zfs_prune
storms, and occasional wild swings in ARC usage.Time to let the buildbots at it...
Types of changes
Checklist:
Signed-off-by
.