
Fix ENOSPC for extended quota #15312

Merged: 1 commit into openzfs:master on Sep 28, 2023

Conversation

akashb-22 (Contributor)

Motivation and Context

When unlinking multiple files from a pool at 100% capacity, it was possible for ENOSPC to be returned after the first few unlinks. This issue was fixed previously by PR #13172 but was then reintroduced by PR #13839. The intent of PR #13839 is to relax the dataset quota limit by around 3%, so that the user can write slightly more data than the configured quota and see more stable bandwidth when overwriting data.

It appears this case wasn't handled when the quota was exceeded while deferred frees were still pending.

Unfortunately, this wasn't caught by the existing ZTS test case, but the same test failed when using small files for creates/unlinks at 100% filesystem capacity.
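To illustrate the headroom idea, here is a minimal sketch; it is a hedged illustration only, not the actual OpenZFS source: the helper name and the exact fraction are assumptions (quota >> 5 gives roughly 3% headroom), see PR #13839 for the real change.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: quota >> 5 adds ~3.1% of headroom. */
static uint64_t
extended_quota(uint64_t quota)
{
	return (quota + (quota >> 5));
}

int
main(void)
{
	uint64_t quota = 100ULL << 30;		/* 100 GiB dataset quota */
	uint64_t used_on_disk = 101ULL << 30;	/* 1 GiB over the quota */

	/* Writes are still admitted while under the extended limit. */
	if (used_on_disk < extended_quota(quota))
		printf("admit: over quota but under the extended limit\n");
	else
		printf("reject: over the extended limit\n");
	return (0);
}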

Reviewed-by: Dipak Ghosh dipak.ghosh@hpe.com
Signed-off-by: Akash B akash-b@hpe.com

Description

This is resolved using the existing mechanism of returning ERESTART when over quota, as long as we know enough space will shortly be available after processing the pending deferred frees.
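Conceptually, the mechanism behaves like the following simplified sketch. The struct, field names, and function here are assumptions for clarity, not the actual dsl_dir_tempreserve_impl() code in module/zfs/dsl_dir.c.

#include <errno.h>
#include <stdint.h>

#ifndef ERESTART
#define	ERESTART	85	/* kernel-style "restart this operation" */
#endif

typedef struct quota_state {
	uint64_t used_on_disk;	/* bytes currently charged to the dataset */
	uint64_t quota;		/* effective (extended) quota */
	uint64_t deferred;	/* bytes pending in deferred frees */
} quota_state_t;

/*
 * Returns 0 to admit the operation, ERESTART to make the caller retry
 * after the pending deferred frees are processed, or ENOSPC when the
 * space genuinely is not there.
 */
static int
check_quota(const quota_state_t *qs, uint64_t asize)
{
	if (qs->used_on_disk + asize <= qs->quota)
		return (0);		/* fits under the quota */

	/*
	 * Over quota, but enough space will shortly be available once
	 * the deferred frees land: ask the caller to retry rather than
	 * failing hard with ENOSPC.
	 */
	if (qs->used_on_disk + asize <= qs->quota + qs->deferred)
		return (ERESTART);

	return (ENOSPC);
}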

Also updated the existing test case so that it reliably reproduces the issue without this patch.

How Has This Been Tested?

Test Reproducer:

+ enospc
+ fio --name=fillup --ioengine=libaio --fallocate=none --iodepth=1 --rw=write --bs=4M --size=10M --numjobs=4096 --allow_mounted_write=1 --directory=/mnt/ost0/ --group_reporting
+ echo 3 > /proc/sys/vm/drop_caches
+ zpool list
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
pool-oss0  17.0G  16.5G   529M        -         -    29%    96%  1.00x    ONLINE  -
+ zfs list
NAME             USED  AVAIL     REFER  MOUNTPOINT
pool-oss0       13.5G     0B      128K  /pool-oss0
pool-oss0/ost0  13.5G     0B     13.5G  /mnt/ost0
+ rm -f /mnt/ost0/*
rm: cannot remove '/mnt/ost0/fillup.1929.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.193.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1930.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1931.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1932.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1933.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1934.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1935.0': No space left on device
<truncated>
+ zpool list
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
pool-oss0  17.0G  14.3G  2.70G        -         -     7%    84%  1.00x    ONLINE  -
+ zfs list
NAME             USED  AVAIL     REFER  MOUNTPOINT
pool-oss0       11.7G  1.78G      128K  /pool-oss0
pool-oss0/ost0  11.7G  1.78G     11.7G  /mnt/ost0
++ zpool list pool-oss0 -H -o cap
++ tr % ' '
+ s='84 '
+ '[' 84 -ge 8 ']'
+ echo 'Test failed.'
Test failed.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf (Contributor) left a comment
Thanks for running this down. The fix itself looks good, just a nit about the test case.

Review comment on tests/zfs-tests/tests/functional/no_space/enospc_rm.ksh (outdated, resolved)
@behlendorf added the Status: Accepted and Status: Code Review Needed labels and removed the Status: Accepted label Sep 26, 2023
@ghoshdipak left a comment
LGTM

@amotin (Member) left a comment
I haven't looked very deeply at the big picture, but it seems the "else" block containing "ASSERT3U(used_on_disk, >=, quota)" after the "else if" has been unreachable since #13839 and should be removed. The condition of the "if" is always true now.

@amotin commented Sep 26, 2023

@akashb-22 It is not just the assertion; the whole "else" block is unreachable, because "used_on_disk < quota" in the "if" is always true.

@akashb-22 (Contributor, Author) commented
@amotin Yes, thanks for pointing this out. The entire "else" part is unreachable. I'll update it once I do a few more checks.
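For readers following along, the flow under discussion has roughly the shape below. This is a simplified sketch, not a verbatim excerpt from dsl_dir.c: ext_quota stands in for the ~3% headroom, and the ASSERT3U macro is a userspace stand-in.

#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel ASSERT3U macro. */
#define	ASSERT3U(l, op, r)	assert((uint64_t)(l) op (uint64_t)(r))

static void
quota_flow(uint64_t used_on_disk, uint64_t est_inflight, uint64_t quota,
    uint64_t ext_quota)
{
	if (used_on_disk >= quota) {
		/* over-quota case already handled here since #13839 */
	} else if (used_on_disk + est_inflight >= quota + ext_quota) {
		if (est_inflight > 0 || used_on_disk < quota) {
			/* ERESTART: retry once pending frees land */
		} else {
			/*
			 * Dead code: reaching here would require both
			 * est_inflight == 0 and used_on_disk >= quota,
			 * but this "else if" only runs when
			 * used_on_disk < quota.
			 */
			ASSERT3U(used_on_disk, >=, quota);
		}
	}
}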

@behlendorf added and then removed the Status: Code Review Needed label Sep 26, 2023
@behlendorf behlendorf merged commit ba769ea into openzfs:master Sep 28, 2023
22 of 26 checks passed
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 28, 2023
behlendorf pushed a commit that referenced this pull request Sep 28, 2023
When unlinking multiple files from a pool at 100% capacity, it
was possible for ENOSPC to be returned after the first few unlinks.
This issue was fixed previously by PR #13172 but then this was
again introduced by PR #13839.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Also, updated the existing testcase which reliably reproduced the
issue without this patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15312
@akashb-22 akashb-22 deleted the ext_enospc branch September 29, 2023 04:05
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
Labels: Status: Code Review Needed
4 participants