
Fix ENOSPC for extended quota #15312

Merged: 1 commit into openzfs:master on Sep 28, 2023

Conversation

akashb-22 (Contributor)

Motivation and Context

When unlinking multiple files from a pool at 100% capacity, it was possible for ENOSPC to be returned after the first few unlinks. This issue was fixed previously by PR #13172 but was then reintroduced by PR #13839. The intent of PR #13839 is to relax the dataset quota limit by around 3%, so that the user can write slightly more data than the configured quota and see more stable bandwidth when overwriting data.

It appears this case wasn't handled when the quota was exceeded while deferred frees were still pending.

Unfortunately, this wasn't caught by the existing ZTS test case, but the same test failed when using small files for creates/unlinks at 100% filesystem capacity.
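To illustrate the headroom idea, here is a minimal sketch; it is a hedged illustration only, not the actual OpenZFS source: the helper name and the exact fraction are assumptions (quota >> 5 gives roughly 3% headroom), see PR #13839 for the real change.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: quota >> 5 adds ~3.1% of headroom. */
static uint64_t
extended_quota(uint64_t quota)
{
	return (quota + (quota >> 5));
}

int
main(void)
{
	uint64_t quota = 100ULL << 30;		/* 100 GiB dataset quota */
	uint64_t used_on_disk = 101ULL << 30;	/* 1 GiB over the quota */

	/* Writes are still admitted while under the extended limit. */
	if (used_on_disk < extended_quota(quota))
		printf("admit: over quota but under the extended limit\n");
	else
		printf("reject: over the extended limit\n");
	return (0);
}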

Reviewed-by: Dipak Ghosh dipak.ghosh@hpe.com
Signed-off-by: Akash B akash-b@hpe.com

Description

This is resolved using the existing mechanism of returning ERESTART when over quota, as long as we know enough space will shortly be available after processing the pending deferred frees.
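Conceptually, the mechanism behaves like the following simplified sketch. The struct, field names, and function here are assumptions for clarity, not the actual dsl_dir_tempreserve_impl() code in module/zfs/dsl_dir.c.

#include <errno.h>
#include <stdint.h>

#ifndef ERESTART
#define	ERESTART	85	/* kernel-style "restart this operation" */
#endif

typedef struct quota_state {
	uint64_t used_on_disk;	/* bytes currently charged to the dataset */
	uint64_t quota;		/* effective (extended) quota */
	uint64_t deferred;	/* bytes pending in deferred frees */
} quota_state_t;

/*
 * Returns 0 to admit the operation, ERESTART to make the caller retry
 * after the pending deferred frees are processed, or ENOSPC when the
 * space genuinely is not there.
 */
static int
check_quota(const quota_state_t *qs, uint64_t asize)
{
	if (qs->used_on_disk + asize <= qs->quota)
		return (0);		/* fits under the quota */

	/*
	 * Over quota, but enough space will shortly be available once
	 * the deferred frees land: ask the caller to retry rather than
	 * failing hard with ENOSPC.
	 */
	if (qs->used_on_disk + asize <= qs->quota + qs->deferred)
		return (ERESTART);

	return (ENOSPC);
}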

Also updated the existing test case so that it reliably reproduces the issue without this patch.

How Has This Been Tested?

Test Reproducer:

+ enospc
+ fio --name=fillup --ioengine=libaio --fallocate=none --iodepth=1 --rw=write --bs=4M --size=10M --numjobs=4096 --allow_mounted_write=1 --directory=/mnt/ost0/ --group_reporting
+ echo 3 > /proc/sys/vm/drop_caches
+ zpool list
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
pool-oss0  17.0G  16.5G   529M        -         -    29%    96%  1.00x    ONLINE  -
+ zfs list
NAME             USED  AVAIL     REFER  MOUNTPOINT
pool-oss0       13.5G     0B      128K  /pool-oss0
pool-oss0/ost0  13.5G     0B     13.5G  /mnt/ost0
+ rm -f /mnt/ost0/*
rm: cannot remove '/mnt/ost0/fillup.1929.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.193.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1930.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1931.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1932.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1933.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1934.0': No space left on device
rm: cannot remove '/mnt/ost0/fillup.1935.0': No space left on device
<truncated>
+ zpool list
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
pool-oss0  17.0G  14.3G  2.70G        -         -     7%    84%  1.00x    ONLINE  -
+ zfs list
NAME             USED  AVAIL     REFER  MOUNTPOINT
pool-oss0       11.7G  1.78G      128K  /pool-oss0
pool-oss0/ost0  11.7G  1.78G     11.7G  /mnt/ost0
++ zpool list pool-oss0 -H -o cap
++ tr % ' '
+ s='84 '
+ '[' 84 -ge 8 ']'
+ echo 'Test failed.'
Test failed.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf (Contributor) left a comment
Thanks for running this down. The fix itself looks good, just a nit about the test case.

Review comment on tests/zfs-tests/tests/functional/no_space/enospc_rm.ksh (outdated, resolved)
@behlendorf added the Status: Accepted and Status: Code Review Needed labels and removed the Status: Accepted label Sep 26, 2023
@ghoshdipak left a comment
LGTM

@amotin (Member) left a comment
I haven't looked very deeply at the big picture, but it seems the "else" block containing "ASSERT3U(used_on_disk, >=, quota)" after the "else if" has been unreachable since #13839 and should be removed. The condition of the "if" is always true now.

@amotin commented Sep 26, 2023

@akashb-22 It is not just the assertion; the whole "else" block is unreachable, because "used_on_disk < quota" in the "if" is always true.

@akashb-22 (Contributor, Author) commented
@amotin Yes, thanks for pointing this out. The entire "else" part is unreachable. I'll update it once I do a few more checks.
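For readers following along, the flow under discussion has roughly the shape below. This is a simplified sketch, not a verbatim excerpt from dsl_dir.c: ext_quota stands in for the ~3% headroom, and the ASSERT3U macro is a userspace stand-in.

#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel ASSERT3U macro. */
#define	ASSERT3U(l, op, r)	assert((uint64_t)(l) op (uint64_t)(r))

static void
quota_flow(uint64_t used_on_disk, uint64_t est_inflight, uint64_t quota,
    uint64_t ext_quota)
{
	if (used_on_disk >= quota) {
		/* over-quota case already handled here since #13839 */
	} else if (used_on_disk + est_inflight >= quota + ext_quota) {
		if (est_inflight > 0 || used_on_disk < quota) {
			/* ERESTART: retry once pending frees land */
		} else {
			/*
			 * Dead code: reaching here would require both
			 * est_inflight == 0 and used_on_disk >= quota,
			 * but this "else if" only runs when
			 * used_on_disk < quota.
			 */
			ASSERT3U(used_on_disk, >=, quota);
		}
	}
}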

@behlendorf added and then removed the Status: Code Review Needed label Sep 26, 2023
@behlendorf behlendorf merged commit ba769ea into openzfs:master Sep 28, 2023
22 of 26 checks passed
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 28, 2023
behlendorf pushed a commit that referenced this pull request Sep 28, 2023
When unlinking multiple files from a pool at 100% capacity, it
was possible for ENOSPC to be returned after the first few unlinks.
This issue was fixed previously by PR #13172 but then this was
again introduced by PR #13839.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Also, updated the existing testcase which reliably reproduced the
issue without this patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15312
@akashb-22 akashb-22 deleted the ext_enospc branch September 29, 2023 04:05
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
Labels: Status: Code Review Needed
4 participants