
zpool trim service fails on pools that contain trimmable cache devices #14488

Open
Rudd-O opened this issue Feb 13, 2023 · 17 comments
Labels
Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

@Rudd-O
Contributor

Rudd-O commented Feb 13, 2023

System information

Type                 | Version/Name
Linux                | 37
Distribution Name    | Fedora
Distribution Version | 37
Kernel Version       | 6.1.10-200.fc37.x86_64
Architecture         | x86_64
OpenZFS Version      | master branch

Describe the problem you're observing

I've enabled weekly trimming of my pool. The pool contains rotational devices as its data-bearing devices, and one SSD as a cache device. All of them are encrypted with LUKS with the dm-crypt allow_discards option enabled, so the SSD is trimmable. blkdiscard of the LUKS device works perfectly.
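(For reference, verifying that discards actually pass through the dm-crypt layer can be done roughly like this; the mapping name below is a placeholder, not my actual device:)

$ cryptsetup status <luks-mapping> | grep -i flags   # should list "discards" when allow_discards is active
$ lsblk -D /dev/mapper/<luks-mapping>                # non-zero DISC-GRAN/DISC-MAX means discards are supported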

Nevertheless, attempting a trim through the unit (which basically resolves to /sbin/zpool trim -w backups) fails with the following log:

Feb 13 00:48:29 milena.dragonfear systemd[1]: Started zfs-trim@backups.service - zpool trim on backups.
Feb 13 00:48:29 milena.dragonfear sh[29079]: cannot trim: no devices in pool support trim operations
Feb 13 00:48:29 milena.dragonfear systemd[1]: zfs-trim@backups.service: Main process exited, code=exited, status=255/EXCEPTION
Feb 13 00:48:29 milena.dragonfear systemd[1]: zfs-trim@backups.service: Failed with result 'exit-code'.

I would expect at least to see the SSD cache device trimmed.

Rudd-O added the Type: Defect label on Feb 13, 2023
@Haravikk

man zpoolprops says under autotrim:

TRIM on L2ARC devices is enabled by setting l2arc_trim_ahead > 0.

So it sounds like this might be intended behaviour, and we're supposed to set l2arc_trim_ahead somewhere instead, although I'm not sure where (a vdev setting? I don't usually mess with those).

Not the most intuitive behaviour, though; I too would expect TRIM to apply to all TRIM-enabled devices in the pool. That said, TRIM may not be necessary depending on the SSD(s) you're using?
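For what it's worth, l2arc_trim_ahead appears to be a zfs kernel module parameter rather than a pool or vdev property, so setting it would look roughly like this (a sketch; per zfs(4), the value is a percentage of the L2ARC write size to trim ahead of):

$ echo 100 | sudo tee /sys/module/zfs/parameters/l2arc_trim_ahead                 # runtime only, does not survive a reboot
$ echo "options zfs l2arc_trim_ahead=100" | sudo tee -a /etc/modprobe.d/zfs.conf  # persists across reboots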

@Rudd-O
Contributor Author

Rudd-O commented Feb 20, 2023

I'll try that out. Had no idea.

@Rudd-O
Contributor Author

Rudd-O commented Feb 20, 2023

l2arc_trim_ahead is not documented as a module parameter in the man page, by the way; that seems wrong. I thought adding it as a zpool property was what I needed to do.

@Rudd-O
Contributor Author

Rudd-O commented Feb 20, 2023

Actually, that tunable is not what I want anyway. I don't want autotrim; I want zfs-trim@.service to work. Autotrim does not work correctly with my devices (the machine hangs every couple of days with it enabled). If I issue a zpool trim command on a pool that has a trimmable L2ARC device, I just want that to work.

@mailinglists35

What does "l2arc_trim_ahead > 0" mean? 1? Any value?

@mailinglists35

I've done echo 1 | sudo tee /sys/module/zfs/parameters/l2arc_trim_ahead, but the L2ARC still does not get trimmed.

The log device gets trimmed regardless of whether l2arc_trim_ahead is 0 or 1.

@mailinglists35

Hm, the description of that parameter is in the original pull request:
#9789

@gamanakis
Contributor

@mailinglists35 thanks for bringing this to my attention. Set the zfs module parameter l2arc_trim_ahead to 100 and report what zpool status says.

@mailinglists35

Same behaviour with l2arc_trim_ahead set to 100; the cache device remains untrimmed.

@mailinglists35

$ zpool status -t mypool
  pool: mypool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 21.6G in 00:07:39 with 0 errors on Thu May 25 09:08:04 2023
config:

        NAME             STATE     READ WRITE CKSUM
        mypool           DEGRADED     0     0     0
          mirror-0       DEGRADED     0     0     0
            hgst         ONLINE       0     0     0  (trim unsupported)
            enterprise   OFFLINE      0     0     0  (trim unsupported)
            iron         ONLINE       0     0     0  (trim unsupported)
        logs
          zlog.mypool    ONLINE       0     0     0  (100% trimmed, completed at Wed 07 Jun 2023 11:45:55 PM EEST)
        cache
          zcache.mypool  ONLINE       0     0     0  (untrimmed)

errors: No known data errors
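It might also be worth trying to trim the cache vdev explicitly, since zpool-trim(8) accepts individual device arguments (a sketch using the names above):

$ zpool trim mypool zcache.mypool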

@mailinglists35

Interesting: it trims the L2ARC on zpool add, but not on zpool trim. Below is a pool with a newly added cache device:

$ zpool status exos -t
  pool: exos
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 16:08:40 with 0 errors on Sun May 14 16:32:41 2023
config:

        NAME                  STATE     READ WRITE CKSUM
        exos                  ONLINE       0     0     0
          exos-x18-luks       ONLINE       0     0     0  (trim unsupported)
        logs
          mainvg-zlog.exos    ONLINE       0     0     0  (untrimmed)
        cache
          mainvg-zcache.exos  ONLINE       0     0     0  (100% trimmed, completed at Thu 08 Jun 2023 12:07:22 AM EEST)

errors: No known data errors

@mailinglists35

it only trims on add, not on import

@gamanakis
Contributor

gamanakis commented Jun 7, 2023

The "untrimmed" shown in zpool status refers to whether the device was trimmed on addition.
To see whether it is being trimmed when writing to the L2ARC, run zpool iostat -l and watch the trim queue; it should report a number other than 0 while writing to the L2ARC with l2arc_trim_ahead > 0. See also the man page man/man4/zfs.4 for that variable.
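For example (the pool name and interval here are placeholders):

$ zpool iostat -l mypool 5      # latency columns, including trim
$ zpool iostat -q -v mypool 5   # per-vdev queue depths; the trimq columns should be non-zero while the L2ARC is being written with trim-ahead enabled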

@Rudd-O
Contributor Author

Rudd-O commented Sep 19, 2023

[root@milena ~]# echo 100 > /sys/module/zfs/parameters/l2arc_trim_ahead 
[root@milena ~]# cat /sys/module/zfs/parameters/l2arc_trim_ahead 
100
[root@milena ~]# zpool trim -w backups
cannot trim: no devices in pool support trim operations

Setting this parameter does not change the error message produced by zpool trim.

@Rudd-O
Contributor Author

Rudd-O commented Sep 19, 2023

If the zpool trim command is going to tell me that nothing can be trimmed, that at least should not be an error condition, since there is no error, just a report of the fact that nothing will be done.

That would fix the weekly and monthly timer units that exist for trimming.
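In the meantime, a possible local workaround (untested, and it would mask all exit-255 failures, not just this benign one) is a systemd drop-in that declares exit status 255 a success:

$ sudo systemctl edit zfs-trim@.service
# then add in the drop-in:
[Service]
SuccessExitStatus=255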

@danieldietsch

Did you have any luck getting trim to work for the cache?
I am not sure if I have a similar problem. I have two SSDs, each with two partitions. I use the two smaller partitions as a mirrored log, and the two larger ones as cache. Each partition is its own LUKS device with allow_discards set, but it seems the pool does not recognize these as trimmable.
Is this expected?

I am on Gentoo with ZFS 2.2.2 with a custom 6.5.10 kernel and cryptsetup 2.6.1.

❯ zpool status -t
  pool: crypto-storage-zfs-tng
 state: ONLINE
  scan: resilvered 8.07T in 1 days 07:11:25 with 0 errors on Sun Dec  3 04:52:23 2023
config:

        NAME                         STATE     READ WRITE CKSUM
        crypto-storage-zfs-tng       ONLINE       0     0     0
          raidz2-0                   ONLINE       0     0     0
            crypto-storage-a         ONLINE       0     0     0  (trim unsupported)
            crypto-storage-b         ONLINE       0     0     0  (trim unsupported)
            crypto-storage-c         ONLINE       0     0     0  (trim unsupported)
            crypto-storage-d         ONLINE       0     0     0  (trim unsupported)
            crypto-storage-e         ONLINE       0     0     0  (trim unsupported)
            crypto-storage-f         ONLINE       0     0     0  (trim unsupported)
        logs
          mirror-2                   ONLINE       0     0     0
            crypto-zil-i             ONLINE       0     0     0  (trim unsupported)
            crypto-zil-j             ONLINE       0     0     0  (trim unsupported)
        cache
          crypto-l2arc-i             ONLINE       0     0     0  (trim unsupported)
          crypto-l2arc-j             ONLINE       0     0     0  (trim unsupported)

errors: No known data errors
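One way to double-check that allow_discards actually made it into the active device-mapper tables (a sketch using the names from the output above):

$ sudo dmsetup table crypto-l2arc-i | grep -o allow_discards
allow_discards

If nothing is printed, the flag never reached the dm table, e.g. because the mapping was opened without --allow-discards.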

@Rudd-O
Contributor Author

Rudd-O commented Dec 5, 2023

Not on my side — I have been removing the cache device from the pool, trimming it manually, and re-adding the cache device once trim is done. This is, of course, a pain.
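For anyone scripting the same workaround, it is roughly the following (the device path is a placeholder; note that blkdiscard destroys the device's contents, which is fine for a cache vdev, but double-check the path first):

$ zpool remove backups /dev/mapper/<cache-device>
$ blkdiscard /dev/mapper/<cache-device>
$ zpool add backups cache /dev/mapper/<cache-device>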
