
Data corruption after TRIM #14513

Open
notramo opened this issue Feb 19, 2023 · 21 comments
Labels
Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

@notramo

notramo commented Feb 19, 2023

System information

Type Version/Name
Distribution Name Void Linux
Distribution Version musl, up to date (rolling)
Kernel Version 6.1.8_1
Architecture x86_64
OpenZFS Version 2.1.7-1

Describe the problem you're observing

Data corruption in some files and lot of checksum errors.

Describe how to reproduce the problem

I ran zpool scrub on a simple zpool consisting of a single partition on a 120 GiB SSD. It finished without any errors (neither data nor checksum).
Then I ran zpool trim on the pool. A few minutes later the report showed 2 checksum errors, then a few more.
(I'm not sure if the first error appeared before or after the TRIM.)
I ran zpool trim again, because the first time it finished quite fast despite there being a lot of space to trim. A few more errors appeared. (Probably not related to running TRIM again, because the errors have been increasing steadily since then.)
Then I thought that maybe the first scrub, before the trim, had somehow missed the errors, so I ran zpool scrub again. The errors grew to 100-300 in the first few seconds, so I stopped it.
At this point I couldn't use zfs send to save a snapshot, so I used rsync to back up the data. It reported I/O errors for a few files.
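
For reference, a condensed sketch of the sequence described above, assuming the pool name rpool from the status output below (the -w flag, which waits for completion, is an optional addition not used in the original run):

zpool scrub -w rpool     # initial scrub: finished with no errors
zpool trim rpool         # returned quickly; checksum errors appeared minutes later
zpool status -v rpool    # watch the CKSUM column and the list of affected files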

The SSD reports a 512-byte block size, so I used ashift=9. Later I learned that most SSDs actually use larger blocks internally but report 512. I don't know if this affected the trim.
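
A quick way to compare what the drive reports with what the pool is using (a sketch; the device and pool names are taken from this report, and zdb output formatting may vary by version):

lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda    # logical vs. physical sector size reported by the SSD
smartctl -i /dev/sda | grep -i 'sector size'
zpool get ashift rpool                    # pool-level property; 0 means auto-detected
zdb -C rpool | grep ashift                # actual per-vdev ashift (9 = 512 B, 12 = 4 KiB)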

The sda1 BTRFS partition contains ~2 GiB of data (kernels, initramfs, and boot-related stuff). I ran a TRIM and a btrfs scrub on it. No errors were found.

A SMART extended self-test was running when I first ran the zpool scrub (and possibly when I ran zpool trim the first time, but it could have finished by then).

Native encryption and compression are active.

Partition scheme:

NAME   FSTYPE     FSVER LABEL
sda
├─sda1 btrfs            VoidBoot
├─sda2 zfs_member 5000  rpool
└─sda3 swap       1     VoidSwap

Output of zpool status before submitting issue:

  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub canceled on Sun Feb 19 16:29:42 2023
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       DEGRADED     0     0     0
	  sda2      DEGRADED     0     0 4.28K  too many errors

errors: 481 data errors, use '-v' for a list

Include any warning/errors/backtraces from the system logs

No errors in dmesg.

notramo added the Type: Defect (Incorrect behavior, e.g. crash, hang) label on Feb 19, 2023
@rincebrain
Contributor

rincebrain commented Feb 19, 2023

n.b. zpool trim isn't going to block until done unless you use -w; it runs in the background. A lot of drives, in my experience, will internally queue up the TRIM commands and report them done almost instantly while actually releasing the blocks in the background, which on some (NVMe) drives you can watch via the "space allocated" counter. You can see how far ZFS has progressed from its own perspective with zpool status -t.
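
For reference, a minimal sketch of the two commands mentioned above (the pool name is a placeholder):

zpool trim -w tank       # -w waits until ZFS has finished issuing TRIMs to the vdevs
zpool status -t tank     # -t adds per-vdev trim status/progress to the output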

It'd be useful to know the model and make of the SSD, and firmware version, as some SSDs have had...questionable TRIM implementations. (Not saying ZFS can't have a bug here, just useful as a data point if it turns out that it only breaks on WD SSDs or Samsung or something.)

edit: It'd also be interesting if you have any zpool events -v output for any of the checksum errors, because that can include how much the checksum differed from what was expected.
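
A sketch of pulling those events, assuming the pool name rpool from this report; checksum problems show up as ereport.fs.zfs.checksum entries:

zpool events -v rpool | less    # scroll to the ereport.fs.zfs.checksum entries;
                                # the payload describes where and how the checksum mismatched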

It could, of course, also be native encryption being on fire, because why not.

@notramo
Author

notramo commented Feb 19, 2023

I have used this file system for more than a year and never got an error. It went without trimming because I only noticed today that autotrim was disabled.
The first scrub (before the trim), however, returned 0 errors, and they only started appearing after zpool trim.
However, there are zero read or write errors, only a lot of checksum errors.

The model of the drive is WDC WDS120G2G0A-00JH30.

Worth noting that there is a swap partition on the drive (sda3) which has been used extensively (swappiness 60), but I never got a kernel panic or any other memory error that would indicate the drive is failing under the swap partition. Nor did I get any errors on the BTRFS boot partition.

@rincebrain
Contributor

rincebrain commented Feb 19, 2023

Of course, I'm not trying to suggest you did something wrong or that it's necessarily anything else at fault, just trying to figure out what makes your setup different from many other people who haven't been bitten by this.

This suggests TRIM support on that drive might be in The Bad Place(tm) sometimes, and I don't see an existing erratum in Linux's libata for it? Unclear why btrfs might not be upset about that, though; maybe it's not using enough queued IO, or not doing writes at the same time?

You could test this, I think, by forcibly telling Linux to apply that erratum to that drive. Adding something like libata.force=noncqtrim to your kernel parameters would disable queued TRIM for all drives in the machine, and there's also syntax for being more granular.
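
A hedged example of wiring that up via GRUB on a Debian-style system (the port/device number 2.00 is purely illustrative; check dmesg for the real ataN.DD ID, and see Documentation/admin-guide/kernel-parameters.txt for the exact libata.force syntax):

# /etc/default/grub, then update-grub (or grub-mkconfig) and reboot.
# All drives:
GRUB_CMDLINE_LINUX_DEFAULT="quiet libata.force=noncqtrim"
# Only one drive, e.g. ata2.00:
GRUB_CMDLINE_LINUX_DEFAULT="quiet libata.force=2.00:noncqtrim"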

@djkazic

djkazic commented Dec 9, 2023

I've been hit by this issue as well.

arch: x86_64
os: ubuntu 20.04 (kernel 5.6)
zfs: 2.2.2
libata.force=noncqtrim: applied to kernel options
drives: SanDisk SSD Plus 2TB (fw UP4504RL)
vdev topology: 2-way mirror

I also tried setting the queue depth to 1 without improvement. I can repro it every time by trimming one device, then scrubbing. That device will show 10 to 30 checksum errors, which are fixed by the scrub.
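
For reference, the queue-depth change mentioned above can be done per device via sysfs (a sketch; sda is a placeholder and the setting does not persist across reboots):

cat /sys/block/sda/device/queue_depth                  # current NCQ queue depth
echo 1 | sudo tee /sys/block/sda/device/queue_depth    # depth 1 effectively disables NCQ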

@rincebrain
Contributor

You're absolutely sure you're actually running 2.2.2?

Because #15395 would be the obvious reason to suspect something going foul here, and that's in 2.2.0+.

@djkazic

djkazic commented Dec 9, 2023

Yes, I'm sure.

$ zfs --version
zfs-2.2.2-1
zfs-kmod-2.2.2-1

$ zpool --version
zfs-2.2.2-1
zfs-kmod-2.2.2-1

@rincebrain
Contributor

#15588 becomes my default guess for a thing to try, then, though I'd also test that you can't reproduce this with a non-ZFS configuration to be sure it's not just the drive doing something ridiculous with TRIM.

@djkazic

djkazic commented Dec 9, 2023

Hmm, okay. I'll wait for the next minor release that includes it. This is a production system, so I can't test it with non-ZFS filesystems at the moment, though I could buy a replacement drive from a different manufacturer and swap out one of the two drives in the mirror vdev.

Is there any unofficial list of the best SSDs to use with ZFS? I know Samsung SATA drives have NCQ TRIM issues; should I get a Micron one? I'm a bit lost.

@rincebrain
Contributor

I wouldn't expect that fix to go into a minor release, though I could be wrong.

I'm not really aware of any SSDs that should have visible issues with ZFS where the underlying issue is an interaction with the SSD. #14793 has a number of people getting upset about certain models of SSD, but a number of them also reported that getting a better PSU made the issues disappear, so who knows.

There's also #15588 complicating matters, but that seems more controller- and memory-pressure-specific, not drive-specific.

@robn
Member

robn commented Dec 9, 2023

I highly doubt #15588 will have anything to do with it - it doesn't change TRIM.

@djkazic

djkazic commented Dec 9, 2023

I suspect this may come down to this drive model's firmware, so I've ordered two commonly used drives to replace them. We'll see if this fixes the issue in the next few days.

@djkazic

djkazic commented Dec 11, 2023

Update: after switching to 870 Evos the issue is no longer reproducible. No corruption or checksum errors on TRIM.

@LTek-online

I've got the same error on nearly the same SSDs. In my case it's a Western Digital WDS240G2G0A-00JH30 and a SanDisk SDSSDA-240G (they are both the same drive, one is just rebranded).

When debugging, I tested this with a lot of different drives, and it only seems to affect these two. One notable difference between these two drives and all the others I've tested is that only these two use DRAT TRIM (deterministic read after TRIM). All the others either claim non-deterministic TRIM or RZAT TRIM (read zero after TRIM).

I'm wondering whether ZFS has a problem with DRAT TRIM SSDs, or whether it's just these SanDisk / WD SSDs behaving badly.
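
One way to check which variant a drive advertises (a sketch; /dev/sdX is a placeholder) is to look at the ATA identify data with hdparm:

hdparm -I /dev/sdX | grep -i trim
#   "Deterministic read data after TRIM"    -> DRAT
#   "Deterministic read ZEROs after TRIM"   -> RZAT
#   neither line present                    -> non-deterministic TRIM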

I also verified through testing that it's not the PSU / SATA controller / SATA cables / RAM / CPU causing these issues. I've tried three different machines with different hardware, and the error always manifests only with those two SSDs. The S.M.A.R.T. values are all OK, and the error only occurs after deleting some files, then running a trim, and then a scrub. Without trim, no error occurs at all.

System info:

Info Value
ZFS Ver 2.1.11-1
OS Debian 12
Kernel 6.1.0-16-amd64

@rincebrain
Contributor

cf. #15395 in 2.1.14.

@notramo
Author

notramo commented Jan 20, 2024

I am now using TRIM without any issues, on the same drive that I originally opened this issue about. However, there are some differences since then:

  • I reformatted the pool, with ashift=12. Previously it was probably ashift=9. The SSD might have TRIM issues with smaller ashift.
  • The SSD is in a different, newer laptop.

I don't know how these could affect the bug, but I thought I'd throw this info in here in case someone finds it useful.
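
For anyone following along, a rough sketch of the kind of pool recreation with ashift=12 described above. This is destructive, and the device path, pool name, and property choices are placeholders rather than the exact commands used here:

# Back up first (zfs send/recv or rsync), then destroy and recreate with ashift=12.
zpool destroy rpool
zpool create -o ashift=12 \
    -O compression=lz4 -O encryption=on -O keyformat=passphrase \
    rpool /dev/disk/by-id/<ssd-partition>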

@LTek-online

I have now retested with ashift=12 like @notramo and also had success. In my case there was no hardware change, so only the ashift=12 value changed. No checksum errors after running trim.
Also, as I doubted the TRIM capabilities of the drives, I retested them according to this doc https://wiki.ubuntuusers.de/SSD/TRIM/Testen/ and found that the drive actually uses RZAT TRIM.

@behlendorf
Contributor

If you've updated to 2.1.14 or newer and are no longer seeing this, it was probably caused by #15395, which has been resolved.

@LTek-online

I retested it on 2.2.2-4 in Debian 13 and the issue is still present.

@behlendorf
Contributor

That's unfortunate, version 2.2.2 does include the fix for #15395.

@notramo
Author

notramo commented Feb 2, 2024

It's unlikely that #15395 caused it, because I did not put the system under any significant write workload while trimming.

@DISTREAT

DISTREAT commented Nov 25, 2024

I am facing the exact same situation using a Marvell-based SanDisk SSD Plus.

There had not been any issues until I enabled autotrim. In addition, the issues only occurred after a TRIM operation, as can be seen in the zpool history.

In terms of my setup, I am also using ashift=9 and native encryption, and am running NixOS. However, I will also try raising the ashift value to 12, in case the sector size is being misreported.

Update: apparently no issues so far when invoking TRIM manually.
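
For reference, a minimal sketch of switching from autotrim to manual TRIM (the pool name is a placeholder):

zpool set autotrim=off tank    # stop automatic trimming
zpool get autotrim tank        # confirm the setting
zpool trim -w tank             # run TRIM manually at a convenient time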
