zfs incremental recv fails with 'destination already exists' and corrupts pool #7735
Today's debug results: by carefully using the method from zfs_revert described at #6497 and #5389, I managed to delete the latest TXG uberblock, and finally could import the pool. I'll now start a scrub and sleep. Any more ideas? I doubt that retrying the recv will work, but maybe with other send options... @rincebrain could this be another variant of the hole_birth bug?
@dioni21 If it's a send bug, I would swear it's not related, since ignore_hole_birth should explicitly disable ever using that kind of information, so even if there are more hole_birth bugs, it shouldn't apply here. It'd have been nice if we could have somehow kept the mangled block around to compare and see what, if anything, else was hosed, but that's a burnt bridge now. Given the number of bugs that appear to involve impossible values showing up in structures, I have to wonder if you've got a CPU or RAM problem with bits flipping.
We have talked about this before: I've been told that using dedup is NOT recommended. Maybe most people do not use it, and so do not see these bugs?
Indeed, I still have the second disk from the first copy, in which I am almost sure that the very same bug happened, and upgrading the kernel only made it worse. But what exactly do you want, and how do I extract it? Here is a dump of the cleaned uberblock, for example:
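(For reference, a dump like that can be produced roughly as below; the device path and pool name are placeholders, not the real ones from this report.)

```sh
# Dump the vdev labels and the uberblocks stored in them, and try a
# read-only rewind import before resorting to any on-disk surgery.
# /dev/sdX1 and "tank" are placeholder names.
zdb -lu /dev/sdX1
zpool import -o readonly=on -F tank
```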
If that was the problem, it would not be a repeated bug at the very same point. I really thought I had some bad hardware when I had issues similar to #7723, but that was simply a recent kernel bug. I have run memcheck and mprime, 24hrs each, without issues.
@dioni21 I'm not saying it can't be a bug, just that if it is, it's not related to the hole_birth bug, and other than bugs in the not-in-a-release-yet encryption code, I haven't seen any zfs send bugs that aren't filed by you in recent memory. It's not impossible that nobody else has run into it, since the ZFS code doesn't have secret telemetry phoning home to report who uses what features, but particularly with how you're running into it reliably, I would be quite surprised if it's "just" that nobody who uses send -D reported a bug. If it were something like #7723, then you'd be in a rather sad position, because that bug appears to have not been bad hardware, but instead, a bad kernel that stomped on memory. If the bug is like #6439, then dnodesize=legacy might work around it too, and then we know that the codepaths that need examining are ones where send -D is enabled and dnodesize is not legacy, but that makes it still surprising that it's a reliably reproducible race condition.
Okay, I spent a little bit of time and couldn't easily reproduce this last night with those exact pool properties and send flags on a single-disk pool. I'm trying to think of a good set of instructions to give you for further investigation without you having to share the datasets in question. If I had the send streams, I think what I'd do with them would be trying to send e.g. half of the incremental stream and seeing if it panics, if not, 75%, if so, 25%, and so on, and continue until I ended up with whichever specific record in the stream made it eat paste, and then go tear apart the send stream for all the records related to that object. (I'd probably cheat and start at something like ~1684 MiB into the incremental stream, since that's the last part of your output before it panics, but that's just a premature optimization.) Another approach would be to do the above until I knew what specific item's records were inducing the panic, then, instead of trying to tear apart the send stream, using zdb to investigate the object(s) in question on the sender side and see how they're "fine" on the source but causing fires when recv'd.
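For illustration, once a stream is saved to a file, tearing its records apart could start roughly like this (the file name is a placeholder; zstreamdump ships with ZFS):

```sh
# Print one entry per record of a saved send stream; the object numbers
# appearing near the offset where recv dies are the ones to chase with zdb.
# incr.zstream is a placeholder file name.
zstreamdump -v < incr.zstream > incr.records.txt
```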
@rincebrain first of all, I wish to thank you for continuing to give me your attention! I have some more data to add from recent tests.
😢
Humm. I think it's feasible. I'll have to change mbuffer to dd so that I can see which block gives the error. But now that you say this, I noticed that each run stopped at a different point. Those that I have logged:
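The dd version would look roughly like this, so the byte count at the moment of failure gets printed (needs GNU dd for status=progress; all names and send flags below are placeholders):

```sh
# mbuffer replaced by dd, so dd reports how many bytes went through before
# the receive blew up. Dataset/snapshot names and flags are placeholders.
zfs send -R -I srcpool/jonny@snap1 srcpool/jonny@snap2 \
  | dd bs=1M status=progress \
  | zfs receive -Fv tank/jonny
```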
Every time I read from the original pool, not from a dump file. It is possible that on some of these runs there was a background free operation from the last
I'm not sure what this means, though...
One of the latest tests I've done was running a
I could run a zdb overnight on the exported source pool. Any suggestions?
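For example, an overnight leak/checksum check on the exported pool could be as simple as the following (the pool name is a placeholder; -cc traverses and checksums everything, which is slow):

```sh
# Block statistics / leak detection plus checksum verification of all
# blocks on an exported pool; "srcpool" is a placeholder name.
zdb -e -bcc srcpool > zdb-srcpool.log 2>&1
```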
Yeah, I was thinking it might be a background processing task that mangled data, so if you decided to try chopping up the stream to figure out which part breaks it, I'd include something like a `zpool sync; sleep 60;` to see if it panicked at the point that you cut off the input.
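Concretely, each bisection step might look roughly like this, assuming the incremental stream has been saved to a file first (all names and sizes below are placeholders):

```sh
# Feed only the first N MiB of the saved stream into recv, then give
# background processing a minute to see whether the panic still follows.
N=1024   # adjust up/down to bisect
dd if=incr.zstream bs=1M count="$N" | zfs receive -F tank/jonny
zpool sync; sleep 60
```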
Yes, but I'm sure that my very first run was before upgrading to that kernel. And since I learned about it, I'm running the previously safe kernel again. No more panics due to that.
I thought about that too. I changed the new dataset to
Not finished yet. Currently at 160G of a 340G source dataset. Also, a friend insisted that
I even removed the L2ARC device (an LVM volume on SSD); no change... Another try:
The panic message on import is now somewhat different, and it scared me a bit:
Not good; let's disable it for the next tests:
That doesn't look especially surprising - it's trying to repair a DDT entry and the bp is NULL, so it tries to reach into the bp and NULL pointer+epsilon dereferences. I think most of the settings you used on tank/jonny are academic, since `zfs send -R` will overwrite them with whatever the settings were on the source side, unless you use -x or -o to override anything in received datasets.
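As an illustration, assuming a ZFS build whose `zfs receive` supports generic -o/-x overrides, keeping local settings despite the -R stream would look roughly like:

```sh
# Override properties carried by the replication stream on the receive side.
# Names are placeholders; generic -x/-o on receive may not exist in older
# releases, so treat this as a sketch rather than a recipe.
zfs send -R -I srcpool/jonny@snap1 srcpool/jonny@snap2 \
  | zfs receive -F -x dedup -o dnodesize=legacy tank/jonny
```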
Note that in this run I did not use `-D`. Current settings are:
I thought we were already certain that the presence or absence of the -D flag induces this problem, from #7703?
I think this is another bug. Maybe related to dedup, but not to `-D`. Since we found that
(Humm, maybe the
And on the second pass:
Where it stopped after sending two good datasets. BTW: I recently renamed the
Ah, I hadn't seen that you had changed the send command to not include dedup above. I... don't recall any such bug about snapshot names?
Since I'm guessing that something about the send stream is making the receiver scribble over things wildly, I'd probably suggest at least trying to build a kASAN kernel+modules and seeing if it can tell you anything useful about the memory accesses. It might not work, since there's definitely a chance it's processing "valid" data and ending up with its pants on its head, but it's worth a try.
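A rough sketch of what that build might involve (the kernel source path and config workflow below are assumptions, not exact instructions):

```sh
# Enable KASAN in the kernel config, rebuild, then rebuild SPL/ZFS against
# the instrumented kernel so the modules are covered as well.
cd /usr/src/linux
scripts/config --enable CONFIG_KASAN --enable CONFIG_KASAN_INLINE
make olddefconfig
make -j"$(nproc)" && sudo make modules_install install
# ...then rebuild the spl and zfs 0.7 modules against the new kernel headers.
```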
Beware, long text below...
I did not know about KASAN, very interesting... But today I was thinking about moving to the ZFS master code. 0.7 is lagging behind on many features and bug fixes. For example, the only issue I could find on GitHub with the same error message is #2739. It appears to be related to clones, which I don't have. The issue is closed and marked for the 0.8.0 milestone, but I could not find any patch. Is it solved? Do you think this is a bad idea? I was on master some time ago, but downgraded to stable after finding its yum repo, hoping for more stability. Anyway, today's test report: Yesterday I said:
I was not sure whether the reason for this was that the send stream had been generated with mixed options. So I did some runs with the exact same input. First, creating the input and saving it on an ext4 filesystem:
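(Presumably something along these lines; the names, paths, and send flags below are placeholders rather than the exact command used.)

```sh
# Capture the incremental replication stream once, so that every retry
# consumes byte-identical input. Paths, names, and flags are placeholders.
zfs send -R -I srcpool/jonny@snap1 srcpool/jonny@snap2 \
  > /mnt/ext4/incr.zstream
```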
Now, let's do some receive/panic/revert/reboot/retry cycles:
The CTRL-C was needed to stop dd and get its last status.
Let's try again:
There is a long time between
You also said before:
Knowing that it always stops around 1.4G, should I chop the input file to 75% of that size (and so on) and try again? I suspect your intention is to see how far the stream can go before returning EEXIST, right? I could do this tomorrow. But let's continue with the tests I already did:
Suspecting something in the dedup code, I did an incremental transfer with dedup disabled:
No luck yet, but the recv processing was much faster without the dedup code. ;-) And what if I copied the WHOLE dataset without dedup?
The last commands took too much time, probably because of background freeing on a SATA disk. So let's wait for it to finish before continuing. I will not be present, so let's automate it:
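(Roughly along these lines; the pool name, file path, and polling interval are placeholders.)

```sh
# Wait for background freeing to drop to zero, then run the next step
# unattended. Pool name, file path, and interval are placeholders.
while [ "$(zpool get -Hp -o value freeing tank)" != "0" ]; do
  sleep 300
done
zfs receive -Fv tank/jonny < /mnt/ext4/incr.zstream
```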
When I got home, I did the incremental part:
Hey! Error, but no panic??? Wait?
Now, PANIC!
And lastly, something I did earlier: only two datasets did not receive the incremental stream. Let's see if only one dataset is problematic.
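(Presumably a plain, non-replicated incremental for just that dataset, roughly as below; names are placeholders.)

```sh
# Incremental for a single dataset only: no -R on the send and no -F on the
# receive, so nothing else needs to be rolled back if it fails.
# Dataset and snapshot names are placeholders.
zfs send -I srcpool/jonny@snap1 srcpool/jonny@snap2 | zfs receive -v tank/jonny
```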
Hey, no panic! And no need to roll back, since I did not use
Again in the morning:
So, it is not exact, but the problem always occurs near the same point. The good news is that I can use this stream for the stream partitioning test, since I do not need to reboot to retry. The bad news is that maybe there are TWO bugs: one for the EEXIST, and another for the data corruption. What's next? Well, I will stop testing for a while and wait for your suggestion on the next step:
I love to debug, but this is taking too much time already, and my family wants to use the home's primary computer, which is somewhat on hold.
@rincebrain Any suggestions from the above? I'm inclined to go to master and restart send/recv with a clean pool. But running on master could have been the initial problem. BTW: I already compiled the master modules (commit id fb7307b), and gave another chance to
The error occurred at this part of the file:
So, can I conclude that there is a serious problem in my dedup tables? Problem is: what pointer has the value
Right now my source pool has a
I mean, if ddt_phys_total_refcnt is inlined/a macro, that'd explain your line numbering. I would not suggest using master without having good backups, not because I know of a great many outstanding bugs against it, but because a lot of big features have been merged, and some of them have had bugs found in the past. I think we already knew your DDT was sometimes getting mangled from #7703 - though I thought you had started receiving without dedup being enabled at all to reproduce this.
Compilation options, by default, are
Yes, but dedup had already been enabled on the source pool some time ago. And I'm running out of options... :-(
I did this only for a single dataset. Not for the whole pool.
Ah, you only disabled dedup for one dataset. So, I have two thoughts. One is that, given that you need this data to be up and running for others, you should probably just resort to rsyncing and recreating the snapshots, rather than debugging this, as much as I would prefer to find out what's rotten in the state of Denmark. The other is that, assuming you want to keep debugging this, what you'd need would be the exact instructions where ASAN is reporting a NULL read to be certain what the operation involved is - basically looking at the output of the disas[semble] command from gdb on the zdb binary in question, or if you can convince ASAN to not try to list symbols, and instead list specific offsets into the function (which is probably there in the difference between 0x417b6c and where zdb_count_block "starts", e.g. info line zdb_count_block on the zdb binary will probably say something like starting at 0x417b00 and then you'd want to look at the instructions around the pc location from ASAN's output to figure out what the hell it was actually noticing on fire).
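For instance (the zdb path below is an assumption about where the binary lives):

```sh
# Show where zdb_count_block starts and dump its disassembly, so the pc that
# ASAN reported can be mapped to a specific instruction.
# /sbin/zdb is an assumed install path.
gdb -batch -ex 'info line zdb_count_block' \
           -ex 'disassemble zdb_count_block' /sbin/zdb > zdb-disas.txt
```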
😂😂😂😂😂😂😂😂
Since I had some time, I went the other way: cleared the destination pool, disabled dedup on the source pool (I know this will not clear existing dedup, but I need it so that --replicate does not create the new datasets with dedup), and restarted the copy with the simplest approach:
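(Roughly the following, with no -D and dedup off everywhere; all names are placeholders.)

```sh
# Full replication of a fresh recursive snapshot, no dedup anywhere.
# Pool and snapshot names are placeholders.
zfs snapshot -r srcpool@move1
zfs send -R srcpool@move1 | zfs receive -Fduv tank
```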
In short: no problems. I even ran a zdb on this new pool. No problems there either!
Ok, next step, incremental send:
Oh GOD! Again? Also:
Now I can be sure that the problem is in the source pool. I am stubborn, and will make a new try with a new set of snapshots to see if it makes a difference. If not, I will resort to rsync. Thanks again for your help so far...
New snapshot, new life. Start from the very beginning: clear all snapshots from the source pool. Also, reset (most) zfs parameters to the original setup (dedup, xattr, dnodesize). Create a new snapshot to start from. Re-add the second disk to the destination pool; it's now or never! Go!
So far, so good. Now, the incremental step:
Hey, where's the bug? Maybe it was not in the
Some things are still different from the original setup:
Anyway, I think I'll say it's done for now. I'll finish my hard drive migration and hope for no more encounters with this "bug/feature/glitch".
This will likely be fixed by a patch stack to port some recent fixes to the 0.7-release branch. These fixes deal with receiving incremental streams in which dnode slots are recycled and their size changes: dweeezil:zfs-0.7-release-recv-dnode. There was previously a patch from #6576 to fix this problem but it was reverted. The patch stack I referred to above pulls in some more recent work that was mainly geared toward raw send streams, but also included more robust fixes to this problem.
Closing. The fixes for this issue were included in 0.7.11. We will be releasing a 0.7.12 for #7933.
System information
Describe the problem you're observing
Trying to move and clean up data to a new pool, on bigger disks.
Completely zero the new disk, and create a single GPT partition
No problem until here. To ensure this, system I/O has been kept to a minimum, the graphics environment disabled, and the kernel rewound to a stable one (#7703 has affected me too).
Now, let's try to move another incremental step:
Meanwhile:
After this, `zfs receive -A tank/jonny` never returns. Indeed, any command that would access tank freezes.
To be able to fully reboot I had to disable automatic pool import. Whenever I try to import the new pool, I get this panic:
Not even `zpool import -F` works; it always panics.
Describe how to reproduce the problem
Apparently, this is not the first time in my setup; I think this is what caused the destruction I told about in #7703 and #6414.
That's why in this run I just have one disk in the new pool
Include any warning/errors/backtraces from the system logs
See above...