Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs diff fails to report removed files #2081

Closed
aerusso opened this issue Jan 25, 2014 · 14 comments
Closed

zfs diff fails to report removed files #2081

aerusso opened this issue Jan 25, 2014 · 14 comments

Comments

@aerusso
Copy link
Contributor

aerusso commented Jan 25, 2014

Linux host 3.11-2-amd64 #1 SMP Debian 3.11.10-1 (2013-12-04) x86_64 GNU/Linux

I am using 0.6.2 from the zfsonlinux apt repositories. This pool has no data errors, and is scrubbed fairly regularly.

The problem is that zfs diff sometimes fails to mention when files are deleted. The following example illustrates (this is real output that I have changed the names in):

root@host:/tank/fs/oldit% zfs diff tank/fs@WHA
root@host:/tank/fs/oldit% df -h .
Filesystem                Size  Used Avail Use% Mounted on
tank/fs  640G   16G  624G   3% /tank/fs
root@host:/tank/fs/oldit% rm Track01.mp3
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA
M       /tank/fs/oldit

Shouldn't this mention that "/tank/fs/oldit/Track01.mp3" specifically has been removed?

Let's check if file creations are affected:

root@host:/tank/fs/oldit% zfs rollback tank/fs@WHA
root@host:/tank/fs/oldit% touch ab
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA
M       /tank/fs/oldit
+       /tank/fs/oldit/ab

What about this new file? Will it be properly reported if removed?

root@host:/tank/fs/oldit% zfs snapshot tank/fs@WHA-1
root@host:/tank/fs/oldit% rm ab
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA
M       /tank/fs/oldit
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA-1
M       /tank/fs/oldit
-       /tank/fs/oldit/ab

For the heck of it, what happens if we remove "Track01.mp3" now?

root@host:/tank/fs/oldit% rm Track01.mp3
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA
M       /tank/fs/oldit
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA-1
M       /tank/fs/oldit
-       /tank/fs/oldit/Track01.mp3
-       /tank/fs/oldit/ab

This is strange, right? What happens if we do not create "ab"?

root@host:/tank/fs/oldit% zfs destroy tank/fs@WHA-1
root@host:/tank/fs/oldit% zfs rollback tank/fs@WHA
root@host:/tank/fs/oldit% zfs snapshot tank/fs@WHA-1
root@host:/tank/fs/oldit% rm Track01.mp3
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA
M       /tank/fs/oldit
root@host:/tank/fs/oldit% zfs diff tank/fs@WHA-1
M       /tank/fs/oldit

I don't understand what is going on here. Every file I have checked seems to be affected in this way (i.e., Track01.mp3 isn't special here). Also, this problem is somehow new: I was able to identify file deletions at least a week ago (the snapshots do show the deletions specifically now). EDIT: If the directory is modified in the "fromsnap" the issue does not occur (also see my comment). I don't believe the issue is new, I just don't think I noticed it before.

I'll mention that I have at least one snapshot affected by issue #948, though that snapshot is almost a year old.

Thanks

@aerusso
Copy link
Contributor Author

aerusso commented Jan 26, 2014

So I poked around a tiny bit, and the issue is reproducible (I just didn't notice it earlier).

root@host:/% zfs create tank/test
root@host:/% cd /tank/test
root@host:/tank/test% touch a
root@host:/tank/test% zfs snapshot tank/test@1
root@host:/tank/test% zfs snapshot tank/test@2
root@host:/tank/test% rm a
root@host:/tank/test% zfs diff tank/test@1
M       /tank/test/
-       /tank/test/a
root@host:/tank/test% zfs diff tank/test@2
M       /tank/test/
root@host:/tank/test%

In the second call to zfs diff, libzfs_diff.c:write_free_diffs(), ZFS_IOC_NEXT_OBJ is returning ESRCH, and so describe_free is not being called. I'm scared of kernel code, so I'm not going to poke any deeper.

@dweeezil
Copy link
Contributor

I looked into this and tested under both ZoL (trying various commits between current master code and going back to 0.6.2) and also under FreeBSD (9-STABLE) and got the same results every time. This looks like it may be a bug in all ZFS implementations. The problem doesn't happen if there's a change to the file system in between the creation of the two snapshots. For example, if you add a "touch b" between the snapshots, both diffs do properly report the deletion of "a".

I've not had the need to look into the dmu_diff code but I suspect that's where the problem might lie. I'll look into it further when I get a chance.

In the mean time, it would be nice if someone with access to Illumos, Solaris, SmartOS, etc. could try this. I will take a look at Illumos to see whether there are any outstanding issues regarding zfs diff that we may not have picked up.

@pyavdr
Copy link
Contributor

pyavdr commented Jan 27, 2014

It is the same on SmartOS, tested just now.

@aerusso
Copy link
Contributor Author

aerusso commented Jan 28, 2014

So, I really don't know anything about the internals of ZFS, or the kernel, but I am particularly flummoxed by zfs_ioc_next_obj. I'll mention that it is only used once within ZFS tree, in libzfs_diff.c

static int
zfs_ioc_next_obj(zfs_cmd_t *zc)
{
    objset_t *os = NULL;
    int error;

    error = dmu_objset_hold(zc->zc_name, FTAG, &os);
    if (error != 0)
        return (error);

    error = dmu_object_next(os, &zc->zc_obj, B_FALSE,
        os->os_dsl_dataset->ds_phys->ds_prev_snap_txg);
//Why isn't the above argument 0? Why do we insist the object be found in/after this transaction?
    dmu_objset_rele(os, FTAG);
    return (error);
}

Just simply replacing that argument with zero alleviates my bug (but I'm not arrogant enough to test this on a live system until I get some expert feedback).

(I obviously recognize that if there is another consumer for this interface, the solution will be more complicated.)

@ghost
Copy link

ghost commented Jul 23, 2014

Confirmed repro by #2081 (comment) on 0.6.3 x86-64.

@hungrywolf27
Copy link

Confirmed on OpenIndiana 151a8, and comment added to (previously closed) Illumos issue here:
https://www.illumos.org/issues/3043

@behlendorf behlendorf removed this from the 0.6.7 milestone Oct 30, 2014
@aerusso
Copy link
Contributor Author

aerusso commented May 13, 2015

So, I've been running aerusso/zfs@0285fbf for the past year. I just rebased it to address @d683ddbb7272a179da3918cc4f922d92a2195ba2; Illumos bug 5314 doesn't look like it has anything to do with this diff issue.

Let me summarize my (certainly incomplete) understanding of the issue here. I apologize for what is probably incorrect usage of terminology.

  1. write_free_diffs iterates over objects which are being removed, calling describe_free on each object to ultimately return the filenames
  2. zfs_ioc_next_obj is used to advance each iterative step
  3. zfs_ioc_next_obj calls dmu_object_next after some locking is performed on the dataset
  4. dmu_object_next calls dnode_next_offset to actually iterate over the "dnodes" (probably using this term wrong here, sorry)

Per the documentation for dnode_next_offset in dnode.c, the "txg" parameter specifies a lower bound on which transaction the dnode can be found in. We are interested in all dnodes that are removed between the first and last transaction in the snapshot. It didn't need to be created in that snapshot to correspond to a removed file.

In fact, the behavior of zfs diff in the test case exactly matches this: the transaction that created the data that was deleted in snapshot "2" was produced before, in snapshot "1", definitely predating the first transaction in snapshot "2".

If my read of this is correct, it's somewhat of a miracle that any useful information is being extracted from zfs diff: only files modified inside of the transaction they are deleted in will have a transaction post-dating the first transaction of the snapshot, and therefore appear in the diff list.

Grepping the zfs source tree, zfs_ioc_next_obj is only used by write_free_diffs. The change in the patch I mentioned above should therefore not conflict with anything (can someone who know more please tell me if zfs_ioc_next_obj is consumed by any other interface?).

Furthermore, I think the severity of this bug may be understated: if you are relying on zfs diff to determine if snapshots are safe to be deleted, by checking to see if any files are removed in that snapshot, you will miss files. A worst-case scenario could lead to data loss because of snapshot removals.

@behlendorf
Copy link
Contributor

@aerusso first off thanks for digging in to this. Quite honestly no else I'm aware of has had the time to make it a priority. I suspect zfs diff is one of those fairly unknown and even less used features.

As for your analysis it looks very reasonable to me and this is absolutely something which I think could have gone overlooked. The zfs_ioc_next_obj ioctl() entry point in to the kernel is only ever used by zfs diff so no other commands should be impacted.

Since you've been running with this change for a year now I'm guess it did resolve the issue for you? And specifically it does address the known reproducers which have been posted?

If so it would be very helpful if you could open a new pull request with the proposed fix. That way we may be able to get some additional eyes to review the patch, and we can properly attribute the fix to you when it gets merged. When you do so it would be best if you could include a detailed description in the commit message, just like you posted to this issue. Thanks for looking in to this!

@dweeezil
Copy link
Contributor

FWIW, I looked into this over a year ago and came to essentially the same conclusion as @aerusso did, but for whatever reason, the issue totally fell off my radar. I also agree that zfs diff is likely somewhat rarely used and its API is just a wee bit strange, too.

@aerusso
Copy link
Contributor Author

aerusso commented May 13, 2015

I had a plan to look at some datasets using zfs diff, but used up my energy tracking this issue down (and then abandoned the project). This patch resolves the only two test cases I am aware of, those in this thread.

dasjoe pushed a commit to dasjoe/zfs that referenced this issue May 24, 2015
Per the documentation for dnode_next_offset in dnode.c, the "txg"
parameter specifies a lower bound on which transaction the dnode can
be found in. We are interested in all dnodes that are removed between
the first and last transaction in the snapshot. It doesn't need to be
created in that snapshot to correspond to a removed file.

In fact, the behavior of zfs diff in the test case exactly matches
this: the transaction that created the data that was deleted in snapshot
"2" was produced before, in snapshot "1", definitely predating the first
transaction in snapshot "2".

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <Tim Chase <tim@onlight.com>
Closes openzfs#2081
@chani
Copy link

chani commented Jan 29, 2016

I think I am running into the same issue; I am running a few containers (OpenVZ) ontop of ZFS (simfs, not ploop/zvol). I wrote a (more or less) simple script which would create snapshots if there have been changes. For that functionality I do use zfs diff a lot.

My argumentation is: Why should I keep/create/have/hold a snapshot if it does not contain any valuable information? Even if I do have a snapshot of last week 14:00 - I do not need that snapshot if the same data is available in the snapshot of last week 12:00. You need a way to check if there have been any modifications (valuable data as I call it) between the last snapshot and the current state of the dataset. zfs diff is the answer. And since zfs diff is - apart from running some sort of hash script all the time or using inotifywait - the only useful way to detect such changes/modifications I wouldn't say that this is a rarely used function. I might be totally wrong, though. Just my two cents.

Back the the problem itself:
Unable to determine path or stats for object 1015392 in storage/containers/15@recently-2016-01-29170627: No such file or directory

Then I did restart the container and tried again:
Unable to determine path or stats for object 1015733 in storage/containers/15@recently-2016-01-29171312: No such file or directory

Then I did stop the container
Unable to determine path or stats for object 1012181 in storage/containers/15@recently-2016-01-29171510: No such file or directory

Then once again, stopped
message disappeared.

The interesting part is, that this does only happen for this one container and the object id/number is always differently. I am not sure if this is really an issue or problem at all. The Container is a mail filtering gateway which means there are lots of temporary files. So my question would be: Is this message harmful or can I simply ignore the message?

@behlendorf
Copy link
Contributor

@chani there was a recent zfs diff fix regarding moving files which could be related to your issue. Could you try applying commit 616a57b to your version, or using the source from the master branch.

As for the message I don't believe it's harmful but it should be happening either, so that needs to be understood.

@rincebrain
Copy link
Contributor

rincebrain commented Sep 23, 2016

Just wanted to comment, I'm on 0.6.5.6 at the moment and just ran into this on a pool of mine.

# zfs diff monolith/unsorted/incoming@periodic-20160{821,822}|zfilt>/dev/null

Unable to determine path or stats for object 133267 in monolith/unsorted/incoming@periodic-20160821: No such file or directory
 # find /monolith/unsorted/incoming/.zfs/snapshot/periodic-20160821/ -inum 133267
/monolith/unsorted/incoming/.zfs/snapshot/periodic-20160821/MEMORY.DMP
# find /monolith/unsorted/incoming/.zfs/snapshot/periodic-20160822/ -inum 133267
#
# zdb -dddd monolith/unsorted/incoming@periodic-20160821 133627
Dataset monolith/unsorted/incoming@periodic-20160821 [ZPL], ID 342066, cr_txg 7505469, 536G, 87745 objects, rootbp DVA[0]=<1:38e86201000:1000> DVA[1]=<2:24f67fca000:1000> [L0 DMU objset] fletcher4 uncompressed LE contiguous unique double size=800L/800P birth=7499609L/7499609P fill=87745 cksum=df65aa6c7:10818d651be6:bc803582a21ed:634940aa7530d43

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
zdb: dmu_bonus_hold(133627) failed, errno 2
#

edit: Sorry, I had presumed that 616a57b would have made it into 0.6.5-release between January and now, but it doesn't seem it did. I'll try pulling it in when I update to 0.6.5.8 and see if the issue persists.

@rincebrain
Copy link
Contributor

Having applied 616a57b to 0.6.5.8, I still get the same output as in the previous comment for zfs diff and zdb -dddd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants