
Rollback must invalidate mmap'd pages #2186

Closed
alexfouche opened this issue Mar 13, 2014 · 10 comments
Labels
Component: Memory Management (kernel memory management)

@alexfouche

Rollback does not always rollback

Using MongoDB (which I believe is mmap'd), I update some records, then fsync+lock, then issue a zfs snapshot, then unlock the DB. I also tried stopping and restarting the process instead of fsync+lock. I repeated this multiple times.
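
The fsync+lock variant looks roughly like this (a sketch assuming the mongo shell's db.fsyncLock()/db.fsyncUnlock() helpers; the snapshot name is illustrative):

mongo db2 --quiet --eval 'db.fsyncLock()'    # flush pending writes and block new ones
zfs snapshot zfs/mongo_data@locked           # hypothetical snapshot name
mongo db2 --quiet --eval 'db.fsyncUnlock()'  # resume writes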

When I issue a zfs rollback, with or without -r depending on which snapshot I try to restore, I can still see the newer DB records. The DB files' mtimes also show the files were not restored. When I look at the DB files under .zfs/snapshot/..., I can see they are correct for the point in time the snapshot was made.

If instead of MongoDB I simply touch or overwrite some files from shell commands, rollback works fine. But if at some point I start MongoDB, stop it, and try to roll back, then rollback does not roll back. The filesystem is not busy (tested with lsof), and zfs rollback -r shows no error.

I tried explicitly unmounting the ZFS filesystem, then rolling back, then mounting, and this works. I therefore believe the issue is in the auto-remount done when restoring a snapshot. In my case nothing was ever keeping the filesystem busy (tested with lsof), yet the restored snapshot would not show.

Maybe this is related to #1214.

Here are steps to reproduce:

now=`date +%F_%H:%M:%S`

# Insert data
mongo db2 --quiet <<EOF
    db.now.drop()
    db.now.save({now:"$now"})
EOF

# Stop or fsync+lock
service mongod stop
sleep 3

# Do snapshot
zfs snapshot zfs/mongo_data@$now

# restart for next insert
service mongod start
sleep 1

# REPEAT AGAIN
pwd
/zfs

service mongod stop
Stopping mongod:                                           [  OK  ]

lsof -nP |grep mongo_data
# <- no output, no open files on this filesystem

zfs list -t snapshot
NAME                                 USED  AVAIL  REFER  MOUNTPOINT
zfs/mongo_data@2014-03-13_09:02:29   180K      -   228K  -
zfs/mongo_data@2014-03-13_09:02:37   184K      -   232K  -
zfs/mongo_data@2014-03-13_09:02:47   184K      -   232K  -
zfs/mongo_data@2014-03-13_09:02:54   120K      -   232K  -

# Current value in DB
strings /zfs/mongo_data/db2.* |grep 2014
2014-03-13_09:02:54

# Restore snapshot
zfs rollback -r zfs/mongo_data@2014-03-13_09:02:47
#  <- no output, no error

# Current value in DB is NOT RESTORED !
strings /zfs/mongo_data/db2.* |grep 2014
2014-03-13_09:02:54  # <- OLD VALUE !

# What is in the snapshot
strings /zfs/mongo_data/.zfs/snapshot/2014-03-13_09\:02\:47/db2.* |grep 2014
2014-03-13_09:02:47  # <- yet the snapshot has the correct point in time value

# Explicitly umount+mount, even afterwards, solves the issue
zfs umount zfs/mongo_data
zfs mount zfs/mongo_data
strings /zfs/mongo_data/db2.* |grep 2014
2014-03-13_09:02:47  # <- FINALLY MY SNAPSHOT WAS CORRECTLY RESTORED !
@alexfouche
Author

I forgot to mention I am on an Amazon Linux instance (CentOS 6.x).

Linux zfs01test.domain 3.4.82-69.112.amzn1.x86_64 #1 SMP Mon Feb 24 16:31:21 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

rpm -q zfs spl
zfs-0.6.2-1.el6.x86_64
spl-0.6.2-1.el6.x86_64

lsmod
Module                  Size  Used by
zfs                  1118558  2 
zcommon                40129  1 zfs
znvpair                69965  2 zfs,zcommon
zavl                    6237  1 zfs
zunicode              322780  1 zfs
spl                   153613  5 zfs,zcommon,znvpair,zavl,zunicode
zlib_deflate           21461  1 spl
(...)

@maxximino
Contributor

Try this after the rollback:

echo 3 >/proc/sys/vm/drop_caches

and see if it fixes the problem. That would be important information for finding the root cause.

@ryao
Contributor

ryao commented Mar 14, 2014

@alexfouche This is a cache coherence issue. Basically, mmap() currently goes through the page cache and rollback does not invalidate it. The double caching between ARC and the page cache is a hack that we inherited from Solaris. The plan is to kill it in a future release, but the future release that sees this happen could be several months away.

In the meantime, @maxximino's suggestion might help, although you probably want to run it both before and after the rollback to be safe. Also, echoing 3 into drop_caches is unnecessary; echoing 1 is sufficient to flush the second copy of the data in the page cache. Another possibility is to configure MongoDB to avoid mmap(), should it support that. If you try this, make sure you don't switch to AIO, which also goes through the page cache.
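
In concrete terms, that would look something like this (a sketch reusing the dataset and snapshot names from the reproduction above):

echo 1 > /proc/sys/vm/drop_caches                    # drop clean page-cache pages before the rollback
zfs rollback -r zfs/mongo_data@2014-03-13_09:02:47
echo 1 > /proc/sys/vm/drop_caches                    # drop any stale pages left over from before the rollback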

That being said, let me make a note to other contributors. We should be able to modify ZFS rollback to attempt to invalidate the page cache on any unmapped open files until mmap() has been reworked to use ARC directly. That should tackle the coherence issue. Also, it would be interesting to know if this affects other ZFS platforms. This bug might not be exclusive to us.

@alexfouche
Author

I retested, and indeed doing an echo 3 >/proc/sys/vm/drop_caches after the rollback solved the problem.

@alexfouche
Author

@ryao
I'm not sure whether you took this into account in your pull request, but the sysctl documentation says one has to do a sync before dropping caches.
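
That combined step would be a sketch like this (sync first, because drop_caches only frees clean pages and skips dirty ones):

sync                               # write back dirty pages so they become droppable
echo 1 > /proc/sys/vm/drop_caches  # then drop the page cache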

@behlendorf behlendorf added this to the 0.7.0 milestone Oct 31, 2014
@behlendorf behlendorf added the Component: Memory Management label Oct 31, 2014
@behlendorf behlendorf changed the title Rollback does not always rollback Rollback must invalidate mmap'd pages Mar 25, 2016
@behlendorf behlendorf modified the milestones: 0.8.0, 0.7.0 Jul 15, 2016
@behlendorf
Contributor

Closing. The solution mentioned above is recommended and things have been further improved by commit 8614ddf.

@4Ykw

4Ykw commented Sep 18, 2022

Hi there,

I have encountered a similar situation with a memory-mapped file, where the app could still see non-rolled-back data in memory, and this solution worked.

I am using the Ubuntu 20.04 version:

zfs --version
zfs-0.8.3-1ubuntu12.14
zfs-kmod-2.1.4-0ubuntu0.1

Is there a way to create ZFS pools/volumes that always invalidate caches for the volumes/pools in question? A property like that would bring peace of mind for most problems of this kind.

Or is there any version on the v2 stream that has this sorted =) so I can give it a try?

Cheers

@ryao
Contributor

ryao commented Sep 18, 2022

@alexfouche This is a super late reply, but all of the dirty data is safely in ARC, so there is no safety issue from invalidating the page cache.

@4Ykw Please file a new issue for the regression. It will not get much attention in an old closed issue.

@ryao
Contributor

ryao commented Sep 18, 2022

Is there a way to create ZFS pools/volumes that always invalidate caches for the volumes/pools in question?

It already invalidates as needed. In the new issue, please include information on how to reproduce this.
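
A minimal mmap-based check one could include in that new issue (a sketch; the dataset name tank/test and file path are hypothetical, and the read is done via Python's mmap module only because plain shell reads on ZFS go through ARC rather than the page cache):

echo stale-check-old > /tank/test/f
zfs snapshot tank/test@s1
echo stale-check-new > /tank/test/f
# read via mmap, populating the page cache with the "new" contents
python3 -c "import mmap; f=open('/tank/test/f','rb'); print(mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)[:20])"
zfs rollback tank/test@s1
# should print the "old" contents if rollback invalidated the mmap'd pages
python3 -c "import mmap; f=open('/tank/test/f','rb'); print(mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)[:20])"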

@behlendorf
Contributor

This may be related to #13608
