
Rollback must invalidate mmap'd pages #2186

Closed
alexfouche opened this issue Mar 13, 2014 · 10 comments
Labels
Component: Memory Management (kernel memory management)

@alexfouche

Rollback does not always rollback

Using MongoDB (which I believe is mmap'd), I update some records, then fsync+lock, then issue a zfs snapshot, then unlock the DB. I also tried stopping and restarting the process instead of fsync+lock. I repeated this multiple times.
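
The fsync+lock variant looks roughly like this (a sketch assuming the mongo shell's db.fsyncLock()/db.fsyncUnlock() helpers; the snapshot name is illustrative):

mongo db2 --quiet --eval 'db.fsyncLock()'    # flush pending writes and block new ones
zfs snapshot zfs/mongo_data@locked           # hypothetical snapshot name
mongo db2 --quiet --eval 'db.fsyncUnlock()'  # resume writes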

When I issue a zfs rollback, with or without -r depending on which snapshot I try to restore, I can still see the newer DB records. The DB files' mtimes also show the files were not restored. When I look at the DB files under .zfs/snapshot/..., I can see they are correct for the point in time the snapshot was made.

If instead of MongoDB I simply touch or overwrite some files from shell commands, rollback works fine. But if at some point I start MongoDB, stop it, and try to roll back, then rollback does not roll back. The filesystem is not busy (tested with lsof), and zfs rollback -r shows no error.

I tried explicitly unmounting the ZFS filesystem, then rolling back, then mounting, and this works. I therefore believe the issue is in the auto-remount done when restoring a snapshot. In my case nothing was ever keeping the filesystem busy (tested with lsof), yet the restored snapshot would not show.

Maybe this is related to #1214.

Here are steps to reproduce:

now=`date +%F_%H:%M:%S`

# Insert data
mongo db2 --quiet <<EOF
    db.now.drop()
    db.now.save({now:"$now"})
EOF

# Stop or fsync+lock
service mongod stop
sleep 3

# Do snapshot
zfs snapshot zfs/mongo_data@$now

# restart for next insert
service mongod start
sleep 1

# REPEAT AGAIN
pwd
/zfs

service mongod stop
Stopping mongod:                                           [  OK  ]

lsof -nP |grep mongo_data
# <- no output, no open files on this filesystem

zfs list -t snapshot
NAME                                 USED  AVAIL  REFER  MOUNTPOINT
zfs/mongo_data@2014-03-13_09:02:29   180K      -   228K  -
zfs/mongo_data@2014-03-13_09:02:37   184K      -   232K  -
zfs/mongo_data@2014-03-13_09:02:47   184K      -   232K  -
zfs/mongo_data@2014-03-13_09:02:54   120K      -   232K  -

# Current value in DB
strings /zfs/mongo_data/db2.* |grep 2014
2014-03-13_09:02:54

# Restore snapshot
zfs rollback -r zfs/mongo_data@2014-03-13_09:02:47
#  <- no output, no error

# Current value in DB is NOT RESTORED !
strings /zfs/mongo_data/db2.* |grep 2014
2014-03-13_09:02:54  # <- OLD VALUE !

# What is in the snapshot
strings /zfs/mongo_data/.zfs/snapshot/2014-03-13_09\:02\:47/db2.* |grep 2014
2014-03-13_09:02:47  # <- yet the snapshot has the correct point in time value

# Explicitly umount+mount, even afterwards, solves the issue
zfs umount zfs/mongo_data
zfs mount zfs/mongo_data
strings /zfs/mongo_data/db2.* |grep 2014
2014-03-13_09:02:47  # <- FINALLY MY SNAPSHOT WAS CORRECTLY RESTORED !
@alexfouche
Author

I forgot to mention I am on an Amazon Linux instance (CentOS 6.x).

Linux zfs01test.domain 3.4.82-69.112.amzn1.x86_64 #1 SMP Mon Feb 24 16:31:21 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

rpm -q zfs spl
zfs-0.6.2-1.el6.x86_64
spl-0.6.2-1.el6.x86_64

lsmod
Module                  Size  Used by
zfs                  1118558  2 
zcommon                40129  1 zfs
znvpair                69965  2 zfs,zcommon
zavl                    6237  1 zfs
zunicode              322780  1 zfs
spl                   153613  5 zfs,zcommon,znvpair,zavl,zunicode
zlib_deflate           21461  1 spl
(...)

@maxximino
Contributor

Try this after the rollback:

echo 3 >/proc/sys/vm/drop_caches

and see if it fixes the problem. That would be important information for finding the root cause.

@ryao
Contributor

ryao commented Mar 14, 2014

@alexfouche This is a cache coherence issue. Basically, mmap() currently goes through the page cache and rollback does not invalidate it. The double caching between ARC and the page cache is a hack that we inherited from Solaris. The plan is to kill it in a future release, but the future release that sees this happen could be several months away.

In the meantime, @maxximino's suggestion might help, although you probably want to run it both before and after the rollback to be safe. Also, echoing 3 into drop_caches is unnecessary; echoing 1 is sufficient to flush the second copy of the data in the page cache. Another possibility is to configure MongoDB to avoid mmap(), should it support that. If you try this, make sure you don't switch to AIO, which also goes through the page cache.
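
In concrete terms, that would look something like this (a sketch reusing the dataset and snapshot names from the reproduction above):

echo 1 > /proc/sys/vm/drop_caches                    # drop clean page-cache pages before the rollback
zfs rollback -r zfs/mongo_data@2014-03-13_09:02:47
echo 1 > /proc/sys/vm/drop_caches                    # drop any stale pages left over from before the rollback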

That being said, let me make a note to other contributors. We should be able to modify ZFS rollback to attempt to invalidate the page cache on any unmapped open files until mmap() has been reworked to use ARC directly. That should tackle the coherence issue. Also, it would be interesting to know if this affects other ZFS platforms. This bug might not be exclusive to us.

@alexfouche
Author

I retested, and indeed doing an echo 3 >/proc/sys/vm/drop_caches after the rollback solved the problem.

@alexfouche
Author

@ryao
I'm not sure whether you took this into account in your pull request, but the sysctl documentation says one has to do a sync before dropping caches.
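
That combined step would be a sketch like this (sync first, because drop_caches only frees clean pages and skips dirty ones):

sync                               # write back dirty pages so they become droppable
echo 1 > /proc/sys/vm/drop_caches  # then drop the page cache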

@behlendorf behlendorf added this to the 0.7.0 milestone Oct 31, 2014
@behlendorf behlendorf added the Component: Memory Management label Oct 31, 2014
@behlendorf behlendorf changed the title Rollback does not always rollback Rollback must invalidate mmap'd pages Mar 25, 2016
@behlendorf behlendorf modified the milestones: 0.8.0, 0.7.0 Jul 15, 2016
@behlendorf
Contributor

Closing. The solution mentioned above is recommended and things have been further improved by commit 8614ddf.

@4Ykw

4Ykw commented Sep 18, 2022

Hi there,

I have encountered a similar situation with a memory-mapped file, where the app could still see non-rolled-back data in memory, and this solution worked.

I am using the Ubuntu 20.04 version:

zfs --version
zfs-0.8.3-1ubuntu12.14
zfs-kmod-2.1.4-0ubuntu0.1

Is there a way to create ZFS pools/volumes that always invalidate caches for the volumes/pools in question? A property like that would bring peace of mind for most problems of this kind.

Or is there any version on the v2 stream that has this sorted =) so I can give it a try?

Cheers

@ryao
Contributor

ryao commented Sep 18, 2022

@alexfouche This is a super late reply, but all of the dirty data is safely in ARC, so there is no safety issue from invalidating the page cache.

@4Ykw Please file a new issue for the regression. It will not get much attention in an old closed issue.

@ryao
Contributor

ryao commented Sep 18, 2022

Is there a way to create ZFS pools/volumes that always invalidate caches for the volumes/pools in question?

It already invalidates as needed. In the new issue, please include information on how to reproduce this.
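
A minimal mmap-based check one could include in that new issue (a sketch; the dataset name tank/test and file path are hypothetical, and the read is done via Python's mmap module only because plain shell reads on ZFS go through ARC rather than the page cache):

echo stale-check-old > /tank/test/f
zfs snapshot tank/test@s1
echo stale-check-new > /tank/test/f
# read via mmap, populating the page cache with the "new" contents
python3 -c "import mmap; f=open('/tank/test/f','rb'); print(mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)[:20])"
zfs rollback tank/test@s1
# should print the "old" contents if rollback invalidated the mmap'd pages
python3 -c "import mmap; f=open('/tank/test/f','rb'); print(mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)[:20])"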

@behlendorf
Contributor

This may be related to #13608
