v2api: fsstress crashes #372
I had collect-crashes.sh running overnight with FUSE debug output. It caught eight crashes, logfiles tar'ed up here: This is the go-fuse commit being tested: rfjakob@f61e9a6. Only some debug output added compared to master.
Tracked it down a little bit. Seems like the kernel does not like it when MKDIR hands out a recycled inode number before the previous RMDIR finishes. There is an extra FORGET 1 that causes a lookupCount underflow.
Why is it handing out a recycled number? Is this the loopback FS?
Yes, loopback on ext4, and ext4 recycles inode numbers.
Survived 12 hours of fsstress testing. Fixes hanwen#372. BUG: Leaks memory (grew to 12GB RSS during fsstress testing). Change-Id: Ibb36a886f15d48727daa10b9717ea88e45a6b8af
Also add a wrapper script, fsstress.collect-crashes.sh, to collect the debug output. hanwen/go-fuse#372
Using the inode numbers as the nodeid causes problems when the fs reuses inode numbers. This is the case with any overlay filesystem that is backed by ext4, like the loopback example or gocryptfs.

We already had the expSleep() and re-add hacks,

7090b02 fs: wait out apparent inode type change
68f7052 fs: addNewChild(): handle concurrent FORGETs

to mitigate some of the problems (at the risk of deadlocking forever), but I could not find a way to work around the following case uncovered by fsstress:

The kernel expects a fresh nodeid from MKDIR (see hanwen#372 for a debug log). This is now guaranteed by passing O_EXCL to addNewChild(). However, this also means that the hard link detection must happen in addNewChild() rather than in newInodeUnlocked() as before.

The expSleep and re-add hacks are no longer needed and have been dropped.

This survived 24 hours (42587 iterations) of fsstress testing. The loopback example was tested running on top of ext4 on Linux 5.8.10.

Fixes hanwen#372.

Change-Id: Ibb36a886f15d48727daa10b9717ea88e45a6b8af
Using the inode numbers as the nodeid causes problems when the fs reuses inode numbers. This is the case with any overlay filesystem that is backed by ext4, like the loopback example or gocryptfs.

We already had the expSleep() and re-add hacks,

7090b02 fs: wait out apparent inode type change
68f7052 fs: addNewChild(): handle concurrent FORGETs

to mitigate some of the problems (at the risk of deadlocking forever), but I could not find a way to work around the following case uncovered by fsstress:

The kernel expects a fresh nodeid from MKDIR (see hanwen#372 for a debug log). This is now guaranteed by passing O_EXCL to addNewChild(). However, this also means that the hard link detection must happen in addNewChild() rather than in newInodeUnlocked() as before.

The expSleep and re-add hacks are no longer needed and have been dropped.

This survived 24 hours (42587 iterations) of fsstress testing. The loopback example was tested running on top of ext4 on Linux 5.8.10.

Fixes hanwen#372.

v2: Rename inoMap -> stableAttrs, nodeidMap -> kernelNodeIds according to feedback from Han-Wen.

Change-Id: Ibb36a886f15d48727daa10b9717ea88e45a6b8af
These are pretty hard to reproduce, but they do happen after hours of running fsstress in a loop. We seem to have a race in the LOOKUP/FORGET reference counting. I'm working on finding the race, and will use this ticket to track progress.
I can repro both in loopback and also in gocryptfs (v2api branch). I had two failure modes so far:
The FUSE operation in the backtrace varies.
fsstress logs for reference (does not include FUSE debug output):
fsstress-loopback.log
fsstress-gocryptfs.log