Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2api: fsstress crashes #372

Closed
rfjakob opened this issue Sep 18, 2020 · 4 comments
Closed

v2api: fsstress crashes #372

rfjakob opened this issue Sep 18, 2020 · 4 comments

Comments

@rfjakob
Copy link
Contributor

rfjakob commented Sep 18, 2020

These are pretty hard to reproduce, but they do happen after hours of running fsstress in a loop. We seem to have a race in the LOOKUP/FORGET reference counting. I'm working on finding the race, and will use this ticket to track progress.

I can repro both in loopback and also in gocryptfs (v2api branch). I had two failure modes so far:

2020/09/15 23:36:23 unknown node 3245
panic: unknown node 3245

goroutine 13312812 [running]:
log.Panicf(0x55b2fe1082ac, 0xf, 0xc0003c7db8, 0x1, 0x1)
	log/log.go:358 +0xc2
github.com/hanwen/go-fuse/v2/fs.(*rawBridge).inode(0xc0000fa5a0, 0xcad, 0x0, 0x0, 0x0)
	github.com/hanwen/go-fuse/v2@v2.0.4-0.20200908172753-0b6cbc515082/fs/bridge.go:292 +0x17d
github.com/hanwen/go-fuse/v2/fs.(*rawBridge).GetXAttr(0xc0000fa5a0, 0xc0008621e0, 0xc00097d158, 0xc0031e2b40, 0x13, 0x0, 0x0, 0x0, 0xc0000da0b8)
[...]

The FUSE operation in the backtrace varies.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x56021a70cf2a]

goroutine 6044480 [running]:
github.com/hanwen/go-fuse/v2/fs.(*rawBridge).releaseFileEntry(0xc0001725a0, 0x1763, 0x25, 0x0, 0x0)
	github.com/hanwen/go-fuse/v2@v2.0.4-0.20200908172753-0b6cbc515082/fs/bridge.go:775 +0xba
github.com/hanwen/go-fuse/v2/fs.(*rawBridge).Release(0xc0001725a0, 0xc001190360, 0xc000a86a98)
	github.com/hanwen/go-fuse/v2@v2.0.4-0.20200908172753-0b6cbc515082/fs/bridge.go:734 +0x6c
github.com/hanwen/go-fuse/v2/fuse.doRelease(0xc0001c2000, 0xc000a86900)
	github.com/hanwen/go-fuse/v2@v2.0.4-0.20200908172753-0b6cbc515082/fuse/opcode.go:379 +0x50
[...]

fsstress logs for reference (does not include FUSE debug output):
fsstress-loopback.log
fsstress-gocryptfs.log

@rfjakob
Copy link
Contributor Author

rfjakob commented Sep 23, 2020

I had collect-crashes.sh running overnight with FUSE debug output.

It caught eight crashes, logfiles tar'ed up here:
fsstress.log.2020-09-23.tar.gz

This is the go-fuse commit being tested: rfjakob@f61e9a6 . Only some debug output added compared to master.

@rfjakob
Copy link
Contributor Author

rfjakob commented Sep 24, 2020

Tracked it down a little bit. Seems like the kernel does not like it when MKDIR hands out a recycled inode number before the previous RMDIR finishes. There is an extra FORGET 1 that causes a lookupCount underflow.

*03:09:08.680484 rx 553441476: FORGET i581 {Nlookup=4}  req=0xc000290900
*03:09:08.680490 removeRef: i581 lookupCount=0 nl=0 n=0xc0001aae00
*03:09:08.680493 -- 553441476:     done (no reply)
*03:09:09.694282 rx 553501112: MKDIR i307 {0777 (00)} ["d12XXXXXXXXXXXXX"] 17b req=0xc000117680
*03:09:09.694350 addNewChild: i581 lookupCount=1 nl=1 addr=0xc0002f25b0
*03:09:09.694376 tx 553501112:     OK, {i581 g1 tE=1s tA=1s {M040755 SZ=4096 L=2 1026:1026 B8*4096 i0:581 A 1600909749.692894 M 1600909749.692894 C 1600909749.692894}}
*03:09:09.699302 rx 553501486: RMDIR i307 ["d12XXXXXXXXXXXXX"] 17b req=0xc0001f4480
*03:09:09.699492 rx 553501516: MKDIR i214 {0777 (00)} ["d22XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"] 72b req=0xc00033efc0
*03:09:09.699539 addNewChild: i581 lookupCount=2 nl=2 addr=0xc0002f25b0
*03:09:09.699550 tx 553501516:     OK, {i581 g1 tE=1s tA=1s {M040755 SZ=4096 L=2 1026:1026 B8*4096 i0:581 A 1600909749.697894 M 1600909749.697894 C 1600909749.697894}}
*03:09:09.699724 rx 553501564: LOOKUP i214 ["d22XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"] 72b req=0xc000c77680
*03:09:09.699751 tx 553501564:     OK, {i581 g1 tE=1s tA=1s {M040755 SZ=4096 L=2 1026:1026 B8*4096 i0:581 A 1600909749.697894 M 1600909749.697894 C 1600909749.697894}}
*03:09:09.699813 rx 553501578: READDIRPLUS i214 {Fh 46 [0 +4096)  L 0 NONBLOCK,DIRECTORY,0x8000}  req=0xc000c77680
*03:09:09.699902 addNewChild: i581 lookupCount=4 nl=4 addr=0xc0002f25b0
*03:09:09.699959 tx 553501578:     OK,  2408b data "\x00\x00\x00\x00\x00\x00\x00\x00"...
*03:09:09.700010 rx 553501604: FORGET i581 {Nlookup=1}  req=0xc000116000
*03:09:09.700016 -- 553501604:     done (no reply)
*03:09:09.699348 tx 553501486:     OK
*03:09:09.701691 rx 553501722: FORGET i581 {Nlookup=4}  req=0xc0003158c0
*03:09:09.701695 ino 581 lookupCount underflow: lookupCount=3, decrement=4

@hanwen
Copy link
Owner

hanwen commented Sep 24, 2020

why is it handing out a recycled number? Is this the loopback FS ?

@rfjakob
Copy link
Contributor Author

rfjakob commented Sep 24, 2020

Yes loopback on ext4, and ext4 recycles inode numbers

rfjakob added a commit to rfjakob/go-fuse that referenced this issue Oct 3, 2020
Survived 12 hours of fsstress testing.
Fixes hanwen#372 .

BUG: Leaks memory (grew to 12GB RSS during fsstress testing).

Change-Id: Ibb36a886f15d48727daa10b9717ea88e45a6b8af
rfjakob added a commit to rfjakob/gocryptfs that referenced this issue Oct 3, 2020
Also add a wrapper script, fsstress.collect-crashes.sh, to collect
the debug output.

hanwen/go-fuse#372
rfjakob added a commit to rfjakob/go-fuse that referenced this issue Oct 10, 2020
Using the inode numbers as the nodeid causes problems
when the fs reuses inode numbers. This is the case with
any overlay filesystem that is backed by ext4 like the
loopback example or gocryptfs.

We already had the expSleep() and re-add hack,

	7090b02 fs: wait out apparent inode type change
	68f7052 fs: addNewChild(): handle concurrent FORGETs

to mitigate some of the problems (at the risk of deadlocking
forever), but I could not find a way to work around the following
case uncovered by fsstress:

The kernel expects a fresh nodeid from MKDIR (see hanwen#372 for a
debug log).

This is now guaranteed by passing the O_EXCL to addNewChild().
However, this also means that the hard link detection must
happen in addNewChild() as opposed to newInodeUnlocked()
before. The expSleep and re-add hacks are no longer needed
and have been dropped.

This survived 24 hours (42587 iterations) of fsstress testing.
Tested was the loopback example running on top of ext4 on
Linux 5.8.10.

Fixes hanwen#372 .

Change-Id: Ibb36a886f15d48727daa10b9717ea88e45a6b8af
rfjakob added a commit to rfjakob/go-fuse that referenced this issue Oct 13, 2020
Using the inode numbers as the nodeid causes problems
when the fs reuses inode numbers. This is the case with
any overlay filesystem that is backed by ext4 like the
loopback example or gocryptfs.

We already had the expSleep() and re-add hack,

	7090b02 fs: wait out apparent inode type change
	68f7052 fs: addNewChild(): handle concurrent FORGETs

to mitigate some of the problems (at the risk of deadlocking
forever), but I could not find a way to work around the following
case uncovered by fsstress:

The kernel expects a fresh nodeid from MKDIR (see hanwen#372 for a
debug log).

This is now guaranteed by passing the O_EXCL to addNewChild().
However, this also means that the hard link detection must
happen in addNewChild() as opposed to newInodeUnlocked()
before. The expSleep and re-add hacks are no longer needed
and have been dropped.

This survived 24 hours (42587 iterations) of fsstress testing.
Tested was the loopback example running on top of ext4 on
Linux 5.8.10.

Fixes hanwen#372 .

Change-Id: Ibb36a886f15d48727daa10b9717ea88e45a6b8af
rfjakob added a commit to rfjakob/go-fuse that referenced this issue Oct 15, 2020
Using the inode numbers as the nodeid causes problems
when the fs reuses inode numbers. This is the case with
any overlay filesystem that is backed by ext4 like the
loopback example or gocryptfs.

We already had the expSleep() and re-add hack,

	7090b02 fs: wait out apparent inode type change
	68f7052 fs: addNewChild(): handle concurrent FORGETs

to mitigate some of the problems (at the risk of deadlocking
forever), but I could not find a way to work around the following
case uncovered by fsstress:

The kernel expects a fresh nodeid from MKDIR (see hanwen#372 for a
debug log).

This is now guaranteed by passing the O_EXCL to addNewChild().
However, this also means that the hard link detection must
happen in addNewChild() as opposed to newInodeUnlocked()
before. The expSleep and re-add hacks are no longer needed
and have been dropped.

This survived 24 hours (42587 iterations) of fsstress testing.
Tested was the loopback example running on top of ext4 on
Linux 5.8.10.

Fixes hanwen#372 .

v2: Rename inoMap -> stableAttrs, nodeidMap -> kernelNodeIds
acc. to feedback from Han-Wen.

Change-Id: Ibb36a886f15d48727daa10b9717ea88e45a6b8af
@hanwen hanwen closed this as completed in 1d0096e Nov 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants