panic: "Unknown opcode 2016" - only on GCP (GKE) #276

Closed
ryanlamore opened this issue Mar 27, 2019 · 23 comments

Comments

@ryanlamore
Contributor

TL;DR
Getting this while running example/loopback in GKE:
`$ fusermount -u mount
$ ./main mount data&
[1] 4469
$ Mounted!
cat data/test.txt
hi
$ cat mount/test.txt
03:44:30.074840 Unknown opcode 2016
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x4f21cc]

goroutine 1 [running]:
github.com/hanwen/go-fuse/fuse.(*request).serializeHeader(0xc0000c6000, 0x0, 0xc00007dc60, 0xc00007dc68, 0x56b620)
/home/rlamore/dev/git/other/go-fuse/fuse/request.go:220 +0x1c
github.com/hanwen/go-fuse/fuse.(*Server).write(0xc0000b6000, 0xc0000c6000, 0x0)
/home/rlamore/dev/git/other/go-fuse/fuse/server.go:494 +0x7f
github.com/hanwen/go-fuse/fuse.(*Server).handleRequest(0xc0000b6000, 0xc0000c6000, 0xc0000c6000)
/home/rlamore/dev/git/other/go-fuse/fuse/server.go:462 +0xb3
github.com/hanwen/go-fuse/fuse.(*Server).loop(0xc0000b6000, 0x0)
/home/rlamore/dev/git/other/go-fuse/fuse/server.go:431 +0x18f
github.com/hanwen/go-fuse/fuse.(*Server).Serve(0xc0000b6000)
/home/rlamore/dev/git/other/go-fuse/fuse/server.go:359 +0x6d
main.main()
/home/rlamore/dev/git/other/go-fuse/example/loopback/main.go:112 +0x678
cat: mount/test.txt: Transport endpoint is not connected
cat: mount/test.txt: Transport endpoint is not connected
[1]+ Exit 2 ./main mount data
`

I've been using go-fuse for years now, and it's been extremely valuable and stable for me. So thank you for that! I'm working in GKE and can't get it to work. The container has all the correct permissions. I've used the libfuse and jacobsa libraries, and they appear to work without any issues. I've tried with Go 1.9 and Go 1.11 and it makes no difference. I realize this might be hard for you to reproduce unless you are set up in GKE. Let me know if there's any more information I can provide. I'm working to pinpoint the issue as well. I could also provide instructions on how to reproduce with a free GKE account if that would be beneficial.

@rfjakob
Contributor

rfjakob commented Mar 27, 2019

Hi! Looks like there are two things happening here:

(1) The kernel sends opcode 2016, which does not exist in the mainline FUSE protocol. What kernel version are you running (uname -a)? This would be harmless, but:

(2) go-fuse crashes on the unknown opcode. Crash is here

dataLength := r.handler.OutputSize
because r.handler == nil.
This will be fixed.

rfjakob added a commit to rfjakob/go-fuse that referenced this issue Mar 27, 2019
On unknown opcodes, r.handler is nil and r.status is ENOSYS.
Reverse the order and only look at r.handler when status == OK.

Fixes hanwen#276
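
A minimal sketch of the reordering described in the commit message, not the actual go-fuse code: the `request`, `operationHandler`, and `handlers` names below are hypothetical stand-ins, and the opcode and size values are illustrative only. The point is that the handler is consulted only after the status check, so an unknown opcode yields an ENOSYS reply with no payload instead of a nil dereference.

```go
package main

import "fmt"

// status is a simplified stand-in for a FUSE reply status.
type status int

const (
	statusOK     status = 0
	statusENOSYS status = 38 // ENOSYS on Linux; illustrative
)

// operationHandler is a hypothetical per-opcode descriptor.
type operationHandler struct {
	Name       string
	OutputSize uintptr
}

// request is a hypothetical, stripped-down request.
type request struct {
	opcode  uint32
	status  status
	handler *operationHandler
}

// handlers maps known opcodes to descriptors (values are illustrative).
var handlers = map[uint32]*operationHandler{
	3: {Name: "GETATTR", OutputSize: 104},
}

// newRequest looks up the handler; unknown opcodes get ENOSYS and a nil handler.
func newRequest(opcode uint32) *request {
	r := &request{opcode: opcode, handler: handlers[opcode]}
	if r.handler == nil {
		r.status = statusENOSYS
	}
	return r
}

// replyDataLength checks the status first and touches r.handler only on success.
func (r *request) replyDataLength() uintptr {
	if r.status != statusOK || r.handler == nil {
		return 0
	}
	return r.handler.OutputSize
}

func main() {
	for _, op := range []uint32{3, 2016} {
		r := newRequest(op)
		fmt.Printf("opcode %d: status=%d dataLength=%d\n", op, r.status, r.replyDataLength())
	}
}
```

With this ordering, opcode 2016 produces an error reply with a zero-length body rather than a panic.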
@rfjakob
Contributor

rfjakob commented Mar 27, 2019

Can you try this patch?

rfjakob@79d80db

@hanwen
Owner

hanwen commented Mar 27, 2019

that is really odd. Can you post a dump that uses --debug ?

@hanwen
Owner

hanwen commented Mar 27, 2019

I work at Google, but have never used GKE. I would be interested in a repro scenario, but I need more details.

@hanwen
Owner

hanwen commented Mar 27, 2019

(especially a debug dump after the fix that rfjakob proposed.)

@hanwen hanwen closed this as completed in fd7328f Mar 27, 2019
@ryanlamore
Contributor Author

This fix actually worked for me as long as I didn't have debug turned on. I created a PR, #27, that'll fix debug mode as well. There are just a few nil checks needed in InputDebug() and OutputDebug(). Let me know what you think.

I'll come back with the steps to reproduce in a bit as well. It is a little concerning that opcode 2016 is even coming in, right?
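
For illustration, the kind of nil check described for the debug path might look like the sketch below. This is not the code from the PR: `request`, `opHandler`, and `inputDebug` are hypothetical names, and the log format is made up. The idea is to fall back to a placeholder name when no handler is registered for the opcode.

```go
package main

import "fmt"

// opHandler is a hypothetical descriptor holding the operation name.
type opHandler struct{ Name string }

// request is a hypothetical, stripped-down request.
type request struct {
	unique  uint64
	opcode  uint32
	handler *opHandler
}

// inputDebug formats a debug line without dereferencing a nil handler.
func (r *request) inputDebug() string {
	name := fmt.Sprintf("UNKNOWN(%d)", r.opcode)
	if r.handler != nil {
		name = r.handler.Name
	}
	return fmt.Sprintf("rx %d: %s", r.unique, name)
}

func main() {
	known := &request{unique: 7, opcode: 1, handler: &opHandler{Name: "LOOKUP"}}
	unknown := &request{unique: 8, opcode: 2016}
	fmt.Println(known.inputDebug())   // rx 7: LOOKUP
	fmt.Println(unknown.inputDebug()) // rx 8: UNKNOWN(2016)
}
```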

@hanwen
Owner

hanwen commented Mar 27, 2019

yes, I can't make sense of 2016, and would be interested to see if the other fields are garbage too.

@ryanlamore
Contributor Author

I printed them out and everything else seemed to be OK. I did have one implementation question, though. I noticed this in types.go:
type InHeader struct {
	Length  uint32
	Opcode  int32
	Unique  uint64
	NodeId  uint64
	Caller
	Padding uint32
}
Opcode is an int32; however, in the spec it's a uint32. Was there a specific reason for that choice? This probably isn't the right forum to ask the question, but it's top of mind for me right now.
https://github.com/libfuse/libfuse/blob/master/include/fuse_kernel.h#L709 Thanks!

@hanwen
Owner

hanwen commented Mar 27, 2019

oversight. cc423d1

@hanwen hanwen reopened this Apr 1, 2019
@hanwen
Owner

hanwen commented Apr 1, 2019

Ryan, I double checked internally, and nobody had any sensible explanation for seeing '2016'. Could you provide a reproduction scenario, so I can file a proper bug report?

Thanks!

@ryanlamore
Contributor Author

ryanlamore commented Apr 1, 2019 via email

@hanwen
Owner

hanwen commented Apr 15, 2019

any news?

@hanwen
Owner

hanwen commented Apr 24, 2019

ping? @ryanlamore

@ryanlamore
Contributor Author

ryanlamore commented Apr 24, 2019 via email

@danopia

danopia commented Apr 27, 2019

I just got Unknown opcode 2016 from the kernel on a Chromebook (Pixelbook). I think the same binary used to work on this computer, though I haven't run it in ages. Might be some Google kernel patch..

Linux localhost 4.4.171-15926-g798b963ebef9 #1 SMP PREEMPT Wed Mar 20 23:38:35 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux

@rfjakob
Contributor

rfjakob commented Apr 28, 2019

Where's the kernel source for this?

@hanwen
Owner

hanwen commented Apr 28, 2019

@hanwen
Owner

hanwen commented Apr 28, 2019

when did this happen? Near initialization of the FS, or during normal use?

2016 is 0x7e0, i.e. 11111100000 in binary. Maybe a bitmask?
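
The bit pattern is easy to double-check with a few lines of Go. The `describe` helper below is just an illustrative name; it prints the value in decimal, hex, and binary, along with the count of set bits and trailing zeros.

```go
package main

import (
	"fmt"
	"math/bits"
)

// describe renders v in decimal, hex, and binary, plus simple bit statistics.
func describe(v uint32) string {
	return fmt.Sprintf("%d = %#x = %b (ones=%d, trailing zeros=%d)",
		v, v, v, bits.OnesCount32(v), bits.TrailingZeros32(v))
}

func main() {
	fmt.Println(describe(2016))
	// 2016 = 0x7e0 = 11111100000 (ones=6, trailing zeros=5)
}
```

Six consecutive set bits shifted left by five (63 << 5) is what makes the value look like a mask rather than an ordinary sequential opcode.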

@rfjakob
Contributor

rfjakob commented Apr 28, 2019

@rfjakob
Contributor

rfjakob commented Apr 28, 2019

@ryanlamore I think we don't need repro steps anymore ;)

@hanwen
Owner

hanwen commented Apr 28, 2019

origin: https://android-review.googlesource.com/c/kernel/common/+/219870

looks like something that might handle case-insensitive paths.

Regardless, aside from the panic, there is nothing to do here. (I think we should not support this opcode for now.)

@hanwen hanwen closed this as completed Apr 28, 2019
@danopia

danopia commented Apr 28, 2019

So the patch adding outlier opcode 2016 was merged for 2016. Classy.

For posterity, my crash was when cp tried writing into new files, leaving 0-byte files.

log
2019/04/27 16:49:50 fs: create uploads/2019-04-27-i82t7se2f.png flags 0x80c1 mode 0x81a4
2019/04/27 16:49:50 fs: open uploads/2019-04-27-i82t7se2f.png flags 0x80c1
2019/04/27 16:49:50 fs: get attributes on uploads/2019-04-27-i82t7se2f.png
2019/04/27 16:49:50 Unknown opcode 2016
2019/04/27 16:49:50 Unmounting...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x4f1648]

goroutine 1 [running]:
github.com/hanwen/go-fuse/fuse.(*request).serializeHeader(0xc4201da000, 0x0, 0x11000, 0x28, 0x0)
        /home/dan/go/src/github.com/hanwen/go-fuse/fuse/request.go:201 +0x18
github.com/hanwen/go-fuse/fuse.(*Server).write(0xc4201d6000, 0xc4201da000, 0xc400000000)
        /home/dan/go/src/github.com/hanwen/go-fuse/fuse/server.go:434 +0x6e
github.com/hanwen/go-fuse/fuse.(*Server).handleRequest(0xc4201d6000, 0xc4201da000, 0xc4201da000)
        /home/dan/go/src/github.com/hanwen/go-fuse/fuse/server.go:407 +0xb5
github.com/hanwen/go-fuse/fuse.(*Server).loop(0xc4201d6000, 0x0)
        /home/dan/go/src/github.com/hanwen/go-fuse/fuse/server.go:376 +0x158
github.com/hanwen/go-fuse/fuse.(*Server).Serve(0xc4201d6000)
        /home/dan/go/src/github.com/hanwen/go-fuse/fuse/server.go:324 +0x59
main.main()
        /home/dan/go/src/github.com/stardustapp/core/utils/starfs/main.go:42 +0x512
cp: error writing '/mnt/stardust/uploads/2019-04-27-i82t7se2f.png': Transport endpoint is not connected
cp: failed to close '/mnt/stardust/uploads/2019-04-27-i82t7se2f.png': Transport endpoint is not connected

Recompiling my binary with go get -u fixed up the panic, looks good now. Thanks for the rapid attention 😄

emdem pushed a commit to emdem/go-fuse that referenced this issue Nov 21, 2019
Go-FUSE has been trailing the kernel, and the kernel (usually) is
careful to only send opcodes that are recognized. However, on GKE
(GCP) the undefined opcode 2016 has been seen, leading to a crash.

Fixes hanwen#276.