Go programs stored on GCS via gcsfuse crash at launch #562

Closed
tetsuok opened this issue Oct 19, 2021 · 7 comments
Labels
bug Error or flaw in the code with unintended result p2 P2

Comments

@tetsuok
Contributor

tetsuok commented Oct 19, 2021

When I run a Go program stored on a GCS filesystem mounted with gcsfuse (commit 70695b5, the tip of the master branch), the program crashes at launch. Both gcsfuse and the Go program run on a Google Cloud instance. The program is compiled with go1.17.2. The reproducing steps are described here and here. The stack trace is shown here. I first filed the issue as golang/go#48997, where @randall77 reproduced it on his Google Cloud instance as well.

Below is the output of /tmp/bin/gcsfuse -o rw -file-mode=777 -dir-mode=777 --debug_fuse --debug_gcs --foreground my_bucket ~/gcs, assuming we have a GCS bucket, my_bucket:

fuse_debug: 2021/10/19 00:59:12.611638 Op 0x0000003d        connection.go:416] <- GetInodeAttributes (inode 1, PID 2237)
fuse_debug: 2021/10/19 00:59:12.611839 Op 0x0000003d        connection.go:498] -> OK ()
fuse_debug: 2021/10/19 00:59:12.612150 Op 0x0000003e        connection.go:416] <- LookUpInode (parent 1, name "hello", PID 2237)
fuse_debug: 2021/10/19 00:59:12.612357 Op 0x0000003e        connection.go:498] -> OK (inode 2)
fuse_debug: 2021/10/19 00:59:13.999325 Op 0x0000003f        connection.go:416] <- LookUpInode (parent 1, name "hello", PID 16360)
fuse_debug: 2021/10/19 00:59:13.999586 Op 0x0000003f        connection.go:498] -> OK (inode 2)
fuse_debug: 2021/10/19 00:59:14.000139 Op 0x00000040        connection.go:416] <- OpenFile (inode 2, PID 16360)
fuse_debug: 2021/10/19 00:59:14.000333 Op 0x00000040        connection.go:498] -> OK ()
fuse_debug: 2021/10/19 00:59:14.002173 Op 0x00000041        connection.go:416] <- ReadFile (inode 2, PID 16360, handle 7, offset 458752, 77824 bytes)
gcs: 2021/10/19 00:59:14.002343 Req             0x1c: <- Read("hello", [458752, 1728598))
fuse_debug: 2021/10/19 00:59:14.016927 Op 0x00000042        connection.go:416] <- interrupt (fuseid 0x00000041)
2021/10/19 00:59:14.017177 Not retrying Read("hello", 1634536788431753) after error of type *url.Error ("Get \"https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753\": net/http: request canceled"): &url.Error{Op:"Get", URL:"https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753", Err:(*errors.errorString)(0xc000119380)}
gcs: 2021/10/19 00:59:14.017197 Req             0x1c: -> Read error: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
2021/10/19 00:59:14.017212 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
fuse_debug: 2021/10/19 00:59:14.017229 Op 0x00000041        connection.go:500] -> Error: "operation canceled"
fuse: 2021/10/19 00:59:14.017232 *fuseops.ReadFileOp error: operation canceled
fuse_debug: 2021/10/19 00:59:14.017325 Op 0x00000043        connection.go:416] <- ReadFile (inode 2, PID 16360, handle 7, offset 471040, 4096 bytes)
gcs: 2021/10/19 00:59:14.017479 Req             0x1c: -> Read error: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
gcs: 2021/10/19 00:59:14.017490 Req             0x1c: -> Read("hello", [458752, 1728598)) (15.176564ms): OK
gcs: 2021/10/19 00:59:14.017497 Req             0x1d: <- Read("hello", [471040, 1728598))
fuse_debug: 2021/10/19 00:59:14.017646 Op 0x00000044        connection.go:416] <- interrupt (fuseid 0x00000043)
2021/10/19 00:59:14.017808 Not retrying Read("hello", 1634536788431753) after error of type *url.Error ("Get \"https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753\": net/http: request canceled"): &url.Error{Op:"Get", URL:"https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753", Err:(*errors.errorString)(0xc000119380)}
gcs: 2021/10/19 00:59:14.017821 Req             0x1d: -> Read error: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
2021/10/19 00:59:14.017831 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
fuse_debug: 2021/10/19 00:59:14.017846 Op 0x00000043        connection.go:500] -> Error: "operation canceled"
fuse: 2021/10/19 00:59:14.017853 *fuseops.ReadFileOp error: operation canceled
fuse_debug: 2021/10/19 00:59:14.020348 Op 0x00000045        connection.go:416] <- ReadFile (inode 2, PID 16360, handle 7, offset 798720, 4096 bytes)
gcs: 2021/10/19 00:59:14.020678 Req             0x1d: -> Read error: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
gcs: 2021/10/19 00:59:14.020717 Req             0x1d: -> Read("hello", [471040, 1728598)) (3.219177ms): OK
gcs: 2021/10/19 00:59:14.020731 Req             0x1e: <- Read("hello", [798720, 1728598))
fuse_debug: 2021/10/19 00:59:14.045437 Op 0x00000045        connection.go:498] -> OK ()
fuse_debug: 2021/10/19 00:59:14.045559 Op 0x00000046        connection.go:416] <- ReadFile (inode 2, PID 16360, handle 7, offset 811008, 4096 bytes)
fuse_debug: 2021/10/19 00:59:14.045822 Op 0x00000046        connection.go:498] -> OK ()
fuse_debug: 2021/10/19 00:59:14.046366 Op 0x00000047        connection.go:416] <- ReleaseFileHandle (PID 0)
gcs: 2021/10/19 00:59:14.046561 Req             0x1e: -> Read("hello", [798720, 1728598)) (25.828651ms): OK
fuse_debug: 2021/10/19 00:59:14.046582 Op 0x00000047        connection.go:498] -> OK ()

Notes

  • The crash doesn't happen when I run the Go program with GODEBUG=asyncpreemptoff=1 (which disables the asynchronous preemption introduced in go1.14).
  • The crash doesn't happen when I put a Go program compiled with go1.13.15 or earlier in the gcs filesystem.
  • The crash doesn't happen when I put other binaries (e.g., /usr/bin/python3.7) in the gcs filesystem.
  • gcsfuse cancels the HTTP request that reads the file from GCS when it receives an interrupt request from the kernel.
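
As a rough sketch of the first workaround, a small launcher can start the affected binary with GODEBUG=asyncpreemptoff=1 added to its environment. The launcher and the helper name withAsyncPreemptOff are hypothetical and not part of gcsfuse or Go; the launcher itself is assumed to live on a local disk rather than on the gcsfuse mount.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withAsyncPreemptOff (hypothetical helper) returns a copy of env in which
// GODEBUG contains asyncpreemptoff=1. Existing GODEBUG settings are kept and
// the new setting is appended after them.
func withAsyncPreemptOff(env []string) []string {
	const setting = "asyncpreemptoff=1"
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			found = true
			v := strings.TrimPrefix(kv, "GODEBUG=")
			if v == "" {
				kv = "GODEBUG=" + setting
			} else {
				kv = "GODEBUG=" + v + "," + setting
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: launcher /path/to/binary [args...]")
		return
	}
	// Run the target binary with the adjusted environment and pass through
	// the standard streams.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withAsyncPreemptOff(os.Environ())
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

This only works around the crash for the launched program; it does not change how gcsfuse handles the interrupt.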
@tetsuok
Contributor Author

tetsuok commented Oct 19, 2021

gcsfuse cancels the HTTP request that reads the file from GCS when it receives an interrupt request from the kernel.

I inferred this from the following excerpt of the log above:

fuse_debug: 2021/10/19 00:59:14.002173 Op 0x00000041        connection.go:416] <- ReadFile (inode 2, PID 16360, handle 7, offset 458752, 77824 bytes)
gcs: 2021/10/19 00:59:14.002343 Req             0x1c: <- Read("hello", [458752, 1728598))
fuse_debug: 2021/10/19 00:59:14.016927 Op 0x00000042        connection.go:416] <- interrupt (fuseid 0x00000041)
2021/10/19 00:59:14.017177 Not retrying Read("hello", 1634536788431753) after error of type *url.Error ("Get \"https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753\": net/http: request canceled"): &url.Error{Op:"Get", URL:"https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753", Err:(*errors.errorString)(0xc000119380)}
gcs: 2021/10/19 00:59:14.017197 Req             0x1c: -> Read error: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
2021/10/19 00:59:14.017212 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/my_bucket/o/hello?alt=media&generation=1634536788431753": net/http: request canceled
fuse_debug: 2021/10/19 00:59:14.017229 Op 0x00000041        connection.go:500] -> Error: "operation canceled"
fuse: 2021/10/19 00:59:14.017232 *fuseops.ReadFileOp error: operation canceled

Also, ReadOp in connection.go cancels the operation immediately when it receives the interrupt request.
I suspect the interrupt request is caused by the SIGURG signal, which the Go runtime sends to stop a running goroutine for asynchronous preemption.
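
To illustrate the failure mode seen in the log, here is a minimal sketch of an HTTP read whose context is canceled partway through, the way an interrupt would cancel it. It is not gcsfuse's code: blockingTransport, readInterrupted, isCanceledURLError, and the URL are all hypothetical stand-ins, and no network traffic is made. Newer Go versions report "context canceled" rather than the "net/http: request canceled" text in the log above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// interruptDelay is how long the simulated read runs before it is canceled.
const interruptDelay = 10 * time.Millisecond

// blockingTransport stands in for a slow download: it never produces a
// response and returns only once the request's context is canceled.
type blockingTransport struct{}

func (blockingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	<-req.Context().Done()
	return nil, req.Context().Err()
}

// readInterrupted starts a GET and cancels its context after interruptDelay,
// mimicking a read that is aborted when an interrupt arrives.
func readInterrupted() error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	timer := time.AfterFunc(interruptDelay, cancel)
	defer timer.Stop()

	// Placeholder URL; blockingTransport never contacts it.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://storage.example.invalid/hello", nil)
	if err != nil {
		return err
	}
	client := &http.Client{Transport: blockingTransport{}}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// isCanceledURLError reports whether err is a *url.Error caused by context
// cancellation, the same error type that appears in the log.
func isCanceledURLError(err error) bool {
	var urlErr *url.Error
	return errors.As(err, &urlErr) && errors.Is(err, context.Canceled)
}

func main() {
	err := readInterrupted()
	fmt.Println("canceled *url.Error:", isCanceledURLError(err))
}
```

The caller receives an error instead of data, which matches the "operation canceled" result returned for the ReadFile op.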

@lezh
Contributor

lezh commented Oct 28, 2021

This is the same issue as #288.

@tetsuok
Contributor Author

tetsuok commented Nov 2, 2021

@lezh Would you clarify why this is considered the same as #288? This issue has occurred since go1.14, which was released in February 2020, but #288 was reported in 2018. As far as I understand, the read request is interrupted because a signal arrives during the read (SIGURG from the Go runtime, I think). That seems to be a different cause from #288.

@limdeng

limdeng commented Jan 5, 2022

I sometimes get the same error in 0.38. Is there any progress?

@sethiay
Contributor

sethiay commented May 31, 2023

Running any kind of code from a gcsfuse mount is not a supported use case.

@sethiay sethiay closed this as completed May 31, 2023
@sethiay sethiay added the known-issues Issues that are known or not supported yet. label May 31, 2023
@sethiay sethiay reopened this May 31, 2023
@sethiay sethiay removed the known-issues Issues that are known or not supported yet. label Jun 6, 2023
@vadlakondaswetha vadlakondaswetha added the p2 P2 label Jun 8, 2023
@marcoa6
Collaborator

marcoa6 commented May 24, 2024

Update: As of version 2.1 (23-May-2024), GCSFuse offers enhanced control over how it responds to interruptions during file system operations.

You can configure GCSFuse to ignore interruptions during file system operations either with the --ignore-interrupts CLI flag (disabled by default) or with the following config-file setting:

file-system:
    ignore-interrupts: true

Please try this flag. Once we get enough feedback, we will enable it by default (targeting next month).
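
Conceptually, ignoring interrupts means that an operation keeps running even after its cancellation signal fires. The sketch below shows one way to express that idea in Go with context.WithoutCancel (Go 1.21+). It is an illustration only and not GCSFuse's actual implementation; opContext, slowRead, and interruptedRead are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// opContext returns the context an operation should run under. When
// ignoreInterrupts is true the returned context keeps the parent's values
// but can no longer be canceled through the parent.
func opContext(parent context.Context, ignoreInterrupts bool) context.Context {
	if ignoreInterrupts {
		return context.WithoutCancel(parent)
	}
	return parent
}

// slowRead simulates a read that takes d to finish unless ctx is canceled
// first.
func slowRead(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// interruptedRead runs slowRead and delivers a simulated interrupt (a
// cancel of the parent context) before the read can finish.
func interruptedRead(ignoreInterrupts bool) error {
	parent, cancel := context.WithCancel(context.Background())
	defer cancel()
	timer := time.AfterFunc(5*time.Millisecond, cancel)
	defer timer.Stop()
	return slowRead(opContext(parent, ignoreInterrupts), 50*time.Millisecond)
}

func main() {
	fmt.Println("honoring interrupts:", interruptedRead(false))
	fmt.Println("ignoring interrupts:", interruptedRead(true))
}
```

With interrupts honored the read fails with a cancellation error; with interrupts ignored it runs to completion, which is the behavior a program being loaded from the mount needs.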

@marcoa6
Collaborator

marcoa6 commented Jun 28, 2024

With GCSFuse v2.3, this is now enabled by default. Thank you all for testing and providing feedback!

@marcoa6 marcoa6 closed this as completed Jun 28, 2024