Add config option to ignore signal interrupts. #153
I’m not convinced that this is the right fix: if you just don’t cancel the contexts, pending operations will pile up over time. I’m also not sure I fully understand the actual problem yet, as cancelling the operation should be fine, and the next FUSE request after the interruption should just start another operation, no? Do you have a standalone reproducer for this issue? (I asked for one in issue #122 as well and never got one.)
I agree that this isn't the perfect fix. That would be to stop all SIGURG signals being delivered in the first place, but I couldn't figure out how to do that:
Cancelling the context is not fine, as it causes user-visible application failures. A simple and consistent way to reproduce this is to run "git clone " in a FUSE-managed directory. I don't fully understand the issue either. This seems like a Golang thing, not interrupts coming from the kernel. What's even more surprising is that the kernel delivers a large number of SIGURG signals: about 50-100 for a git checkout with 11 files. Only a handful of those signals result in errors, possibly because the operations had already completed?
The SIGURG signals are quite easy to trigger. This code sample triggers SIGURG every so often. package main import ( func hello() { func main() {
Given how:
I reckon that this is the easiest way to address a user-facing issue. I deliberately left this as an opt-in feature, so we can test it and see if it causes pending operations to pile up over time.
I think the best way is to run your program with the environment variable
Okay, but why? That’s the bit I don’t understand. The kernel’s INTERRUPT messages are supposed to signal that the client is no longer asking for the request to be answered. Is that incorrect? If it’s correct, where does the failure come from?
And what are the steps to do that? Please supply steps I can run on a clean checkout of jacobsa/fuse without anything else. In particular, I don’t want to deal with Google Cloud accounts, projects, setup, etc. I think narrowing down this problem to a minimal reproducer will be very valuable in understanding this issue.
For what it's worth, I agree with @stapelberg here: it seems like this change needs to come with a detailed discussion of the root cause and why this is the appropriate place to address it. I haven't seen anything in these discussions (or the Google-internal ones) that indicates anybody has tried deeply to understand the root cause. With regard to the quote from the kernel docs ("The userspace filesystem may ignore the
Below is a similar reply I gave on an internal discussion, with some potential avenues for investigation.
I did a bit more digging today and I'm more confused than before. I still don't know why the operations are interrupted. The interrupt problem seems unrelated to the SIGURG signals. Goroutine preemptions are a red herring. What we do know is that the AWS and Azure equivalents to GCSFuse don't intercept signals in the FUSE code and therefore don't experience the user-facing problems. Note to self: The SIGURG signals stop if you set GODEBUG="asyncpreemptoff=1" from the shell, but it doesn't work if you try to use os.Setenv("GODEBUG", "asyncpreemptoff=1") from inside the binary.
Being able to ignore SIGURG fixes a number of known issues in GCS Fuse, as well as issue #122 for jacobsa/fuse.