-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NO_FILE ulimit decreases from 1048576 to 1024 when upgrading to v1.4.2 #7198
Comments
The issue is that now Go honors default RLIMIT for forked processes (which is actually correct). golang/go#46279 (comment) The fix would be to set a sensible default from Talos when running processes from machined |
frezbo
added a commit
to frezbo/talos
that referenced
this issue
May 9, 2023
Now Go only sets the rlimit for the parent and any fork/exec'ed process gets the rlimit that was the default before fork/exec. Ref: golang/go#46279 This fix got backported to [Go 1.20.4](golang/go@ecf7e00) breaking Talos. Talos used to set rlimit in the [`SetRLimit`](https://github.com/siderolabs/talos/blob/v1.4.2/internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_sequencer_tasks.go#L302) sequencer task. This means any process started by `wrapperd` gets the default Rlimit (1024). Fix this by explicitly setting `rlimit` in `wrapperd` before we drop any capabilities. Fixes: siderolabs#7198 Signed-off-by: Noel Georgi <git@frezbo.dev> (cherry picked from commit a2565f6)
frezbo
added a commit
to frezbo/talos
that referenced
this issue
May 11, 2023
This reverts commit a2565f6. The fix done in `a2565f67`, was actually a no-op caused by the misunderstanding the fix done in Go and backported to [Go 1.20.4](golang/go@ecf7e00). The fix gave a false confidence that it was working when it was tested against Talos `main` branch since the PR siderolabs#7190 bumped `x/sys` package from [v0.7.0 -> v0.8.0](golang/go@ecf7e00), the actual change in `x/sys` can be found here at https://cs.opensource.google/go/x/sys/+/ff18efa0a3fa22d4fede381b822bbcfe53b7ee7c which meant that when updating Go to 1.20.4 the `x/sys` package should been updated too. The `x/sys` package changed how the syscall to set the rlimit was called, it got moved into the Go stdlib instead of calling rlimit syscall in the `x/sys` package, which meant a combination of using Go 1.20.4 and an older `x/sys` package means `RLIMIT_NOFILE` value would not be set back to the original value. The Talos 1.4 release branch currently have `x/sys` at [v0.7.0(https://github.com/siderolabs/talos/blob/v1.4.3/go.mod#L133), so the backport would consist of this change along another commit bumping `x/sys` package to `v0.8.0`. Fixes: siderolabs#7198 Fixes: siderolabs#7206 Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com> Signed-off-by: Noel Georgi <git@frezbo.dev>
frezbo
added a commit
to frezbo/talos
that referenced
this issue
May 11, 2023
This reverts commit a2565f6. The fix done in `a2565f67`, was actually a no-op caused by the misunderstanding the fix done in Go and backported to [Go 1.20.4](golang/go@ecf7e00). The fix gave a false confidence that it was working when it was tested against Talos `main` branch since the PR siderolabs#7190 bumped `x/sys` package from [v0.7.0 -> v0.8.0](golang/go@ecf7e00), the actual change in `x/sys` can be found here at https://cs.opensource.google/go/x/sys/+/ff18efa0a3fa22d4fede381b822bbcfe53b7ee7c which meant that when updating Go to 1.20.4 the `x/sys` package should been updated too. The `x/sys` package changed how the syscall to set the rlimit was called, it got moved into the Go stdlib instead of calling rlimit syscall in the `x/sys` package, which meant a combination of using Go 1.20.4 and an older `x/sys` package means `RLIMIT_NOFILE` value would not be set back to the original value. The Talos 1.4 release branch currently have `x/sys` at [v0.7.0(https://github.com/siderolabs/talos/blob/v1.4.3/go.mod#L133), so the backport would consist of this change along another commit bumping `x/sys` package to `v0.8.0`. Fixes: siderolabs#7198 Fixes: siderolabs#7206 Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com> Signed-off-by: Noel Georgi <git@frezbo.dev> (cherry picked from commit 4f720d4)
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Bug Report
Description
When I upgrade from v1.4.1 to v1.4.2, the default ulimit for
NO_FILE
decreases from 1048576 to 1024:ulimit -n
, which should show 1048576My trigger case was InfluxDB (which really loves FDs), complaining about "too many open files", just after I upgraded to v1.4.2 but any pod shows this behavior. After having downgraded to v1.4.1, everything is OK again.
Logs
n/a
Environment
The text was updated successfully, but these errors were encountered: