Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure Go GC based on pillar memory limit or global config #4273

Merged
merged 2 commits into from
Sep 23, 2024

Conversation

rouming
Copy link
Contributor

@rouming rouming commented Sep 23, 2024

PR introduces two settings for Golang runtime which impacts garbage collector behavior:

  1. gogc.memory.limit.bytes provides the runtime with a soft memory limit. The runtime undertakes several processes to try to respect this memory limit, including adjustments to the frequency of garbage collections and returning memory to the underlying system more aggressively. The Go API call is described here: https://pkg.go.dev/runtime/debug#SetMemoryLimit

By default, EVE setting is disabled (set to 0), meaning the Golang runtime memory limit will be set according to the following equation based on the memory.limit_in_bytes hard memory limit provided by the pillar cgroups:

  limit = memory.limit_in_bytes * 0.6

The constant 0.6 was chosen empirically and is explained by simple logic: memory.limit_in_bytes is a hard limit for the whole pillar cgroup, meaning when reached, likely one of the processes will be killed by OOM. In turn Golang runtime memory limit is a soft limit, so the difference must be significant to ensure that after the soft limit is reached, there will be enough memory for the Go garbage collector to do its job and, fortunately, not to hit the hard limit.

  1. gogc.percent sets the garbage collection target percentage: a collection is triggered when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage. The Go API call is described here: https://pkg.go.dev/runtime/debug#SetGCPercent

The PR is motivated by a frequently observed bloated zedbox application (up to 500MB) that causes an OOM kill call to the /eve or /pillar cgroups. It is assumed that the bloated zedbox application is not caused by memory leaks, but by a delayed GC sweep cycle and a unconditionally growing runtime heap size. An explicit memory limit set for the Golang runtime (~400MB in the current version of EVE) should make the GC more aggressive when the soft memory limit is hit, which should result in a significant reduction in allocated but unused memory.

GC caught in action (for the curious)

Testing application:

 package main

 import (
    "runtime"
    _ "runtime/debug"
    "fmt"
    "time"
 )

 func main() {
    // UNCOMMENT ME TO SEE SUCCESS IN ACTION:
    //debug.SetMemoryLimit(100<<20)

    var arr []string

    fmt.Printf("Allocating ...")
    for i := 0; i < 100; i++ {
        arr = append(arr, string(make([]byte, 1<<20)))
    }
    fmt.Printf("\nNullifying ...")
    for i := 0; i < 100; i++ {
        arr[i] = ""
        time.Sleep(time.Second/10)
    }
    fmt.Printf("\nMain loop\n")
    for {
        tick := time.NewTimer(time.Second)
        select {
        case <-tick.C:
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            if (m.Alloc>>20) == 0 {
                fmt.Printf("SUCCESS: expect 0 MB, got %v MB\n", m.Alloc>>20)
            } else {
                fmt.Printf("FAILURE: expect 0 MB, got %v MB\n", m.Alloc>>20)
            }
        }
    }
 }

Simple application allocates 100MB in loop and then nullifies all allocated array elements by empty string assignment, which ideally should cause GC to sweep not used memory immediately, but this operation is expensive and delayed by Golang on 2 minutes [1], thus you see "FAILURE" output at least 120 times.

To make GC to sweep not used 1MB blocks uncomment the line

//debug.SetMemoryLimit(100<<20)

and repeat the run, you should see "SUCCESS" output immediately.

Keep in mind, that sleep time.Sleep() between assignments and select from tick.C (also sleeps for 1 second) is crucial and gives GC CPU cycle to be executed.

[1] https://github.com/golang/go/blob/097b7162adeab8aad0095303aff8a045bbbfa6e0/src/runtime/proc.go#L6037

Cc: @rene @OhmSpectator

Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks super cool.

Copy link
Contributor

@rene rene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @rouming, please port this PR to 9.4-stable as well....

@OhmSpectator
Copy link
Member

LGTM. @rouming, please port this PR to 9.4-stable as well....

Do we need it in 9.4? Is the user requested it to be fixed on this version? It's EOS. What we do now for the OVMF fixes is an exception caused by a direct ask...

@rouming
Copy link
Contributor Author

rouming commented Sep 23, 2024

LGTM. @rouming, please port this PR to 9.4-stable as well....

9.4 is EOS for a long time, I would not do that, otherwise it has to be unsealed and then never ending story

@OhmSpectator
Copy link
Member

9.4 is EOS for a long time, I would not do that, otherwise it has to be unsealed and then never ending story

Unfortunately, we have already been forced to do it to fix another bug. We will have to make it EOS again very soon and force the decision-making guys to have a solution on how to deal with such situation in future.

Patch introduces two settings for Golang runtime which
impacts garbage collector behavior:

1. `gogc.memory.limit.bytes` provides the runtime with a soft memory
    limit.  The runtime undertakes several processes to try to respect
    this memory limit, including adjustments to the frequency of
    garbage collections and returning memory to the underlying system
    more aggressively. The Go API call is described here:
    https://pkg.go.dev/runtime/debug#SetMemoryLimit

    By default, EVE setting is disabled (set to 0), meaning the Golang
    runtime memory limit will be set according to the following
    equation based on the `memory.limit_in_bytes` hard memory limit
    provided by the pillar `cgroups`:

    `limit = memory.limit_in_bytes * 0.6`

    The constant 0.6 was chosen empirically and is explained by simple
    logic: `memory.limit_in_bytes` is a hard limit for the whole
    pillar cgroup, meaning when reached, likely one of the processes
    will be killed by OOM. In turn Golang runtime memory limit is a
    soft limit, so the difference must be significant to ensure that
    after the soft limit is reached, there will be enough memory for
    the Go garbage collector to do its job and, fortunately, not to
    hit the hard limit.

2. `gogc.percent` sets the garbage collection target percentage: a
    collection is triggered when the ratio of freshly allocated data
    to live data remaining after the previous collection reaches this
    percentage. The Go API call is described here:
    https://pkg.go.dev/runtime/debug#SetGCPercent

The patch is motivated by a frequently observed bloated `zedbox`
application (up to 500MB) that causes an OOM kill call to the /eve or
/pillar cgroups. It is assumed that the bloated `zedbox` application
is not caused by memory leaks, but by a delayed GC sweep cycle and a
unconditionally growing runtime heap size. An explicit memory limit
set for the Golang runtime (~400MB in the current version of EVE)
should make the GC more aggressive when the soft memory limit is hit,
which should result in a significant reduction in allocated but unused
memory.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Documentation for recent:

   gogc.memory.limit.bytes
   gogc.percent

Golang runtime settings.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
@OhmSpectator
Copy link
Member

OhmSpectator commented Sep 23, 2024

These PRs are stealing the runners from my PR, preventing the build =(

pkg/pillar/types/memory.go Dismissed Show dismissed Hide dismissed
@rouming
Copy link
Contributor Author

rouming commented Sep 23, 2024

These PRs are stealing the runners from my PR, preventing the build =(

I'm so sorry

@OhmSpectator
Copy link
Member

These PRs are stealing the runners from my PR, preventing the build =(

I'm so sorry

I finally got my runner. Have you done anything for that? Still trying to understand how the runners work

@rouming
Copy link
Contributor Author

rouming commented Sep 23, 2024

These PRs are stealing the runners from my PR, preventing the build =(

I'm so sorry

I finally got my runner. Have you done anything for that? Still trying to understand how the runners work

Nothing

@OhmSpectator OhmSpectator merged commit 2f36682 into lf-edge:master Sep 23, 2024
30 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stable Should be backported to stable release(s)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants