client-go memory leak #102718

Closed · xiaoanyunfei opened this issue Jun 9, 2021 · 21 comments

Labels: help wanted · kind/bug · lifecycle/rotten · sig/api-machinery · triage/accepted

@xiaoanyunfei (Contributor) commented Jun 9, 2021

What happened:

We use a client-go informer factory to handle pod information:

informerFactory.Core().V1().Pods().Informer().AddEventHandler(
	cache.ResourceEventHandlerFuncs{
		AddFunc:    addPodToCache,
		UpdateFunc: updatePodInCache,
		DeleteFunc: deletePodFromCache,
	})

cache := make(map[string]*v1.Pod)

// addPodToCache eventually calls AddPod
func AddPod(pod *v1.Pod) {
	key, _ := framework.GetPodKey(pod)
	cache[key] = pod
}

// deletePodFromCache eventually calls RemovePod
func RemovePod(pod *v1.Pod) error {
	key, _ := framework.GetPodKey(pod)
	delete(cache, key)
	return nil
}
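
For completeness, a self-contained sketch of roughly this setup (assumptions: in-cluster config, a mutex-guarded plain map, and cache.MetaNamespaceKeyFunc standing in for framework.GetPodKey; the handler bodies are illustrative, not our actual code):

package main

import (
	"sync"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	factory := informers.NewSharedInformerFactory(client, 0)

	var mu sync.Mutex
	podCache := make(map[string]*v1.Pod)

	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*v1.Pod)
			key, _ := cache.MetaNamespaceKeyFunc(pod)
			mu.Lock()
			podCache[key] = pod
			mu.Unlock()
		},
		UpdateFunc: func(_, newObj interface{}) {
			pod := newObj.(*v1.Pod)
			key, _ := cache.MetaNamespaceKeyFunc(pod)
			mu.Lock()
			podCache[key] = pod
			mu.Unlock()
		},
		DeleteFunc: func(obj interface{}) {
			// DeleteFunc may receive a cache.DeletedFinalStateUnknown tombstone,
			// so this sketch only handles the plain *v1.Pod case.
			if pod, ok := obj.(*v1.Pod); ok {
				key, _ := cache.MetaNamespaceKeyFunc(pod)
				mu.Lock()
				delete(podCache, key)
				mu.Unlock()
			}
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	select {} // block; the handlers keep podCache up to date
}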

But we recently came across a memory leak problem.

I added some logging and finally found the cause.

Our Kubernetes version is 1.17 with more than 5500 nodes, but the default-watch-cache-size of kube-apiserver is 100, which is too small for our cluster.
(The latest Kubernetes already uses a dynamically sized watch cache, see #90058.)

Reflector.ListAndWatch

The Reflector's List call takes more than 35 seconds, and the Reflector then uses the last resourceVersion from that List to call Watch. The Watch fails with "too old resource version: 6214379869 (6214383056)" because that resourceVersion has already been evicted from the kube-apiserver's watch cache (default-watch-cache-size).

Reflector.Run therefore keeps cycling: List -> Watch -> "too old resource version" error -> List again.

This is where the memory leak appears. We use the cache map to store pod information, so over time the cache holds pods that belong to different PodLists, and those retained pointers prevent Go from garbage-collecting the whole PodList objects.

The code causing the memory leak is meta.EachListItem:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/api/meta/help.go#L115

Replacing found = append(found, item) with found = append(found, item.DeepCopyObject()) can fix the problem:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go#L453
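
To illustrate the retention effect in isolation, here is a standalone Go sketch with made-up sizes (no Kubernetes types involved): holding a pointer to a single slice element keeps the slice's entire backing array reachable, while holding a copy of the element does not.

package main

import (
	"fmt"
	"runtime"
)

type item struct{ payload [1 << 20]byte } // ~1 MiB per element

func pinned() *item {
	list := make([]item, 64) // one ~64 MiB backing array
	return &list[0]          // retaining this pointer keeps all 64 MiB alive
}

func copied() *item {
	list := make([]item, 64)
	c := list[0] // copy the element out of the backing array
	return &c    // only this ~1 MiB copy stays reachable; the array can be collected
}

func main() {
	p := pinned()
	c := copied()
	runtime.GC()

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// Roughly 64 MiB is still live because of p; dropping p (or deep-copying,
	// as suggested above) lets the whole backing array go away.
	fmt.Println(p != nil, c != nil, m.HeapAlloc>>20, "MiB live (approximate)")
}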

What you expected to happen:

Calling DeepCopyObject has some memory overhead; I wonder whether there is a better solution for this.
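
One consumer-side mitigation that needs no apimachinery change (a sketch under the assumption that we control the handlers; names like podCache are illustrative): deep-copy the pod before putting it into the long-lived map, so the map no longer pins the PodList backing arrays produced by each relist.

package main

import (
	"fmt"
	"sync"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

var (
	mu       sync.Mutex
	podCache = make(map[string]*v1.Pod)
)

// addPodToCache stores a deep copy instead of the pointer handed out by the
// informer, at the cost of one extra allocation per event.
func addPodToCache(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return
	}
	key, _ := cache.MetaNamespaceKeyFunc(pod)
	mu.Lock()
	podCache[key] = pod.DeepCopy() // do not retain a pointer into the relisted PodList
	mu.Unlock()
}

func main() {
	addPodToCache(&v1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "example"}})
	fmt.Println(len(podCache)) // 1
}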

How to reproduce it (as minimally and precisely as possible):

package main

import (
        "fmt"
        goruntime "runtime"
        "time"

        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/runtime"
)

func main() {
        leak()
}

func unleak() {
        store := make(map[string]*v1.Pod)
        var index int
        for {
                plist := generatePod(index)
                for _, pod := range plist.Items {
                        pod := pod
                        store[pod.Name] = &pod
                }
                time.Sleep(time.Second * 2)
                index++
                goruntime.GC()
                fmt.Println("==unleak==", index, len(store))
        }
}

func leak() {
        store := make(map[string]*v1.Pod)
        var index int
        var items []runtime.Object
        for {
                items = items[:0]
                plist := generatePod(index)
                meta.EachListItem(plist, func(obj runtime.Object) error {
                        items = append(items, obj)
                        return nil
                })
                for _, item := range items {
                        pod := item.(*v1.Pod)
                        store[pod.Name] = pod
                }
                time.Sleep(time.Second * 2)
                index++
                goruntime.GC()
                fmt.Println("==leak==", index, len(store))
        }
}

func generatePod(num int) *v1.PodList {
        var plist v1.PodList
        for i := 0; i < 100000-num; i++ {
                pod := v1.Pod{
                        ObjectMeta: metav1.ObjectMeta{
                                Name: fmt.Sprintf("pod-%d", i),
                        },
                }
                plist.Items = append(plist.Items, pod)
        }
        return &plist
}

[image: leak]

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@xiaoanyunfei xiaoanyunfei added the kind/bug Categorizes issue or PR as related to a bug. label Jun 9, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 9, 2021
@xiaoanyunfei (Contributor, Author)

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 9, 2021
@fedebongio (Contributor)

/help
/triage accepted

@k8s-ci-robot (Contributor)

@fedebongio:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 10, 2021
@novahe (Contributor) commented Jun 12, 2021

I think this is the same problem. Issue: #102564, PR: #102565.

@novahe (Contributor) commented Jun 12, 2021

I formatted the code used for reproduction:

package main

import (
	"fmt"
	goruntime "runtime"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

func main() {
	leak()
}

func unleak() {
	store := make(map[string]*v1.Pod)
	var index int
	for {
		plist := generatePod(index)
		for _, pod := range plist.Items {
			pod := pod // copy before taking the address (pre-Go 1.22 loop variable semantics)
			store[pod.Name] = &pod
		}
		time.Sleep(time.Second * 2)
		index++
		goruntime.GC()
		fmt.Println("==unleak==", index, len(store))
	}
}

func leak() {
	store := make(map[string]*v1.Pod)
	var index int
	var items []runtime.Object
	for {
		items = items[:0]
		plist := generatePod(index)
		meta.EachListItem(plist, func(obj runtime.Object) error {
			items = append(items, obj)
			return nil
		})
		for _, item := range items {
			pod := item.(*v1.Pod)
			store[pod.Name] = pod
		}
		time.Sleep(time.Second * 2)
		index++
		goruntime.GC()
		fmt.Println("==leak==", index, len(store))
	}
}

func generatePod(num int) *v1.PodList {
	var plist v1.PodList
	for i := 0; i < 100000-num; i++ {
		pod := v1.Pod{
			ObjectMeta: metav1.ObjectMeta{
				Name: fmt.Sprintf("pod-%d", i),
			},
		}
		plist.Items = append(plist.Items, pod)
	}
	return &plist
}

@249043822 (Member)

Cc @jpbetz @wojtek-t

@wojtek-t (Member)

Re the code above - the leak is not coming from the ForEach, but from the fact that you're not really releasing all the references here:

items = items[:0]

Going back to the original comment, I don't think we have a problem with the ForEach function:

100 which is too small for our cluster.

You should adjust the watch cache size - 100 items in a 5k-node cluster is certainly not enough. There was a function that adjusts it based on the number of nodes, but it was gated on some flag (apiserver-ram-mb or something like that), which I bet you don't set.

Then a memory leak occurred. We use cache map to store pod information, and the cache will contain pods in different PodList. This will prevent golang from gc the whole PodList

I don't understand this comment. If you're storing the data in a cache, then it's expected that it won't be GC-ed. That's the whole point of storing it in a cache, right?
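
A standalone sketch of that point (illustrative sizes): re-slicing with items = items[:0] keeps the old backing array, and any pointers still sitting in that array stay reachable until they are overwritten, nilled out, or the slice itself is dropped.

package main

import "fmt"

func main() {
	items := make([]*[1 << 20]byte, 0, 8)
	for i := 0; i < 8; i++ {
		items = append(items, new([1 << 20]byte)) // ~8 MiB referenced in total
	}

	items = items[:0] // length is 0, but the backing array still holds all 8 pointers

	// To actually release the references, nil out the elements first...
	full := items[:cap(items)]
	for i := range full {
		full[i] = nil
	}
	// ...or simply start over with a fresh slice: items = nil

	fmt.Println(len(items), cap(items)) // 0 8
}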

@sxllwx (Member) commented Jun 14, 2021

The leak here is caused by Go's slice GC behavior: there is no leak in the for...range version because v in for _, v := range slice is a copy. If the unleak function is changed to the following code (equivalent to the reflect-based pattern used in meta.EachListItem), the process's RES value grows monotonically. I suggest adding DeepCopy to meta.EachListItem to mitigate this.

func unleak() {
	store := make(map[string]*v1.Pod)
	var index int
	for {
		plist := generatePod(index)
		for i := 0; i < len(plist.Items); i++  {
			store[plist.Items[i].GetName()] = &plist.Items[i]
			//store[pod.Name] = &pod
		}
		time.Sleep(time.Second * 2)
		index++
		goruntime.GC()
		fmt.Println("==unleak==", index, len(store))
	}
}

@wojtek-t PTAL

@xiaoanyunfei (Contributor, Author) commented Jun 15, 2021

@sxllwx is right: the leak here is caused by Go's slice GC behavior.
Once we take the address of a slice element, the whole slice cannot be garbage-collected, and meta.EachListItem takes the address of each slice element.
100000 pods should not use 6.9 GB of memory, only about 300-500 MB.

I added a new test case in which the store caches pods from lists with different Generation values.

package main

import (
        "fmt"
        goruntime "runtime"
        "time"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
        leak()
}

func leak() {
        store := make(map[string]*v1.Pod)
        var index int
        for {
                plist := generatePod(index)
                for index, pod := range plist {
                        store[pod.Name] = &plist[index]
                }
                time.Sleep(time.Second * 2)
                index++
                goruntime.GC()
                fmt.Println("==leak==", index, len(store))
        }
}

func generatePod(num int) []v1.Pod {
        var items []v1.Pod
        for i := 0; i < 100000-num; i++ {
                pod := v1.Pod{
                        ObjectMeta: metav1.ObjectMeta{
                                Name:       fmt.Sprintf("pod-%d", i),
                                Generation: int64(num),
                        },
                }
                items = append(items, pod)
        }
        return items
}

@sxllwx (Member) commented Jun 15, 2021

(quoting @xiaoanyunfei's previous comment in full)

@xiaoanyunfei

There is already a PR working on a similar issue; see the discussion at #102565 (comment). PTAL.

@wojtek-t (Member)

OK - that makes sense to me. Sorry for the confusion, I was looking too much at #102565 (comment), and I still think there is no bug in that PR, because we're not taking the address of a slice item there, right? The runtime.Objects are already pointers, right?

That said, the ExtractList call that happens before it has this problem:

func ExtractList(obj runtime.Object) ([]runtime.Object, error) {

@liggitt @deads2k - FYI

@sxllwx (Member) commented Jun 15, 2021

After testing, only two kinds of operation affect GC this way:

  • store[plist.Items[i].GetName()] = &plist.Items[i]
  • items.Index(i).Addr().Interface()

The two functions with this kind of operation are meta.ExtractList and meta.EachListItem.


Note that corev1.Pod's DeepCopyObject method has a pointer receiver.

But in the PodList obtained from kube-apiserver, the Items field is a slice of non-pointer Pod structs:

type PodList struct {
	metav1.TypeMeta `json:",inline"`
	// Standard list metadata.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
	// +optional
	metav1.ListMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
	// List of pods.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md
	Items []Pod `json:"items" protobuf:"bytes,2,rep,name=items"`
}

So the branch

	if list[i], found = raw.Addr().Interface().(runtime.Object); !found {

is taken every time.
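
A standalone sketch of these two observations (it only needs the k8s.io/api and k8s.io/apimachinery modules): a Pod value does not satisfy runtime.Object, its pointer does, and the pointer produced by ExtractList-style code points straight into the list's backing array.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

func main() {
	list := v1.PodList{Items: make([]v1.Pod, 3)}

	// Only *v1.Pod implements runtime.Object, because DeepCopyObject has a
	// pointer receiver; the value type would not compile here.
	var obj runtime.Object = &list.Items[0]
	// var obj runtime.Object = list.Items[0] // compile error: Pod does not implement runtime.Object

	// The extracted object is literally a pointer into list.Items, so keeping
	// it alive keeps the whole backing array (and hence the PodList) alive.
	fmt.Println(obj.(*v1.Pod) == &list.Items[0]) // true
}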

@sxllwx (Member) commented Jun 27, 2021

@liggitt @deads2k Is there anything I can do to help this move forward?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 25, 2021
@yuvaldolev

/assign

@yuvaldolev

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 18, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the rules listed above.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 16, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closing this issue.

In response to this:

(quoting the triage robot's /close comment above in full)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xigang (Contributor) commented Sep 8, 2022

@deads2k @wojtek-t @sxllwx we also had a kube-controller-manager memory leak, suspected to be related to client-go.
Our cluster has 5600 nodes, and kube-controller-manager's memory usage reached 197.5 GiB:

[screenshot: kube-controller-manager memory usage]

After restarting kube-controller-manager, memory usage drops to only 15 GiB.

Kubernetes version: v1.17.4
