[BUG] koordlet panic in runtimeproxy PreCreateContainerHook #1882

Closed
LGTH opened this issue Jan 30, 2024 · 4 comments · Fixed by #1885
Labels
area/koordlet kind/bug Create a report to help us improve

Comments

LGTH commented Jan 30, 2024

What happened:
After upgrading my environment from v1.3.0 to v1.4.0, koordlet panics with an "invalid memory address or nil pointer dereference" error. The error log looks like this:

err parse contair I0118 17:52:18.536907 585926 cpu_burst.go:364] get container xxxxx cgroup path failed,err parse conta 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1a534a2]
goroutine 759 [running]:
github.com/koordinator-sh/koordinator/pkg/koordlet/runtimehooks/protocol.(*ContainerContext).Update(...)
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/runtimehooks/protocol/container_context.go:301
github.com/koordinator-sh/koordinator/pkg/koordlet/runtimehooks/protocol.(*ContainerContext).ProxyDone(0xc0020aa1e0,0x6?) 
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/runtimehooks/protocol/container_context.go:231 +0x42
github.com/koordinator-sh/koordinator/pkg/koordlet/runtimehooks/proxyserver.(*server).PreCreateContainerHook(0xc00038cdc0,{0xc001450100?, 
/go/src/github.com/koordinator-sh/koordinator/pkg/koordlet/runtimehooks/proxyserver/service.go:76 +0x2dc
github.com/koordinator-sh/koordinator/apis/runtime/v1alpha1._RuntimeHookService_PreCreateContainerHook_Handler({0x2313dc0?,0xc00038cdc0}, 
/go/src/github.com/koordinator-sh/koordinator/apis/runtime/v1alpha1/api_grpc.pb.go:245 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003c81e0,{0x2966dfe,0xc0006e8d00},0xc001706000,0xc001d3c420,0x3b4ff90,0xe) 
/go/pkg/mod/google.golang.org/grpc@v1.51.0/server.go:1340 +0xd13
google.golang.org/grpc.(*Server).handleStream(0xc0003c81e0,{0x2966df0,0xc0006e8d00},0xc001706000,0x0)
/go/pkg/mod/google.golang.org/grpc@v1.51.0/server.go:1713 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/go/pkg/mod/google.golang.org/grpc@v1.51.0/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
/go/pkg/mod/google.golang.org/grpc@v1.51.0/server.go:963 +0x28a

Then I read the code along the stack trace:

// koordinator-1.4.0\pkg\koordlet\runtimehooks\proxyserver\service.go
func (s *server) PreCreateContainerHook(ctx context.Context,
	req *runtimeapi.ContainerResourceHookRequest) (*runtimeapi.ContainerResourceHookResponse, error) {
	klog.V(5).Infof("receive PreCreateContainerHook request %v", req.String())
	resp := &runtimeapi.ContainerResourceHookResponse{
		ContainerAnnotations: req.GetContainerAnnotations(),
		ContainerResources:   req.GetContainerResources(),
		PodCgroupParent:      req.GetPodCgroupParent(),
		ContainerEnvs:        req.GetContainerEnvs(),
	}
	containerCtx := &protocol.ContainerContext{} // a new context object is created here, but its fields are never assigned
	containerCtx.FromProxy(req)
	err := hooks.RunHooks(s.options.PluginFailurePolicy, rmconfig.PreCreateContainer, containerCtx)
	containerCtx.ProxyDone(resp)
	klog.V(5).Infof("send PreCreateContainerHook response for pod %v container %v response %v",
		req.PodMeta.String(), req.ContainerMeta.String(), resp.String())
	return resp, err
}

// koordinator-1.4.0\pkg\koordlet\runtimehooks\protocol\container_context.go
func (c *ContainerContext) ProxyDone(resp *runtimeapi.ContainerResourceHookResponse) {
	c.injectForExt()
	c.Response.ProxyDone(resp)
	c.Update() // this calls Update, which the v1.3.0 code does not do here
}
.....
func (c *ContainerContext) Update() {
	c.executor.UpdateBatch(true, c.updaters...) // c.executor is never assigned before this call, so it panics
	c.updaters = nil
}
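
The failure mode described above can be reproduced outside koordlet with a minimal sketch. Everything here is a hypothetical stand-in: `updateExecutor`, `containerContext`, and `tryUpdate` are not koordinator types, and the executor is assumed to be an interface-typed field. The sketch only shows that calling a method through a field that was never assigned panics with the same runtime error as in the log.

```go
package main

import "fmt"

// updateExecutor is a hypothetical stand-in for the executor held by ContainerContext.
type updateExecutor interface {
	UpdateBatch(isCacheable bool, updaters ...string)
}

// containerContext mirrors only the two fields that Update touches in the snippet above.
type containerContext struct {
	executor updateExecutor
	updaters []string
}

// Update has the same shape as the method quoted from container_context.go.
func (c *containerContext) Update() {
	c.executor.UpdateBatch(true, c.updaters...) // panics when c.executor is nil
	c.updaters = nil
}

// tryUpdate calls Update and turns a panic into an error so the demo keeps running.
func tryUpdate(c *containerContext) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	c.Update()
	return nil
}

func main() {
	// Same construction as &protocol.ContainerContext{}: the executor field stays nil.
	ctx := &containerContext{}
	fmt.Println(tryUpdate(ctx))
}
```

In the real koordlet the panic is not recovered, so it propagates up through the gRPC handler and crashes the process, which matches the stack trace.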

What you expected to happen:
Is this a bug? If it is, what can I do, and how can I disable the hook or otherwise work around it? I don't understand which pod creations trigger this. Please advise.
How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • App version: 1.4.0
  • Kubernetes version (use kubectl version): 1.20
  • Install details (e.g. helm install args):
  • Node environment (for koordlet/runtime-proxy issue):
    • Containerd/Docker version: Containerd
    • OS version:
    • Kernel version: 4.18
    • Cgroup driver: cgroupfs/systemd
  • Others:
@LGTH LGTH added the kind/bug Create a report to help us improve label Jan 30, 2024
@eahydra eahydra changed the title [BUG] [BUG] koordlet panic Jan 30, 2024
@saintube saintube changed the title [BUG] koordlet panic [BUG] koordlet panic in runtimeproxy PreCreateContainerHook Jan 31, 2024
Member

saintube commented Jan 31, 2024

@LGTH Thanks for your feedback. This is a nil pointer dereference bug that is present in v1.2 through v1.4. I'm very sorry about that and will fix it ASAP.
To avoid this bug temporarily, please disable the runtime proxy and switch to the standalone mode or the NRI mode, which are the main modes maintained in the latest versions.
FYI: https://koordinator.sh/docs/designs/nri-mode-resource-management#alternatives
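
For illustration only, one common defensive pattern for this class of bug is to check the dependency before using it and return an error instead of panicking. This is a sketch under assumptions, not the change made in #1885: `updateExecutor`, `recordingExecutor`, `containerContext`, and `errNoExecutor` are hypothetical names, and the real `Update` in koordinator does not return an error.

```go
package main

import (
	"errors"
	"fmt"
)

// updateExecutor is a hypothetical stand-in for the executor dependency.
type updateExecutor interface {
	UpdateBatch(isCacheable bool, updaters ...string)
}

// recordingExecutor is a hypothetical executor that just remembers what it was asked to apply.
type recordingExecutor struct{ applied []string }

func (r *recordingExecutor) UpdateBatch(_ bool, updaters ...string) {
	r.applied = append(r.applied, updaters...)
}

// containerContext is a hypothetical, reduced version of the context type.
type containerContext struct {
	executor updateExecutor
	updaters []string
}

var errNoExecutor = errors.New("executor is not set")

// Update applies pending updaters and reports a missing executor instead of panicking.
func (c *containerContext) Update() error {
	if c.executor == nil {
		return errNoExecutor
	}
	c.executor.UpdateBatch(true, c.updaters...)
	c.updaters = nil
	return nil
}

func main() {
	// Context built without an executor, as in the proxy server path.
	bare := &containerContext{updaters: []string{"cpu.cfs_quota_us"}}
	fmt.Println(bare.Update())

	// Context with the executor wired in.
	exec := &recordingExecutor{}
	wired := &containerContext{executor: exec, updaters: []string{"cpu.cfs_quota_us"}}
	fmt.Println(wired.Update(), exec.applied)
}
```

A guard like this only hides the missing wiring; the alternative is to make sure every code path that builds the context also assigns the executor before any hook runs.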

Member

hormes commented Feb 22, 2024

/reopen

@koordinator-bot koordinator-bot bot reopened this Feb 22, 2024
@saintube
Member

@LGTH Release v1.4.1 should fix this bug. Please try the latest chart, and feel free to reopen this issue if you run into other problems.
