fix: dedupe for events FailedToSchedule nodes #372

Merged
1 commit merged into kubernetes-sigs:main from fix-dedupe-events
Jun 16, 2023

Conversation

engedaam
Contributor

@engedaam engedaam commented Jun 15, 2023

Fixes #

Description
Karpenter was not sorting the scheduling requirement strings in its events, so the same set of scheduling requirements could be rendered in several different orderings. Because the resulting event messages differed, the events were not deduplicated.

  • Old FailedToSchedule events
default     36m         Warning   FailedScheduling   pod/inflate-c76d54fc4-cpfdg   Failed to schedule pod, incompatible with provisioner "default", daemonset overhead={"cpu":"125m","pods":"2"}, no instance type satisfied resources {"cpu":"125m","nvidia.com/gpu":"1","pods":"3"} and requirements kubernetes.io/os In [linux], kubernetes.io/arch In [amd64], karpenter.sh/capacity-type In [on-demand], karpenter.sh/provisioner-name In [default], karpenter.k8s.aws/instance-category In [c m r] (no instance type which had enough resources and the required offering met the scheduling requirements)
default     34m         Warning   FailedScheduling   pod/inflate-c76d54fc4-cpfdg   Failed to schedule pod, incompatible with provisioner "default", daemonset overhead={"cpu":"125m","pods":"2"}, no instance type satisfied resources {"cpu":"125m","nvidia.com/gpu":"1","pods":"3"} and requirements karpenter.sh/capacity-type In [on-demand], karpenter.k8s.aws/instance-category In [c m r], kubernetes.io/os In [linux], karpenter.sh/provisioner-name In [default], kubernetes.io/arch In [amd64] (no instance type which had enough resources and the required offering met the scheduling requirements)
default     34m         Warning   FailedScheduling   pod/inflate-c76d54fc4-cpfdg   Failed to schedule pod, incompatible with provisioner "default", daemonset overhead={"cpu":"125m","pods":"2"}, no instance type satisfied resources {"cpu":"125m","nvidia.com/gpu":"1","pods":"3"} and requirements kubernetes.io/arch In [amd64], karpenter.sh/capacity-type In [on-demand], karpenter.k8s.aws/instance-category In [c m r], karpenter.sh/provisioner-name In [default], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements)
default     34m         Warning   FailedScheduling   pod/inflate-c76d54fc4-cpfdg   Failed to schedule pod, incompatible with provisioner "default", daemonset overhead={"cpu":"125m","pods":"2"}, no instance type satisfied resources {"cpu":"125m","nvidia.com/gpu":"1","pods":"3"} and requirements karpenter.k8s.aws/instance-category In [c m r], kubernetes.io/os In [linux], kubernetes.io/arch In [amd64], karpenter.sh/provisioner-name In [default], karpenter.sh/capacity-type In [on-demand] (no instance type which had enough resources and the required offering met the scheduling requirements)
default     4m9s        Warning   FailedScheduling   pod/inflate-c76d54fc4-cpfdg   Failed to schedule pod, incompatible with provisioner "default", daemonset overhead={"cpu":"125m","pods":"2"}, no instance type satisfied resources {"cpu":"125m","nvidia.com/gpu":"1","pods":"3"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.sh/capacity-type In [on-demand], karpenter.sh/provisioner-name In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements)
  • New FailedToSchedule events
Warning  FailedScheduling  5s (x3 over 4m14s)  karpenter          Failed to schedule pod, incompatible with provisioner "default", daemonset overhead={"cpu":"125m","pods":"2"}, no instance type satisfied resources {"cpu":"125m","nvidia.com/gpu":"1","pods":"3"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.sh/capacity-type In [on-demand], karpenter.sh/provisioner-name In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements)
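
For illustration, here is a minimal sketch of the idea behind the fix. It is not the code from this PR. It assumes the requirements are held in a Go map, whose iteration order is not stable between runs; the `requirement` type and the `stringifyRequirements` function are hypothetical names used only for this example. Sorting the per-key strings before joining them yields the same message every time, which lets identical events collapse into one.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requirement is a hypothetical stand-in for one scheduling requirement.
// It is NOT Karpenter's actual type.
type requirement struct {
	Operator string
	Values   []string
}

// stringifyRequirements renders the requirements in a deterministic order.
// Map iteration order varies, so the per-key strings are sorted before joining.
func stringifyRequirements(reqs map[string]requirement) string {
	parts := make([]string, 0, len(reqs))
	for key, req := range reqs {
		// Copy the values so the caller's slice is not reordered.
		values := append([]string(nil), req.Values...)
		sort.Strings(values)
		parts = append(parts, fmt.Sprintf("%s %s [%s]", key, req.Operator, strings.Join(values, " ")))
	}
	sort.Strings(parts)
	return strings.Join(parts, ", ")
}

func main() {
	reqs := map[string]requirement{
		"kubernetes.io/os":                   {Operator: "In", Values: []string{"linux"}},
		"kubernetes.io/arch":                 {Operator: "In", Values: []string{"amd64"}},
		"karpenter.sh/capacity-type":         {Operator: "In", Values: []string{"on-demand"}},
		"karpenter.sh/provisioner-name":      {Operator: "In", Values: []string{"default"}},
		"karpenter.k8s.aws/instance-category": {Operator: "In", Values: []string{"r", "c", "m"}},
	}
	// The output is the same on every call, regardless of map iteration order.
	fmt.Println(stringifyRequirements(reqs))
}
```

With the sample input, the keys come out in the same order as in the new event above, starting with `karpenter.k8s.aws/instance-category` and ending with `kubernetes.io/os`.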

How was this change tested?

  • Manually tested

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@engedaam engedaam requested a review from a team as a code owner June 15, 2023 17:27
@engedaam engedaam requested a review from njtran June 15, 2023 17:27
@engedaam engedaam changed the title Fix: dedupe for events FailedToSchedule nodes fix: dedupe for events FailedToSchedule nodes Jun 15, 2023
@jackfrancis
Contributor

@engedaam can you kindly add a detailed description to the PR that describes what this fix does?

@coveralls

Pull Request Test Coverage Report for Build 5281819940

  • 3 of 3 (100.0%) changed or added relevant lines in 1 file are covered.
  • 20 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.2%) to 81.401%

Files with Coverage Reduction | New Missed Lines | %
pkg/controllers/provisioning/scheduling/topology.go | 2 | 86.49%
pkg/controllers/provisioning/scheduling/topologygroup.go | 2 | 96.75%
pkg/test/cachesyncingclient.go | 2 | 82.47%
pkg/controllers/node/controller.go | 7 | 70.59%
pkg/controllers/provisioning/scheduling/preferences.go | 7 | 86.67%

Totals
Change from base Build 5256651055: -0.2%
Covered Lines: 6937
Relevant Lines: 8522

💛 - Coveralls

@coveralls

coveralls commented Jun 15, 2023

Pull Request Test Coverage Report for Build 5284091132

  • 5 of 5 (100.0%) changed or added relevant lines in 2 files are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 81.509%

Files with Coverage Reduction | New Missed Lines | %
pkg/controllers/node/controller.go | 7 | 70.59%

Totals
Change from base Build 5256651055: -0.1%
Covered Lines: 6947
Relevant Lines: 8523

💛 - Coveralls

@engedaam engedaam force-pushed the fix-dedupe-events branch 2 times, most recently from 6ba46b3 to 13933a3 Compare June 15, 2023 19:00
pkg/scheduling/requirements.go (resolved)
pkg/events/recorder.go (outdated, resolved)
pkg/scheduling/requirements.go (resolved)
Contributor

@njtran njtran left a comment

LGTM, but I'll let the others approve.

Member

@jonathan-innis jonathan-innis left a comment

LGTM 🚀 Nice work!

@engedaam engedaam removed the request for review from jackfrancis June 15, 2023 23:44
@engedaam engedaam merged commit 6482d9a into kubernetes-sigs:main Jun 16, 2023
6 checks passed
@engedaam engedaam deleted the fix-dedupe-events branch June 16, 2023 00:03
@@ -406,6 +406,28 @@ var _ = Describe("Requirements", func() {
Expect(reqs.NodeSelectorRequirements()).To(HaveLen(14))
})
})
Context("Stringify Requirements", func() {
Contributor

thanks for adding a UT!

This pull request was closed.
5 participants