Kubevirt: Enforce limits & requests by a configurable ratio #2206

Merged: 7 commits, Jan 23, 2023

Contributor

@iholder101 iholder101 commented Jan 15, 2023

This PR brings a mechanism to align Kubevirt virt-launcher Pods with ResourceQuotas that are applied to the namespace.

As an example, let's look at the following VMI:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      rng: {}
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:20230115_2baf5303d 
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk

When this VMI is created, the following virt-launcher pod is created (some details are omitted for simplicity):

kind: Pod
metadata:
  name: virt-launcher-vmi-fedora-lzzn6
  namespace: kubevirt-hyperconverged
spec:
  containers:
  - name: compute
    resources:
      limits:
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
      requests:
        cpu: 100m
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
        ephemeral-storage: 50M
        memory: "1279755392"

As can be seen, this virt-launcher pod has CPU and memory requests but no limits. If the VMI is created in a namespace whose ResourceQuota constrains limits.cpu or limits.memory, Kubernetes rejects any pod that does not set the corresponding limit, so the virt-launcher Pod cannot be created. The mechanism presented in this PR solves this.

To enable this mechanism, first define the ratio between limits and requests, for CPU and for memory, as annotations on the HyperConverged (HCO) object:

kind: HyperConverged
metadata:
  annotations:
    kubevirt.io/cpu-limit-to-request-ratio: "2"
    kubevirt.io/memory-limit-to-request-ratio: "1.5"

In addition, a ResourceQuota needs to exist on the relevant namespace. As an example, it's possible to create the following object:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: test-rq 
spec:
  hard:
    limits.cpu: "200"
    limits.memory: "2000G"

Please take into account that if a ResourceQuota constrains only limits.cpu or only limits.memory, then only the corresponding CPU or memory limit is set. If multiple ResourceQuota objects exist in the relevant namespace, it takes only one of them constraining CPU/memory limits for these limits to be enforced.
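The rule above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `quota` type is a hypothetical, simplified stand-in for a ResourceQuota's `spec.hard` map, and `limitsToEnforce` is a name invented for this sketch, not a function from the PR.

```go
package main

import "fmt"

// quota is a hypothetical, simplified stand-in for a ResourceQuota's spec.hard map.
type quota map[string]string

// limitsToEnforce scans every quota in the namespace and reports which limits
// should be set. A limit is enforced as soon as any single quota constrains it.
func limitsToEnforce(quotas []quota) (cpu, memory bool) {
	for _, q := range quotas {
		if _, ok := q["limits.cpu"]; ok {
			cpu = true
		}
		if _, ok := q["limits.memory"]; ok {
			memory = true
		}
	}
	return cpu, memory
}

func main() {
	quotas := []quota{
		{"limits.cpu": "200"}, // constrains CPU limits only
		{"pods": "10"},        // unrelated to CPU/memory limits
	}
	cpu, memory := limitsToEnforce(quotas)
	fmt.Println(cpu, memory) // prints "true false"
}
```

With the `test-rq` object shown earlier, both `limits.cpu` and `limits.memory` are present, so both limits would be enforced.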

When these annotations are set and a matching ResourceQuota exists, a mutating webhook that targets virt-launcher pods sets limits on the pod, each computed as the request multiplied by the configured ratio. The compute container's resources would then look like the following:

    resources:
      limits:
        cpu: 200m
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
        memory: "1919633088"
      requests:
        cpu: 100m
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
        ephemeral-storage: 50M
        memory: "1279755392"

Bear in mind that the new webhook acts on a resource only if its request is set but its limit is not.
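The arithmetic behind the example can be checked with a small Go sketch. It is not the webhook's implementation: `scaleRequest` and `limitFor` are hypothetical names, quantities are plain integers (bytes for memory, millicores for CPU) instead of Kubernetes quantities, a value of 0 stands for "unset", and rejecting ratios below 1 is an assumption of this sketch (a limit below its request is not valid for a pod).

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// scaleRequest multiplies a request (in base units) by a ratio given as an
// annotation string such as "1.5", rounding up to a whole unit.
func scaleRequest(request int64, ratio string) (int64, error) {
	r, err := strconv.ParseFloat(ratio, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid ratio %q: %w", ratio, err)
	}
	// Assumption of this sketch: the ratio must be a finite number >= 1.
	if math.IsNaN(r) || math.IsInf(r, 0) || r < 1 {
		return 0, fmt.Errorf("ratio %q must be a finite number >= 1", ratio)
	}
	return int64(math.Ceil(float64(request) * r)), nil
}

// limitFor computes a limit only when a request is set and no limit exists yet;
// otherwise it returns the existing limit unchanged. Zero means "unset" here.
func limitFor(request, limit int64, ratio string) (int64, error) {
	if request == 0 || limit != 0 {
		return limit, nil
	}
	return scaleRequest(request, ratio)
}

func main() {
	memory, _ := limitFor(1279755392, 0, "1.5") // memory request in bytes
	cpu, _ := limitFor(100, 0, "2")             // CPU request in millicores
	fmt.Println(memory, cpu)                    // prints "1919633088 200"
}
```

The two results match the `memory: "1919633088"` and `cpu: 200m` limits in the pod above.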

Reviewer Checklist

Reviewers are supposed to review the PR for every aspect below one by one. To check an item means the PR is either "OK" or "Not Applicable" in terms of that item. All items are supposed to be checked before merging a PR.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Release note:

Kubevirt: Enforce limits & requests by a configurable ratio

@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot added the do-not-merge/work-in-progress and dco-signoff: yes labels Jan 15, 2023
@kubevirt-bot added the do-not-merge/release-note-label-needed label Jan 15, 2023
@kubevirt-bot added the needs-rebase and size/L labels Jan 15, 2023
@kubevirt-bot removed the needs-rebase label Jan 15, 2023
@coveralls
Collaborator

coveralls commented Jan 15, 2023

Pull Request Test Coverage Report for Build 3981586663

  • 127 of 181 (70.17%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.6%) to 85.013%

Changes Missing Coverage:
  • pkg/webhooks/mutator/virt-launcher-mutator.go: 98 covered lines of 152 changed/added lines (64.47%)

Totals:
  • Change from base Build 3940385214: -0.6%
  • Covered Lines: 4850
  • Relevant Lines: 5705

💛 - Coveralls

Collaborator

@nunnatsa nunnatsa left a comment

Added three comments about context, but they are all the same one, actually.

}
}

func (m *VirtLauncherMutator) Handle(_ context.Context, req admission.Request) admission.Response {
Collaborator

Please use a real parameter for the context. You'll need it below.

}
originalPod := launcherPod.DeepCopy()

hco, err := getHcoObject(m.cli, m.hcoNamespace)
Collaborator

@nunnatsa nunnatsa Jan 15, 2023

please use the context here.

Comment on lines 14 to 15
func getHcoObject(cli client.Client, namespace string) (*v1beta1.HyperConverged, error) {
ctx := context.TODO()
Collaborator

Please define the context as the first function parameter and use it.

Contributor Author

@iholder101 iholder101 Jan 15, 2023

Done.
FYI, as can be seen here, I was simply using the same logic that already exists and following the current pattern.

Collaborator

If you don't mind fixing it, please do. You're refactoring this code anyway.

Contributor Author

👍

@iholder101 changed the title from "Limit enforcement" to "Kubevirt: Enforce limits & requests by a configurable ratio" Jan 15, 2023
@iholder101 marked this pull request as ready for review January 15, 2023 16:24
@kubevirt-bot removed the do-not-merge/work-in-progress label Jan 15, 2023
@iholder101
Contributor Author

/cc @enp0s3 @fabiand @stu-gott @acardace

Collaborator

@nunnatsa nunnatsa left a comment

Added a few comments

hcoutil "github.com/kubevirt/hyperconverged-cluster-operator/pkg/util"
)

func getHcoObject(cli client.Client, ctx context.Context, namespace string) (*v1beta1.HyperConverged, error) {
Collaborator

The golang convention is that the context is always the first parameter in the function. Could you please reorder?
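A minimal sketch of that convention, for illustration only: the `getter` type, `getObject`, `slowGet`, and `lookupWithTimeout` are hypothetical stand-ins, not the PR's `getHcoObject` or the real client. The context comes first in the signature and is forwarded to the call that does the work, so a deadline or cancellation set by the caller actually takes effect.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// getter is a hypothetical stand-in for the client call that fetches an object.
type getter func(ctx context.Context, namespace string) (string, error)

// getObject takes the context as its first parameter and forwards it,
// instead of creating its own with context.TODO().
func getObject(ctx context.Context, get getter, namespace string) (string, error) {
	return get(ctx, namespace)
}

// slowGet simulates a lookup that takes 50ms and honors cancellation.
func slowGet(ctx context.Context, namespace string) (string, error) {
	select {
	case <-time.After(50 * time.Millisecond):
		return "object-in-" + namespace, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// lookupWithTimeout runs the lookup under a deadline of the given milliseconds.
func lookupWithTimeout(ms int) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()
	return getObject(ctx, slowGet, "example-ns")
}

func main() {
	fmt.Println(lookupWithTimeout(1000)) // deadline is long enough: lookup succeeds
	fmt.Println(lookupWithTimeout(1))    // deadline expires first: lookup returns an error
}
```

Had `getObject` created its own context internally, the second call could not be interrupted by the caller's deadline.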

Comment on lines 23 to 31
if err != nil {
if apierrors.IsNotFound(err) {
logger.Info("HCO CR doesn't not exist, allow hcoNamespace deletion")
return nil, err
}

logger.Error(err, "failed getting HyperConverged CR")
return nil, err
}
Collaborator

Is that the right error handling and error message, now that it's a generic function?


hco, err := getHcoObject(m.cli, ctx, m.hcoNamespace)
if err != nil {
m.logErr(err, "cannot get hco object")
Collaborator

maybe

Suggested change
- m.logErr(err, "cannot get hco object")
+ m.logErr(err, "cannot get the HyperConverged object")

Comment on lines 64 to 67
Expect(resources.Limits[v1.ResourceCPU].Equal(expectedResources.Limits[v1.ResourceCPU])).To(BeTrue())
Expect(resources.Requests[v1.ResourceCPU].Equal(expectedResources.Requests[v1.ResourceCPU])).To(BeTrue())
Expect(resources.Limits[v1.ResourceMemory].Equal(expectedResources.Limits[v1.ResourceMemory])).To(BeTrue())
Expect(resources.Requests[v1.ResourceMemory].Equal(expectedResources.Requests[v1.ResourceMemory])).To(BeTrue())
Collaborator

Doesn't Gomega's Equal matcher work here?

Contributor Author

Unfortunately not :(
With Gomega's Equal I get errors similar to:

  [FAILED] Expected
      <resource.Quantity>: {
          i: {value: 150000000000, scale: -3},
          d: {Dec: nil},
          s: "150M",
          Format: "DecimalSI",
      }
  to equal
      <resource.Quantity>: {
          i: {value: 150, scale: 6},
          d: {Dec: nil},
          s: "150M",
          Format: "DecimalSI",
      }

Although the two quantities are effectively equal (to 150M).
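The failure above comes from comparing representations rather than values: Gomega's Equal matcher relies on reflect.DeepEqual, which compares struct fields one by one, while the Quantity type's own Equal and Cmp methods compare numeric values. The sketch below reproduces the effect with a hypothetical, simplified `scaled` type standing in for the internal (value, scale) pair from the output; it is not the real resource.Quantity.

```go
package main

import (
	"fmt"
	"math/big"
	"reflect"
)

// scaled is a hypothetical stand-in for the (value, scale) pair shown in the
// failure output. It represents the number value * 10^scale.
type scaled struct {
	value int64
	scale int32
}

// normalize returns value * 10^(scale - base) as an arbitrary-precision integer.
// base must not be greater than the receiver's scale.
func (a scaled) normalize(base int32) *big.Int {
	factor := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(a.scale-base)), nil)
	return factor.Mul(factor, big.NewInt(a.value))
}

// cmp compares the numeric values of a and b and returns -1, 0, or 1.
func (a scaled) cmp(b scaled) int {
	base := a.scale
	if b.scale < base {
		base = b.scale
	}
	return a.normalize(base).Cmp(b.normalize(base))
}

func main() {
	a := scaled{value: 150000000000, scale: -3} // 150000000000 * 10^-3 = 150M
	b := scaled{value: 150, scale: 6}           // 150 * 10^6 = 150M

	fmt.Println(reflect.DeepEqual(a, b)) // prints "false": the fields differ
	fmt.Println(a.cmp(b) == 0)           // prints "true": the values are equal
}
```

This is why the test asserts on the boolean result of the quantities' own Equal method instead of handing the structs to the matcher.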

podNamespace = "fake-namespace"
)

var _ = Describe("virt-launcher webhook mutator", func() {
Collaborator

@nunnatsa nunnatsa Jan 15, 2023

Please add error cases; e.g. no HyperConverged, negative ratio, zero ratio, wrongly formatted inputs etc.

Contributor Author

Done!
Added tests for no HCO and invalid ratio. IMHO we don't need to test the library that parses the annotation as it's beyond this "unit's" scope.

@hco-bot
Collaborator

hco-bot commented Jan 15, 2023

okd-hco-e2e-image-index-aws lane succeeded.
/override ci/prow/okd-hco-e2e-image-index-gcp

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-image-index-azure

In response to this:

hco-e2e-image-index-aws lane succeeded.
/override ci/prow/hco-e2e-image-index-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@orenc1
Collaborator

orenc1 commented Jan 23, 2023

/retest

@hco-bot
Collaborator

hco-bot commented Jan 23, 2023

hco-e2e-upgrade-index-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-index-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-index-sno-azure

@openshift-ci

openshift-ci bot commented Jan 23, 2023

@iholder101: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-hco-e2e-upgrade-index-gcp 56f2190 link true /test okd-hco-e2e-upgrade-index-gcp
ci/prow/hco-e2e-upgrade-index-azure 56f2190 link true /test hco-e2e-upgrade-index-azure
ci/prow/hco-e2e-image-index-gcp 56f2190 link true /test hco-e2e-image-index-gcp
ci/prow/hco-e2e-image-index-sno-azure 56f2190 link false /test hco-e2e-image-index-sno-azure
ci/prow/hco-e2e-image-index-azure 56f2190 link true /test hco-e2e-image-index-azure
ci/prow/hco-e2e-upgrade-index-sno-azure 56f2190 link false /test hco-e2e-upgrade-index-sno-azure

@orenc1
Collaborator

orenc1 commented Jan 23, 2023

Since this is a temporary solution for the resource limits problem in VirtualMachines, and a permanent fix is coming on the kubevirt/kubevirt side, I'm OK with disregarding the coverage decrease.
/override coverage/coveralls
/lgtm
/approve

@kubevirt-bot
Contributor

@orenc1: Overrode contexts on behalf of orenc1: coverage/coveralls

@kubevirt-bot added the lgtm label Jan 23, 2023
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: orenc1

@kubevirt-bot added the approved label Jan 23, 2023
@kubevirt-bot kubevirt-bot merged commit ac6c4e1 into kubevirt:main Jan 23, 2023
@iholder101
Contributor Author

/cherrypick release-1.6

@kubevirt-bot
Contributor

@iholder101: #2206 failed to apply on top of branch "release-1.6":

Applying: Refactor: rename mutator to namespace mutator
Applying: Introduce basic virt-launcher mutator
Using index info to reconstruct a base tree...
A	deploy/index-image/community-kubevirt-hyperconverged/1.9.0/manifests/kubevirt-hyperconverged-operator.v1.9.0.clusterserviceversion.yaml
A	deploy/olm-catalog/community-kubevirt-hyperconverged/1.9.0/manifests/kubevirt-hyperconverged-operator.v1.9.0.clusterserviceversion.yaml
M	go.mod
M	pkg/components/components.go
M	pkg/util/consts.go
M	pkg/webhooks/mutator/namespace_mutator.go
M	pkg/webhooks/mutator/namespace_mutator_test.go
M	pkg/webhooks/setup.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/webhooks/setup.go
CONFLICT (content): Merge conflict in pkg/webhooks/setup.go
Auto-merging pkg/webhooks/mutator/namespace_mutator_test.go
Auto-merging pkg/webhooks/mutator/namespace_mutator.go
CONFLICT (content): Merge conflict in pkg/webhooks/mutator/namespace_mutator.go
Auto-merging pkg/util/consts.go
CONFLICT (content): Merge conflict in pkg/util/consts.go
Auto-merging pkg/components/components.go
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Auto-merging deploy/olm-catalog/community-kubevirt-hyperconverged/1.6.0/manifests/kubevirt-hyperconverged-operator.v1.6.0.clusterserviceversion.yaml
Auto-merging deploy/index-image/community-kubevirt-hyperconverged/1.6.0/manifests/kubevirt-hyperconverged-operator.v1.6.0.clusterserviceversion.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Introduce basic virt-launcher mutator
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@fabiand
Member

fabiand commented Feb 14, 2023

@iholder101 wrt the annotation, can we change it from

    kubevirt.io/cpu-limit-to-request-ratio: "2"
    kubevirt.io/memory-limit-to-request-ratio: "1.5"

to

    kubevirt.io/launcher-resource-cpu-limit-to-request-ratio: "2"
    kubevirt.io/launcher-resource-memory-limit-to-request-ratio: "1.5"

That's quite long, agreed, but otoh the current name does not really say WHAT cpu and memory values are adjusted, and we have at least two places (.resources and .cpu.count or .memory.guest). At the same time it is not clear on what entity is being operated, vm, vmi, pod, which pod?
Thus the ask to make the annotation more specific, and a little more expressive and future proof.

@iholder101
Contributor Author

/cherrypick release-1.8

@kubevirt-bot
Contributor

@iholder101: #2206 failed to apply on top of branch "release-1.8":

Applying: Refactor: rename mutator to namespace mutator
Applying: Introduce basic virt-launcher mutator
Using index info to reconstruct a base tree...
A	deploy/index-image/community-kubevirt-hyperconverged/1.9.0/manifests/kubevirt-hyperconverged-operator.v1.9.0.clusterserviceversion.yaml
A	deploy/olm-catalog/community-kubevirt-hyperconverged/1.9.0/manifests/kubevirt-hyperconverged-operator.v1.9.0.clusterserviceversion.yaml
M	go.mod
M	pkg/components/components.go
M	pkg/webhooks/mutator/namespace_mutator_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/webhooks/mutator/namespace_mutator_test.go
Auto-merging pkg/components/components.go
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Auto-merging deploy/olm-catalog/community-kubevirt-hyperconverged/1.8.0/manifests/kubevirt-hyperconverged-operator.v1.8.0.clusterserviceversion.yaml
Auto-merging deploy/index-image/community-kubevirt-hyperconverged/1.8.0/manifests/kubevirt-hyperconverged-operator.v1.8.0.clusterserviceversion.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Introduce basic virt-launcher mutator
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

nunnatsa added a commit to nunnatsa/hyperconverged-cluster-operator that referenced this pull request May 15, 2023
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
kubevirt-bot pushed a commit that referenced this pull request May 15, 2023
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
tiraboschi pushed a commit to tiraboschi/hyperconverged-cluster-operator that referenced this pull request May 18, 2023
…evirt#2341)

Remove the support of the
kubevirt.io/cpu-limit-to-request-ratio
and the kubevirt.io/memory-limit-to-request-ratio annotations,
as this workaround does now work as expected.

Revert PR kubevirt#2206 as it's not needed anymore

This is a manual cherry-pick of: kubevirt#2341

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>

Signed-off-by: Nahshon Unna Tsameret <60659093+nunnatsa@users.noreply.github.com>
kubevirt-bot pushed a commit that referenced this pull request May 19, 2023
Remove the support of the
kubevirt.io/cpu-limit-to-request-ratio
and the kubevirt.io/memory-limit-to-request-ratio annotations,
as this workaround does now work as expected.

Revert PR #2206 as it's not needed anymore

This is a manual cherry-pick of: #2341

Signed-off-by: Nahshon Unna Tsameret <60659093+nunnatsa@users.noreply.github.com>
Co-authored-by: Nahshon Unna Tsameret <60659093+nunnatsa@users.noreply.github.com>
Labels: approved, dco-signoff: yes, lgtm, release-note, size/XL
9 participants