Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Don't set default project-clone CPU limit unless we have to #1127

Closed
wants to merge 2 commits into from

Conversation

amisevsk
Copy link
Collaborator

@amisevsk amisevsk commented Jun 15, 2023

What does this PR do?

This PR changes how the default project-clone CPU limit is handled: if the configured value for the project-clone CPU limit is a) the default value (1000m) and b) there are no limit ranges present in the workspace's namespace, we no longer set a CPU limit for project-clone.

What issues does this PR fix or reference?

Closes #1126
Related: eclipse-che/che#22198, #1111

Is it tested? How?

Test:

  1. By default, project-clone container does not specify a CPU limit
  2. If the DWOC is configured with a non-default value (e.g. 1001m), that value is used for the project-clone CPU limit
  3. If a limitRange is created in the workspace's namespace, the default CPU limit of 1000m is used for the project-clone container.

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

amisevsk and others added 2 commits June 14, 2023 20:00
If the workspace's namespace does not define any limitRanges that would
prevent creation of a pod that does not specify a CPU limit, do not set
a CPU limit for the project clone init container by default.

If the operator's configuration specifies a non-default CPU limit for
project-clone, that value is always used.

Currently, the controller only checks whether limitRanges are present,
so a limit range that only applies to PVCs will still result in a
default CPU limit being used.

Co-authored-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Signed-off-by: Angel Misevski <amisevsk@redhat.com>
@openshift-ci
Copy link

openshift-ci bot commented Jun 15, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tolusha
Copy link
Contributor

tolusha commented Jun 15, 2023

You starts watching all LimitRange objects which can affect DWO memory usage.

@amisevsk
Copy link
Collaborator Author

Yeah this is the tradeoff (and partly why I don't think considering limitranges is a good approach in general).

Either we cache all limitranges (which could potentially take a lot of memory) or make API requests on every reconcile (which leads to a variety of problems).

With caching, we can expect an increase in memory usage on the order of ~200 MB depending on cluster size (small clusters shouldn't really have a problem)

Without caching, we have to worry about DWO making a lot of requests against the API in unpredicable patterns, and also tie the reconcile loop to API latency, which could cause instability in starting workspaces.

IMO it's sufficient that the project-clone container can be configured to not have a CPU limit, and we shouldn't be looking at limitranges at all. We should do the safe, compatible thing in default DWO containers (set a CPU limit that is sufficient for the performance we expect) and allow a way to remove the CPU limit when desired.

cc: @l0rd any thoughts here?

@l0rd
Copy link
Collaborator

l0rd commented Jun 15, 2023

I agree with you. Watching the LimitRanges is overkill and may have some major side effects.

That's my proposal: when the operator is deployed, based on the information that we have, we should decide (in advance) if we can avoid setting the CPU limits for workspaces UDI, project-clone and gateway containers. But how can we decide that when the developer namespaces don't exist yet? Well my guess would be that, if the CheCluster namespace has a default container CPU limit, then developers namespaces will have it too. Of course this is an extremely weak assumption so maybe it's better to do what @amisevsk proposes. But not setting the CPU limits is so better, it provides such a better UX, that I feel bad that we set it by default on the workspaces containers. Of course admins should have a way to enforce a different default behavior (that can be setting a specific CPU limit or not setting it at all).

Out of curiosity: do you know if Developer Sandbox has a default CPU limit in the openshift-devspaces namespace?

@amisevsk
Copy link
Collaborator Author

Out of curiosity: do you know if Developer Sandbox has a default CPU limit in the openshift-devspaces namespace?

I just double-checked, and sadly the sandbox clusters don't have limitranges in non-user namespaces, so this heuristic wouldn't work there.

Currently, the places CPU limits are used in DWO are

  • The Operator's controller/webhooks containers have CPU limits set -- I'd love to be able to remove these to see if it improves workspace startup time, but they're defined in the CSV itself so it's likely a no-go.
  • The project-clone container uses a default CPU limit to be safe, but this can be disabled via the DevWorkspaceOperatorConfig CR (and this field is supported by Eclipse Che as of the last release). I'm not sure how much of an impact the CPU limit has here, as this container may be bound by network rather than CPU (we should test this).

In particular, we don't set CPU limits on workspace containers by default, so if a DevWorkspace is created that e.g. doesn't set a limit on the tools container, that container will be run without a limit (the same applies for CPU requests, actually).

Related to this issue I think is the issue @AObuchow is working on: #1056. Currently, if we merge a contribution (e.g. the editor) into the default development container, we get a CPU limit from that container that we don't necessarily want. Once #1056 is resolved, we should be able to run any non-project-clone containers without a CPU limit by just not setting a CPU limit in devfiles.

@amisevsk amisevsk changed the title Don't set default project-clone CPU limit unless we have to [WIP] Don't set default project-clone CPU limit unless we have to Jun 21, 2023
@amisevsk
Copy link
Collaborator Author

Closing this pull request as I don't think it's something we want to do in DWO; CPU limits can now be configured for every container so it's possible to remove them as required.

@amisevsk amisevsk closed this Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't set CPU limits by default unless they are required by a LimitRange
3 participants