
Help Scheduler make the right choice #3824

Closed
bdurrow opened this issue Mar 10, 2021 · 7 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@bdurrow commented Mar 10, 2021

Feature request

Through some mechanism, provide the scheduler with enough information that the workspace PV(s) are provisioned somewhere the pipeline can complete. As a nice-to-have, if a node scale-up is required, allow it to happen while other, less resource-intensive tasks are running.

Use case

As a cluster admin I would like to run pipelines that are hard to schedule due to resource constraints in the cluster.

Consider the following pipeline:
(git-clone) -> (build) -> (test) -> (copy-assets)

The test task in my pipeline is hard to schedule: it requires 75% of the CPU and memory of a node.
My cluster is divided across 3 availability zones, and each PV is only available in a single availability zone. I have a Machine Autoscaler for each availability zone that can scale up a new node, but some of the autoscalers may be at their limit and will not create any additional nodes.

When the git-clone task is scheduled, it doesn't provide enough "hints" to the scheduler for the PV to be created in an availability zone where the test task can complete. The "affinity-assistant" tends to make matters worse.

I can currently work around this limitation (assuming the affinity-assistant is disabled) by adding a task whose resource requests match the test task, so my pipeline looks like this:
(preallocate-resources) -> (git-clone) -> (build) -> (test) -> (copy-assets)
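
For illustration, a minimal sketch of how that workaround is wired into a Pipeline; the Pipeline name, the taskRef names other than preallocate-resources, and the workspace names are assumptions, and the workspace is expected to be bound to a volumeClaimTemplate on the PipelineRun:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-and-test
spec:
  workspaces:
    - name: source
  tasks:
    # Runs first so its (large) resource requests influence where the workspace PV lands.
    - name: preallocate-resources
      taskRef:
        name: preallocate-resources
      workspaces:
        - name: source
          workspace: source
    - name: git-clone
      taskRef:
        name: git-clone
      runAfter:
        - preallocate-resources
      workspaces:
        - name: output          # workspace name as declared by the clone task; assumed here
          workspace: source
    # build, test and copy-assets follow, each chained with runAfter and the same workspace.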

This is an ugly hack, especially because I can no longer parameterize the resource requests due to this validation failure (for reference, I am including my task__preallocate-resources.yaml at the end of this issue):

Error from server (BadRequest): error when creating "task__preallocate-resources.yaml": admission webhook "webhook.pipeline.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

Additionally, the affinity-assistant feature breaks this workaround because the affinity-assistant pod is scheduled before the first task pod.

Even before implementing the preallocate-resources workaround described above, I set a priorityClassName attribute on the test task that is slightly higher than the default and allows for preemption. This lets the scheduler evict pods from a node in order to make room for the test task pod. I'm not sure exactly why, but when the affinity-assistant feature is enabled this workaround no longer appears to work.
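
For reference, this is roughly how that workaround looks, assuming a v1beta1 PipelineRun that supports taskRunSpecs; the PriorityClass name and value are illustrative:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: pipeline-test-preempt   # illustrative name
value: 1000                     # slightly above the default priority of 0
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: Lets the resource-hungry test task evict lower-priority pods.
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: build-and-test-run
spec:
  pipelineRef:
    name: build-and-test
  taskRunSpecs:
    - pipelineTaskName: test
      taskPodTemplate:
        priorityClassName: pipeline-test-preempt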

I know the documentation states in many places to use a LimitRange to automatically provide resource requests for the pods. I feel that LimitRanges are a poor solution to the problem: every pod scheduled in the namespace picks up the LimitRange defaults, so in my use case the test task pod would never schedule because the affinity-assistant pod would carry such large resource requests.
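
For context, a namespace-wide LimitRange of this shape is what the documentation suggests (the name and values are illustrative); because the defaults apply to every container in the namespace that omits its own requests, the affinity-assistant placeholder pod would inherit the same large requests and compete with the test task pod for capacity:

apiVersion: v1
kind: LimitRange
metadata:
  name: pipeline-defaults
spec:
  limits:
    - type: Container
      defaultRequest:           # applied to containers that do not set requests
        cpu: "3"
        memory: 12Gi
      default:                  # applied to containers that do not set limits
        cpu: "4"
        memory: 16Gi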

For reference, task__preallocate-resources.yaml, which fails validation:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: preallocate-resources
spec:
  description: >-
    Use this task first in a pipeline to make sure that the workspace PV is allocated somewhere that the pipeline can complete.
  params:
    - name: cpu
      default: '1000m'
      description: cpu required for largest pipeline task
      type: string
    - name: memory
      default: '4Gi'
      description: memory required for largest pipeline task
      type: string
    - name: ephemeral-storage
      default: '5Gi'
      description: ephemeral-storage required for largest pipeline task
      type: string
  steps:
    - name: nop
      image: registry.redhat.io/openshift-pipelines-tech-preview/pipelines-git-init-rhel8@7e18e13a94c9c82e369274984500401e27684d3565393cf6ed2bad55f2d751bc
      command: [echo]
      args: ["Startup Complete"]
      resources:
        requests:
          cpu: $(params.cpu)
          ephemeral-storage: $(params.ephemeral-storage)
          memory: $(params.memory)
  workspaces:
    - name: source
@bdurrow added the kind/feature label on Mar 10, 2021
@jlpettersson (Member)

Yes, this is a weak point in Tekton currently. I feel this problem is similar to #3049

I have tried to elaborate on similar problems in Design doc: Task parallelism when using workspace. A potentially viable solution for this (though it requires a lot of work) is to run the whole PipelineRun within a Pod, such that Kubernetes can be responsible for the scheduling and potential autoscaling. I have suggested addressing this problem in TEP-0046 Colocation of Tasks and Workspaces (formerly PipelineRun in a Pod)

@souleb (Contributor) commented Mar 24, 2021

I'm not sure exactly why but when the affinity-assistant feature is enabled this workaround doesn't appear to work any longer.

From the k8s documentation:

If a pending Pod has inter-pod affinity to one or more of the lower-priority Pods on the Node, the inter-Pod affinity rule cannot be satisfied in the absence of those lower-priority Pods. In this case, the scheduler does not preempt any Pods on the Node. Instead, it looks for another Node.
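
For illustration, the affinity assistant works by injecting a required inter-pod affinity of roughly this shape into the TaskRun pods (the label key and value here are assumptions, simplified from what the controller actually sets), which is the situation the quoted paragraph describes when the placeholder pod has a lower priority than the pending test pod:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: affinity-assistant   # label assumed for illustration
        topologyKey: kubernetes.io/hostname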

@bobcatfish (Collaborator)

hey @bdurrow - it seems like maybe part of the problem is that the variable replacement you're trying to do isn't working, is that right? e.g.:

      resources:
        requests:
          cpu: $(params.cpu)
          ephemeral-storage: $(params.ephemeral-storage)
          memory: $(params.memory)

Looking at the list of fields that accept variable replacement, it doesn't look like the resource requests (or limits) are in there - I'm sure we'd be happy to add them though!

I guess that only helps you with the pre-allocate Task though and not the rest of the problem 🙃

It sounds like the other thing that you might want is some way to express resource requests at a Pipeline level 🤔 vs having to include your pre-allocate task
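
In the meantime, a sketch of the non-parameterized fallback for the preallocate step, which should pass validation because the quantities are literals (the image and values are illustrative):

  steps:
    - name: nop
      image: registry.access.redhat.com/ubi8/ubi-minimal   # any small image works here
      command: [echo]
      args: ["Startup Complete"]
      resources:
        requests:
          cpu: 1000m              # literal quantities satisfy the admission webhook's regex
          ephemeral-storage: 5Gi
          memory: 4Gi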

@tekton-robot (Collaborator)

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/stale label on Jun 30, 2021
@tekton-robot (Collaborator)

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 15, 2021
@tekton-robot (Collaborator)

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot (Collaborator)

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
