Why is totalMinavailable allowed to differ from Job.Spec.MinAvailable? #2895

Open
renwenlong-github opened this issue Jun 5, 2023 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.


What happened:

1. What does it mean when spec.minAvailable < sum(spec.tasks.minAvailable)?

2. What does it mean when spec.minAvailable > sum(spec.tasks.minAvailable)?

For example, the following job can be submitted:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
spec:
  schedulerName: volcano
  minAvailable: 1       # 1 is not equal to 3 + 2
  tasks:
    - replicas: 5
      minAvailable: 3
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "0.5"
                  memory: "0.5Gi"
          restartPolicy: OnFailure
    - replicas: 2
      minAvailable: 2
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
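
To make the two questions concrete, here is a minimal sketch that compares the job-level value with the sum of the task-level values for the spec above. The dictionary layout and the function names are hypothetical and only mirror the YAML fields; this is not Volcano's webhook code.

```python
# Simplified, hypothetical model of the Job spec above (not Volcano's real types).
job = {
    "minAvailable": 1,
    "tasks": [
        {"name": "master", "replicas": 5, "minAvailable": 3},
        {"name": "work", "replicas": 2, "minAvailable": 2},
    ],
}


def total_task_min_available(job):
    """Sum of spec.tasks[*].minAvailable."""
    return sum(task["minAvailable"] for task in job["tasks"])


def compare_min_available(job):
    """Classify spec.minAvailable relative to sum(spec.tasks.minAvailable)."""
    total = total_task_min_available(job)
    if job["minAvailable"] < total:
        return "less"      # question 1 in this issue
    if job["minAvailable"] > total:
        return "greater"   # question 2 in this issue
    return "equal"


print(total_task_min_available(job), compare_min_available(job))
```

For the job above the sum is 3 + 2 = 5 while spec.minAvailable is 1, so the job falls into the first case.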

The resulting PodGroup is the following. I think this PodGroup is wrong:

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: minavailable-job-312b719a-3060-42d5-8657-d11e0a80f890
spec:
  minMember: 1
  minResources:
    count/pods: "1"
    cpu: "1*0.5"
    memory: 1*0.5Gi
    pods: "1"
    requests.cpu: "1*0.5"
    requests.memory: 1*0.5Gi
  minTaskMember:
    master: 3
    work: 2
  queue: default
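
One reading that is consistent with the numbers in this PodGroup is that minResources is accumulated over the first spec.minAvailable pods, walking the tasks in order. The sketch below only illustrates that reading; it is an assumption, not a statement of how the controller is implemented, and the function and field names are hypothetical. CPU is expressed in millicores and memory in MiB.

```python
# Per-pod requests from the Job spec in this issue (0.5 CPU = 500m, 0.5Gi = 512Mi).
tasks = [
    {"name": "master", "replicas": 5, "cpu_m": 500, "mem_mi": 512},
    {"name": "work", "replicas": 2, "cpu_m": 1000, "mem_mi": 1024},
]


def min_resources_first_n_pods(tasks, min_available):
    """Sum the requests of the first `min_available` pods, task by task (assumed rule)."""
    remaining = min_available
    total = {"pods": 0, "cpu_m": 0, "mem_mi": 0}
    for task in tasks:
        take = min(task["replicas"], remaining)
        total["pods"] += take
        total["cpu_m"] += take * task["cpu_m"]
        total["mem_mi"] += take * task["mem_mi"]
        remaining -= take
        if remaining == 0:
            break
    return total


observed = min_resources_first_n_pods(tasks, 1)
print(observed)
```

With spec.minAvailable set to 1 this yields one pod, 500m CPU and 512Mi memory, which corresponds to the "1*0.5" entries above and ignores the work task entirely.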

I would like to change the behavior as follows.

For a job without dependsOn:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
spec:
  schedulerName: volcano
  minAvailable: 3+2   # set by the webhook to sum(tasks minAvailable)
  tasks:
    - replicas: 5
      minAvailable: 3
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "0.5"
                  memory: "0.5Gi"
          restartPolicy: OnFailure
    - replicas: 2
      minAvailable: 2
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure

The PodGroup would then be:

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: minavailable-job-312b719a-3060-42d5-8657-d11e0a80f890
spec:
  minMember: 5
  minResources:
    count/pods: "5"
    cpu: "3*0.5+2*1"           # sum(tasks minAvailable.cpu)
    memory: 3*0.5+2*1Gi   # sum(tasks minAvailable.memory)
    pods: "5"
    requests.cpu: "3*0.5+2*1"
    requests.memory: 3*0.5+2*1Gi
  minTaskMember:
    master: 3
    work: 2
  queue: default
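
The arithmetic behind this proposed PodGroup can be sketched as below: each task contributes its own minAvailable pods, multiplied by the per-pod request. This is only an illustration of the proposal; the names are hypothetical and the units (millicores, MiB) are chosen to avoid fractional values.

```python
# Per-task minAvailable and per-pod requests from the Job spec in this issue.
tasks = [
    {"name": "master", "minAvailable": 3, "cpu_m": 500, "mem_mi": 512},
    {"name": "work", "minAvailable": 2, "cpu_m": 1000, "mem_mi": 1024},
]


def proposed_pod_group(tasks):
    """Proposed rule: minMember and minResources are sums over every task's minAvailable."""
    return {
        "minMember": sum(t["minAvailable"] for t in tasks),
        "cpu_m": sum(t["minAvailable"] * t["cpu_m"] for t in tasks),
        "mem_mi": sum(t["minAvailable"] * t["mem_mi"] for t in tasks),
        "minTaskMember": {t["name"]: t["minAvailable"] for t in tasks},
    }


pod_group = proposed_pod_group(tasks)
print(pod_group)
```

Here 3*0.5 + 2*1 works out to 3500m CPU and 3584Mi (3.5Gi) of memory, with minMember equal to 5.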

For a job with dependsOn:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
spec:
  schedulerName: volcano
  minAvailable: 3      # must match a DAG entry task, e.g. the master task
  queue: share-queue
  tasks:
    - replicas: 5
      minAvailable: 3
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "0.5"
                  memory: "0.5Gi"
          restartPolicy: OnFailure
    - replicas: 2
      minAvailable: 2
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
      dependsOn:
        name:
          - "master"

The PodGroup minResources equals the minResources of the master task, because if master can be scheduled then the whole job can be scheduled, right?

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: minavailable-job-9c191369-acd6-45c4-b0b0-e45dbe8ee230
  namespace: default
spec:
  minMember: 3
  minResources:
    count/pods: "3"
    cpu: 1500m
    memory: 1536Mi
    pods: "3"
    requests.cpu: 1500m
    requests.memory: 1536Mi
  minTaskMember:
    master: 3
    work: 2
  queue: default
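
The dependsOn proposal can be sketched in the same way: only the DAG entry tasks (tasks that depend on nothing) count towards minMember and minResources. This is a hypothetical illustration of the idea in this issue, with made-up function names, and it uses a flat list for dependsOn instead of the nested `name:` list in the YAML.

```python
# Tasks from the dependsOn job above; "work" depends on "master".
tasks = [
    {"name": "master", "minAvailable": 3, "cpu_m": 500, "mem_mi": 512, "dependsOn": []},
    {"name": "work", "minAvailable": 2, "cpu_m": 1000, "mem_mi": 1024, "dependsOn": ["master"]},
]


def entry_tasks(tasks):
    """DAG entry tasks: those with no dependsOn entries."""
    return [t for t in tasks if not t["dependsOn"]]


def entry_pod_group(tasks):
    """Proposed rule: size the PodGroup from the entry tasks only."""
    entries = entry_tasks(tasks)
    return {
        "minMember": sum(t["minAvailable"] for t in entries),
        "cpu_m": sum(t["minAvailable"] * t["cpu_m"] for t in entries),
        "mem_mi": sum(t["minAvailable"] * t["mem_mi"] for t in entries),
    }


pod_group = entry_pod_group(tasks)
print(pod_group)
```

Only master is an entry task, so the result is minMember 3, 1500m CPU and 1536Mi memory, matching the PodGroup above.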

What you expected to happen:
Change the webhook and the code that generates the PodGroup accordingly.

How to reproduce it (as minimally and precisely as possible):
none

Anything else we need to know?:
none

Environment:

  • Volcano Version: master
  • Kubernetes version (use kubectl version): none
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): none
  • Kernel (e.g. uname -a): none
  • Install tools: none
  • Others: none

lowang-bh (Member) commented Jun 17, 2023

tasks.minAvailable was introduced by PR #1459. I also think there are some holes in the logic.

Issue #2921 records all the cases that need to be considered.

PR #2802 fixes part of it.


stale bot commented Sep 17, 2023

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2023