Enable dynamic GPU scheduling #79

Open
ksatzke opened this issue Jul 30, 2020 · 2 comments · May be fixed by #87

Assignees
ksatzke

Labels
design: The issue is related to the high-level architecture
env/kubernetes: To indicate something specific to Kubernetes setup of KNIX
feature_request: New feature request
help wanted: Extra attention is needed
in progress: This issue is already being fixed

Comments

ksatzke (Collaborator) commented Jul 30, 2020

Currently, when KNIX components are deployed via the Helm charts, their resource limits are fixed at deployment time, like so:

resources:
  limits:
    cpu: 1
    memory: 2Gi
  requests:
    cpu: 1
    memory: 1Gi

For each workflow deployment, GPU support should likewise be configurable at workflow deployment time. This would allow a workflow's requirement to run on GPUs instead of CPUs to be defined dynamically, and would let KNIX schedule the workflow onto a node that still has sufficient GPU cores available, like so:

resources:
  limits:
    cpu: 1
    memory: 2Gi
    nvidia.com/gpu: 1 # requesting 1 GPU
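For context on the scheduling side: Kubernetes treats nvidia.com/gpu as an extended resource advertised by the NVIDIA device plugin, so a pod carrying such a limit is only placed on a node that still has unallocated GPUs, and GPU limits cannot be overcommitted. A minimal sketch of how the limit could end up in a rendered sandbox Deployment is shown below; all names (wf-sandbox, knix/sandbox) are placeholders for illustration, not actual KNIX identifiers.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wf-sandbox            # placeholder name for a workflow sandbox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wf-sandbox
  template:
    metadata:
      labels:
        app: wf-sandbox
    spec:
      containers:
      - name: sandbox
        image: knix/sandbox   # placeholder image name
        resources:
          limits:
            cpu: 1
            memory: 2Gi
            nvidia.com/gpu: 1 # pod is only scheduled onto a node with a free GPU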
Concretely, this requires:
  • add the option to define GPU requirements per workflow to the GUI
  • store the workflow requirement limits together with the workflow data
  • extend the management service to evaluate and handle workflow requirement limits for GPUs and to handle GPU scheduling
  • add node labelling capabilities to KNIX (see the sketch after this list)
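For the node labelling and GPU scheduling items above, a possible sketch, assuming GPU nodes get a label such as knix-gpu=true (a placeholder key, not an existing KNIX convention, e.g. applied with kubectl label nodes <node> knix-gpu=true) and that the management service renders a matching nodeSelector into the sandbox pod spec whenever a workflow declares GPU requirements:

spec:
  nodeSelector:
    knix-gpu: "true"          # placeholder label applied by the proposed node labelling step
  containers:
  - name: sandbox             # placeholder container name
    resources:
      limits:
        nvidia.com/gpu: 1     # restricts placement to nodes with a free GPU

The nvidia.com/gpu limit alone already keeps the pod off nodes without free GPUs; the nodeSelector would additionally let KNIX steer GPU workflows to a labelled subset of nodes.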
ksatzke added the feature_request, help wanted, design, in progress, and env/kubernetes labels on Jul 30, 2020
ksatzke self-assigned this on Jul 30, 2020
iakkus (Member) commented Jul 30, 2020

These need to be done in the feature/GPU_support_extended branch, right?

ksatzke (Collaborator, Author) commented Jul 30, 2020

Right, if we can agree on the issue, we can do the implementation in this branch to extend KNIX GPU support.

ksatzke linked a pull request on Oct 12, 2020 that will close this issue