Encounter "CRD exists" error while Installing multi volcano scheduler #3302

Open
bysph opened this issue Jan 16, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

bysph commented Jan 16, 2024

What happened:
According to https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-volcano-schedulers.md, we can install multiple Volcano schedulers to schedule different kinds of workloads.
But I encountered the following error when trying to install another Helm release named "volcano-spark" in a Kubernetes cluster that already has a "volcano" Helm release installed.

Error: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "jobtemplates.flow.volcano.sh" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "volcano-spark": current value is "volcano"; annotation validation error: key "meta.helm.sh/release-namespace" must equal "volcano-spark-system": current value is "volcano-system"

What you expected to happen:
There should be a parameter that allows CRD installation to be disabled.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
Additionally, some parameters in the installation document at https://github.com/volcano-sh/volcano/tree/master/installer, such as basic.scheduler_app_name, appear to be outdated. Has this documentation fallen behind the chart? I initially assumed that this parameter was intended for installing multiple Volcano instances, but I did not find it in the Helm chart.

Environment:

  • Volcano Version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
bysph added the kind/bug label Jan 16, 2024
Monokaix (Member) commented Jan 17, 2024

What if you just deploy two separate Volcano scheduler and controller Deployments and check whether that works?

bysph (Author) commented Jan 17, 2024

Deploying with YAML works, but I've run into a new issue. Volcano supports multiple schedulers that manage different nodes, but they apparently cannot manage different queues. This leaves workloads of different types incompletely isolated. For example, the "reserved" resources of a Flink queue may affect the decisions of the Volcano scheduler dedicated to Spark. How do you solve this kind of issue?

Monokaix (Member) commented Jan 17, 2024

We also have a nodeGroup plugin that can set node affinity on a queue; this might be a way to solve it.

bysph (Author) commented Jan 18, 2024

Will I run into this problem with that feature: monitoring becomes inaccurate, because the queue shows available resources but scheduling ultimately cannot complete due to node affinity?
Also, it does not seem to meet my requirement of having different Volcano schedulers claim different queues.

Monokaix (Member) commented

Preempt and Reclaim are both node-level actions. Although each action selects a queue first, it only traverses the nodes that belong to the current scheduler, so I think queues and nodes that do not belong to the current scheduler will not be chosen or reclaimed.
