KEP 2076: Kueuectl #2093

Merged 3 commits on May 1, 2024.
Changes from 1 commit
342 changes: 342 additions & 0 deletions keps/2076-kueuectl/README.md
@@ -0,0 +1,342 @@
# KEP-2076: Kueuectl

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Create ClusterQueue](#create-clusterqueue)
  - [Create LocalQueue](#create-localqueue)
  - [List ClusterQueue](#list-clusterqueue)
  - [List LocalQueue](#list-localqueue)
  - [List Workloads](#list-workloads)
  - [Stop ClusterQueue](#stop-clusterqueue)
  - [Resume ClusterQueue](#resume-clusterqueue)
  - [Stop LocalQueue](#stop-localqueue)
  - [Resume LocalQueue](#resume-localqueue)
  - [Stop Workload](#stop-workload)
  - [Resume Workload](#resume-workload)
  - [Pass-through](#pass-through)
  - [Test Plan](#test-plan)
      - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit Tests](#unit-tests)
    - [Integration tests](#integration-tests)
  - [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
- [Alternatives](#alternatives)
<!-- /toc -->

## Summary

We want to create a command-line tool for Kueue that allows users to:

* list Kueue's objects with easy-to-use, Kueue-specific filtering,
* create LocalQueues and ClusterQueues without touching YAML files,
Review comment (Contributor, resolved): Do we want to help generate yamls?

Reply (Contributor Author): Yes, with --dry-run you will get just the yaml.

* perform management operations on LQs, CQs and Workloads.

## Motivation

Currently, many administrative operations in Kueue are inconvenient.
They require a full understanding of the API, are relatively error-prone, or are simply
impossible without writing a custom script or a complex shell pipeline.

### Goals

* Provide a command line tool for system administrators to:

  * Create ClusterQueues and LocalQueues.
  * List *Queues and Workloads that meet certain criteria.
  * Stop and resume execution in ClusterQueues and LocalQueues.
  * Stop and resume individual Workloads.
  * (In the future) Migrate workloads between LocalQueues and perform other advanced operations.

* Build it on top of kubectl (as a kubectl plugin) to reuse all of
  the authentication/cluster selection methods.

### Non-Goals

* Provide any interface other than a command line (no web UI).
* Provide a tool targeted at ML researchers to make running jobs on Kubernetes easier.
* Expose additional metrics or statuses.

## Proposal

Create a Kueue kubectl plugin with a set of listing and management commands of the form:

```
kubectl kueue <command> <object> <flags>
```

Additionally, provide a wrapper script that allows a shorter syntax:

```
kueuectl <command> <object> <flags>
```
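A minimal sketch of what such a wrapper could do, assuming the plugin is installed as an executable named `kubectl-kueue` on the `PATH` (kubectl's standard plugin discovery convention). The function names `to_kubectl_argv` and `run` are hypothetical and only illustrate the argument forwarding; the actual wrapper may well be a shell script.

```python
import subprocess


def to_kubectl_argv(args):
    """Translate `kueuectl <args>` into the equivalent `kubectl kueue <args>`."""
    return ["kubectl", "kueue", *args]


def run(args):
    """Forward the arguments to kubectl and return its exit code."""
    # kubectl itself handles authentication and cluster selection,
    # which is the reason for building the tool as a plugin.
    return subprocess.run(to_kubectl_argv(args)).returncode
```

For example, `run(["list", "cq", "--active=true"])` would execute `kubectl kueue list cq --active=true`.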

The commands automatically submit all changes unless the `--dry-run` option is given, in which
case the tool prints the YAML without making any changes in the cluster.

### User Stories (Optional)

#### Story 1

I want to stop admission of new Workloads on a specific ClusterQueue but allow the already
running Workloads to complete without manually editing the CQ's definition.

#### Story 2

I want to create a LocalQueue pointing to a specific CQ without creating a one-time yaml.

### Risks and Mitigations

* There will be an additional binary placed on the sysadmin's machine, whose version will
have to be kept in sync with Kueue.

## Design Details

The following commands will be provided within the plugin:

### Create ClusterQueue

Creates a ClusterQueue with the given name, cohort, specified quota and other details.
Format:

```
kueuectl create cq|clusterqueue cqname
--cohort=cohortname # defaults to "" - no cohort

--queuing-strategy=strategy # defaults to BestEffortFIFO
--namespace-selector=selector # defaults to {} - all namespaces can use the queue
--reclaim-within-cohort=policy # defaults to Never
--preemption-within-cluster-queue=policy # defaults to Never
--preemption-when-borrowing=policy # defaults to Never

--nominal-quota=rfname1:resource1=value,resource2=value,resource3=value
# Review comment (Contributor): What if I want another flavor? Do I add another --nominal-quota line?
# Reply (Contributor Author): Yes.
--borrowing-limit=rfname1:resource1=value,resource2=value,resource3=value
--lending-limit=rfname1:resource1=value,resource2=value,resource3=value
```

It is possible to create a ClusterQueue with multiple resource flavors/FlavorQuotas inside
a ResourceGroup and multiple ResourceGroups covering different sets of resources.
The command will create the appropriate resource groups with resource flavors
in the order they appear in the command line. If two settings have at least
one common resource quota specified, they will land in the same ResourceGroup.

Output:
A simple confirmation, like in regular kubectl create,
`clusterqueue.kueue.x-k8s.io/xxxxx created`
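The grouping rule for repeated quota flags can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: `parse_quota_flag` and `group_flavors` are hypothetical names, the flavor names in the usage note are made up, and a flavor that would bridge two already existing groups is simply added to the first matching one rather than merging them.

```python
def parse_quota_flag(value):
    """Parse 'flavor:res1=v1,res2=v2' into (flavor, {resource: quantity})."""
    flavor, sep, spec = value.partition(":")
    if not sep or not flavor or not spec:
        raise ValueError(f"expected flavor:resource=value[,...], got {value!r}")
    quotas = {}
    for item in spec.split(","):
        resource, eq, quantity = item.partition("=")
        if not eq or not resource or not quantity:
            raise ValueError(f"expected resource=value, got {item!r}")
        quotas[resource] = quantity
    return flavor, quotas


def group_flavors(flag_values):
    """Group flavors into resource groups, preserving command-line order.

    A flavor joins the first existing group that shares at least one
    resource with it; otherwise it starts a new group.
    """
    groups = []
    for value in flag_values:
        flavor, quotas = parse_quota_flag(value)
        for group in groups:
            covered = group["coveredResources"]
            if any(resource in covered for resource in quotas):
                # Extend the group's covered resources without reordering them.
                covered.extend(r for r in quotas if r not in covered)
                group["flavors"].append((flavor, quotas))
                break
        else:
            groups.append({"coveredResources": list(quotas),
                           "flavors": [(flavor, quotas)]})
    return groups
```

With the values `on-demand:cpu=10,memory=64Gi`, `spot:cpu=20` and `a100:nvidia.com/gpu=8`, the first two flavors share `cpu` and end up in one group, while the third starts a second group.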


### Create LocalQueue

Creates a LocalQueue with the given name pointing to the specified ClusterQueue.
The command validates that the target ClusterQueue exists and that
its namespace selector matches the LocalQueue's namespace.
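The validation step could look roughly like the sketch below. It works on plain dictionaries standing in for objects fetched from the API server; the function names are hypothetical. It also assumes that a ClusterQueue without any `namespaceSelector` selects no namespace, while an empty selector (`{}`) selects all of them.

```python
def selector_matches(selector, labels):
    """Return True if a Kubernetes-style label selector matches `labels`."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator {op!r}")
        if not ok:
            return False
    return True


def validate_local_queue(cluster_queues, namespace_labels, cq_name):
    """Return an error message, or None when the LocalQueue may be created."""
    cq = cluster_queues.get(cq_name)
    if cq is None:
        return f"ClusterQueue {cq_name!r} does not exist"
    # Assumption: a missing selector matches nothing, an empty one matches all.
    selector = cq.get("spec", {}).get("namespaceSelector")
    if selector is None or not selector_matches(selector, namespace_labels):
        return f"namespace is not selected by ClusterQueue {cq_name!r}"
    return None
```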

Format:

```
kueuectl create lq|localqueue lqname
--namespace=namespace # uses context's default namespace if not specified
--clusterqueue=cqname
--skip-validation
# Review thread on --skip-validation:
# Contributor: what is this for?
# Contributor Author: So that you can create LQ pointing to CQ that doesn't exist yet.
# Contributor: maybe --ignore-unknown-cq?
# Contributor Author: Ok, changed.
# Contributor: did you forget to commit/push?
```

Output:
A simple confirmation `localqueue.kueue.x-k8s.io/xxxxx created`
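To make the create flow and the `--dry-run` behavior concrete, here is a small sketch. The manifest shape follows the `kueue.x-k8s.io/v1beta1` LocalQueue API (assumed to be the version in use); `create_local_queue` and its `submit` callback are hypothetical stand-ins for the real API call.

```python
import json


def local_queue_manifest(name, namespace, cluster_queue):
    """Build a LocalQueue object pointing at the given ClusterQueue."""
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "LocalQueue",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"clusterQueue": cluster_queue},
    }


def create_local_queue(name, namespace, cluster_queue, dry_run, submit):
    """Return the manifest text on dry run; otherwise submit and confirm."""
    manifest = local_queue_manifest(name, namespace, cluster_queue)
    if dry_run:
        # Printed as JSON for brevity; kubectl accepts JSON manifests
        # as well as YAML. Nothing is sent to the cluster.
        return json.dumps(manifest, indent=2)
    submit(manifest)
    return f"localqueue.kueue.x-k8s.io/{name} created"
```

The dry-run branch never calls `submit`, which mirrors the requirement that no changes are made in the cluster.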

### List ClusterQueue

List all ClusterQueues, potentially limiting output to those that are active/inactive and
matching the label selector. Format:

```
kueuectl list cq|clusterqueue(s)
--active=*|true|false
--selector=selector # label selector
```

Output columns:

* Name
* Cohort
* Pending Workloads
* Admitted Workloads
* Status (active or not)
* Age
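The tri-state `--active` flag could be applied client-side as in the sketch below. It assumes that activity is reported through a status condition of type `Active`; the helper names are hypothetical, and objects without that condition are treated as inactive.

```python
def is_active(obj):
    """True when the object's status carries an Active condition set to True."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Active":
            return cond.get("status") == "True"
    return False


def filter_by_active(objects, active="*"):
    """Apply the --active flag: '*' keeps everything, 'true'/'false' filter."""
    if active == "*":
        return list(objects)
    if active not in ("true", "false"):
        raise ValueError(f"--active must be *, true or false, got {active!r}")
    wanted = active == "true"
    return [obj for obj in objects if is_active(obj) == wanted]
```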

### List LocalQueue

Lists LocalQueues that match the given criteria: pointing to a specific CQ,
being active/inactive, belonging to the specified namespace or matching the label
selector.
Format:

```
kueuectl list lq|localqueue(s)
--namespace=ns # uses context's default namespace if not specified
--all-namespaces | -A
--clusterqueue=clusterqueue
--active=*|true|false
--selector=selector # label selector
```

Output columns:

* Namespace (if -A is used)
* Name
* ClusterQueue
* Pending Workloads
* Admitted Workloads
* Status (active or not)
* Age

### List Workloads

Lists Workloads that match the provided criteria. Format:

```
kueuectl list workloads
--namespace=ns # uses context's default namespace if not specified
--all-namespaces | -A
--clusterqueue=cq
--localqueue=lq
--only-pending
--only-admitted
# Review thread on lines +212 to +213:
# Contributor: maybe --condition=Pending|Admitted?
# Member: Instead of --condition=Pending|Admitted, could we provide a field selector for more flexibility?
# Contributor: Conditions are multiple fields, though. Or do you expect some form of jsonpath? I wonder if it could become too complex to write for a common use case such as conditions. OTOH, do you think we should have other fields?
# Member: @alculquicondor I imagined something similar to the kubectl --field-selector. I didn't mean JSONPath.
# Contributor: we can probably make it work for most fields, but conditions will still look weird.
--selector=selector
```

Output:

* Namespace (if -A is used)
* Workload name
* CRD type (truncated to 10 chars)
* CRD name
* LocalQueue
* ClusterQueue
* Status
* Position in Queue (if Pending)
Review comment (Contributor): This will depend on #1776. cc @KunWuLuan

* Age
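One way the Status column and the `--only-pending` / `--only-admitted` filters could be derived from Workload conditions is sketched below. It assumes the `Admitted` and `Finished` condition types; the function names are hypothetical. How the two flags combine is not specified in the KEP, so the sketch simply rejects using both at once.

```python
def workload_status(workload):
    """Collapse a Workload's conditions into a single status string."""
    conditions = {
        cond["type"]: cond["status"]
        for cond in workload.get("status", {}).get("conditions", [])
    }
    if conditions.get("Finished") == "True":
        return "Finished"
    if conditions.get("Admitted") == "True":
        return "Admitted"
    return "Pending"


def filter_workloads(workloads, only_pending=False, only_admitted=False):
    """Keep only pending or only admitted Workloads, or all without flags."""
    if only_pending and only_admitted:
        # Assumption: the flags are treated as mutually exclusive.
        raise ValueError("use only one of --only-pending and --only-admitted")
    if only_pending:
        wanted = "Pending"
    elif only_admitted:
        wanted = "Admitted"
    else:
        return list(workloads)
    return [w for w in workloads if workload_status(w) == wanted]
```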


### Stop ClusterQueue

Stops admission and execution inside the specified ClusterQueue, possibly
limiting the action only to the selected ResourceFlavor.
Format:

```
kueuectl stop clusterqueue|cq cqname
--keep-already-running
--resource-flavor=rfname
```

Output:
None.

### Resume ClusterQueue

Resumes admission inside the specified ClusterQueue.
Format:
```
kueuectl resume clusterqueue|cq cqname
--resource-flavor=rfname
```
Output:
None.
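The stop and resume commands for a ClusterQueue could translate into a patch of the queue's `spec.stopPolicy` field. The mapping below is an assumption made for illustration, not something the KEP states: `Hold` when `--keep-already-running` is given, `HoldAndDrain` otherwise, and `None` on resume. The per-flavor `--resource-flavor` variant is not covered, and `stop_policy_patch` is a hypothetical name.

```python
def stop_policy_patch(stop, keep_already_running=False):
    """Build a merge patch that sets spec.stopPolicy on a ClusterQueue."""
    if not stop:
        policy = "None"          # resume: admit Workloads again
    elif keep_already_running:
        policy = "Hold"          # stop admission, let running Workloads finish
    else:
        policy = "HoldAndDrain"  # stop admission and evict running Workloads
    return {"spec": {"stopPolicy": policy}}
```

Under this mapping, Story 1 corresponds to `stop_policy_patch(True, keep_already_running=True)`.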

### Stop LocalQueue

Stops execution (or just admission) of Workloads coming from the given LocalQueue.
This requires adding StopPolicy to LocalQueue and enforcing its changes in ClusterQueue.
Format:
```
kueuectl stop localqueue|lq lqname
--keep-already-running
```
Output:
None.


### Resume LocalQueue

Resumes admission of Workloads coming from the given LocalQueue.
Format:
```
kueuectl resume localqueue|lq lqname
```
Output:
None.

### Stop Workload

Puts the given Workload on hold. The Workload will not be admitted, and
if it is already admitted, it will be put back in the queue as if it had been preempted.
Format:
```
kueuectl stop workload name --namespace=ns
```
Output:
None.


### Resume Workload

Resumes the Workload, allowing its admission according to regular ClusterQueue rules.
Format:
```
kueuectl resume workload name --namespace=ns
```
Output:
None.
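For comparison, the manual equivalent of stopping or resuming a Workload can be expressed as a `kubectl patch` invocation. The sketch assumes that the hold is implemented by toggling the Workload's `spec.active` field, which the KEP does not spell out; `workload_patch_argv` is a hypothetical helper that only builds the command line.

```python
import json


def workload_patch_argv(name, namespace, active):
    """Build the kubectl command that sets spec.active on a Workload."""
    patch = json.dumps({"spec": {"active": active}})
    return [
        "kubectl", "patch", "workload", name,
        "--namespace", namespace,
        "--type=merge",
        "-p", patch,
    ]
```

Here `active=False` corresponds to `stop workload` and `active=True` to `resume workload`.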

### Pass-through

For completeness, there will be four additional commands that simply invoke regular kubectl,
so that users don't have to remember to switch from kueuectl to kubectl for these operations.

* `delete workload|clusterqueue|cq|localqueue|lq`
* `get workload|clusterqueue|cq|localqueue|lq`
* `edit workload|clusterqueue|cq|localqueue|lq`
* `describe workload|clusterqueue|cq|localqueue|lq`
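A minimal sketch of the pass-through dispatch, assuming the short names `cq` and `lq` are expanded before the call is handed to kubectl. The function name and the choice to return `None` for anything that is not a pass-through command are illustrative only.

```python
PASS_THROUGH_COMMANDS = {"delete", "get", "edit", "describe"}
RESOURCE_ALIASES = {"cq": "clusterqueue", "lq": "localqueue"}
RESOURCES = {"workload", "clusterqueue", "localqueue"}


def pass_through_argv(args):
    """Return the kubectl argv for a pass-through command, or None otherwise."""
    if len(args) < 2 or args[0] not in PASS_THROUGH_COMMANDS:
        return None
    command, resource, rest = args[0], args[1], args[2:]
    resource = RESOURCE_ALIASES.get(resource, resource)
    if resource not in RESOURCES:
        return None
    # Remaining arguments (names, flags) are forwarded untouched.
    return ["kubectl", command, resource, *rest]
```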

### Test Plan

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

#### Unit Tests

Regular unit tests for the commands will be provided.

#### Integration tests

Integration/E2E tests will be provided for all of the commands.

### Graduation Criteria

Beta:
* Positive feedback from users.
* All bugs and issues fixed.

GA/Stable:
* Positive feedback from users.
* No requests for column or flag changes for half a year.

## Implementation History

KEP: 2023-04-27.

## Alternatives

* Use existing kubectl functionality and perform management operations via
API manipulations.
28 changes: 28 additions & 0 deletions keps/2076-kueuectl/kep.yaml
@@ -0,0 +1,28 @@
title: Kueuectl
kep-number: 2076
authors:
- "@mwielgus"
status: implementable
creation-date: 2024-04-25
reviewers:
- "@alculquicondor"
approvers:
- "@alculquicondor"
replaces:
- "/keps/487-kubectl-plugin"

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v0.8"
# Review thread on latest-milestone:
# Member: Suggested change: latest-milestone: "v0.8" -> latest-milestone: "v0.7". @mwielgus Isn't this target v0.7? Because we already implemented part of the command.
# Contributor: Although we will likely not finish a meaningful amount of commands. Not sure.
# Member: If your team doesn't have enough time to implement commands, I'm ok with postponing it to v0.8.
# Member: /lgtm


# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v0.8"
  beta: "v0.9"
# Review comment on lines +24 to +25 (Member): Suggested change: alpha: "v0.8", beta: "v0.9" -> alpha: "v0.7", beta: "v0.8".

disable-supported: false