Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP 2551 - kubectl exit code normalization #2574

Merged
merged 13 commits into from
Feb 2, 2022

Conversation

rikatz
Copy link
Contributor

@rikatz rikatz commented Mar 16, 2021

xref Issue: #2551

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 16, 2021
@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 16, 2021
@soltysh soltysh self-assigned this Mar 16, 2021
@apelisse
Copy link
Member

apelisse commented Apr 7, 2021

Note that some exit code are not actual errors. You should definitely account for that.

@rikatz
Copy link
Contributor Author

rikatz commented Apr 7, 2021

Note that some exit code are not actual errors. You should definitely account for that.

True!! So we've discussed in todays sig-cli and I'll try to reduce the scope/amount of error codes, like having:

  • An error code that represents some problem on the client side
  • A small amount of error codes that might represent apiserver errors (although @soltysh pointed that some forbidden or not found errors may not be well differentiated in some parts like kubectl get, so maybe having the 'forbidden' code for write operations).
  • Errors over 128 can be used to represent the sum of something forked. As an example: https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html we can use 129 (128 + 1) if some plugin, called from kubectl exists with 1 (and this may apply to diff as well!)


### Goals

* Document possible exit codes for kubectl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we also cover expectations for plugins and their exit codes? Maybe that's a non-goal at this stage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Tim,

IMO we should not normalize the plugins exit codes, but plugins enter in the 'external programs' category, which we can say that the error code is 128 + the plugin exit code (same for kubectl diff, as an example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm questioning myself what happens if a plugin ends with an error code bigger then 128, which will be 128 (from the kubectl) + 128 (from the plugin).

During the design, this should be dealt like (if > 255 then return 255 or something inside the supported error range)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can recommend but that's not a hard requirement.

keps/sig-cli/2551-return-code-normalization/README.md Outdated Show resolved Hide resolved

### Goals

* Document possible exit codes for kubectl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can recommend but that's not a hard requirement.

keps/sig-cli/2551-return-code-normalization/README.md Outdated Show resolved Hide resolved
The majority of commands already are organized as the following:
* Run Complete to complete missing information with defaults. This runs on the client side
* Run Validation to check command syntax and missing arguments. This runs on the client side
* Run the command itself. This might run on the client side, be dry-run, run an external command or call the APIServer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might actually change during the separattion of cobra from Options struct, so it would be nice to put it in a more generic way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example: validation and execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deejross will take a look on this

kubectl exec and run uses the pod exit code as its own exit code, we should figure out how we
should deal with this
https://github.com/kubernetes/kubernetes/blob/v1.22.0-alpha.0/test/e2e/kubectl/kubectl.go#L496
https://github.com/kubernetes/kubernetes/blob/v1.22.0-alpha.0/staging/src/k8s.io/kubectl/pkg/cmd/util/helpers.go#L178-L179
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a big one and how we can properly handle that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this follow the 128+exit code rule proposed for plugins?

Copy link
Contributor

@deejross deejross Aug 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other option is to start kubectl error codes at 201. This would allow exec to continue returning the pod exit code, assuming it's less than 201. If the pod exit code is 201 or greater, then set it to 255. Maybe we should apply the same logic to external commands as well, such as diff to keep it consistent. This would allow those commands to return their codes unaltered (if less than 201), and make kubectl generated error codes distinct to reduce confusion around the origin of an error. Thoughts on this approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soltysh wdyt about this pattern of using RC > 200? I was discussing with Ross here that this may be a good approach but not the desired by sig-cli.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having kubectl exec take a parameter --use-exit-code that defaults to true, and that you can force to false if you want success to mean “we were able to exec something”?

### Creating new error parser functions

Another design solution is to create helper functions for each steps:
* When running Complete, Validate or other client side steps, call cmdutil.CheckClientErr(err) and exits with some well defined client error code, mapped to ErrorCodeClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before I'd describe this in slightly more generic way, since this implementation might change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deejross to take a look into this

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2021
@rikatz
Copy link
Contributor Author

rikatz commented Aug 9, 2021

/remove-lifecycle stale
I will still work on this :D
/lifecycle active

@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 9, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Nov 14, 2021
@eddiezane eddiezane linked an issue Nov 22, 2021 that may be closed by this pull request
4 tasks
@eddiezane
Copy link
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 22, 2021
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits, but mostly this looks good
/hold
for you to get PRR file added and nits addressed
/lgtm
/approve

for the developers to warn when a new deployment fails because of the lack of some permission, so those permissions can be
updated for the pipeline to work correctly.

The developers are making a lot of changes, and they keeps asking for Bruce to look for every pipeline execution, even those that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The developers are making a lot of changes, and they keeps asking for Bruce to look for every pipeline execution, even those that
The developers are making a lot of changes, and they keep asking for Bruce to look for every pipeline execution, even those that

wants to warn users when the apply command fails because of differences between the manifests.

#### Story 2
Bruce Wayne, the security administrator of the Gotham Inc Company is following the development of a new product. Bruce asked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Bruce Wayne, the security administrator of the Gotham Inc Company is following the development of a new product. Bruce asked
Bruce Wayne, the security administrator of the Gotham Inc company is following the development of a new product. Bruce asked


| Code | Description |
| ----- | ----------------------------------------------------------------------------------------------------- |
| 1-200 | Reserved for exit codes from exec and external commands |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we need to reserve all 200, if just the initial 100 isn't sufficient? But that's something we'll need to empirically figure out as we go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. This was a discussion I was on with Ross, that maybe the code reservation can be something more sparse or in a "range" other than reserving all the 200

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's "close" this list before going to beta, for alpha let's leave it open.

| 203 | Client configuration error, invalid or missing configuration |
| 204 | Network failure, API could not be reached |
| 205 | Authentication failure, identity could not be determined |
| 206 | Authorization failure, identity was determined, but does not have access to requested resource(s) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and not found are hard to distinguish due to server returning 404 Not Found in both cases, due to security reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can leave this as an UNRESOLVED tag so we are stating this is not solved yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly what I said above :)


### Creating new error parser functions

Another design solution is to create helper functions for each steps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it earlier this week and I think that implementing a set of new error codes inside kubectl where each command could use them and then having CheckErr return new error codes with that env of yours and just -1 in the backwards compatible way as it does today. Which is close with what you're describing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry @soltysh is there any action on this? I'm a bit slow this latest days :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action required, this is more like a thought for you and @deejross to consider when implementing 😉

CRI or CNI may require updating that component before the kubelet.
-->

## Production Readiness Review Questionnaire
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing file in https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/ and make sure to update the template with current one.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 28, 2022
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 28, 2022
@rikatz
Copy link
Contributor Author

rikatz commented Jan 28, 2022

PRR added
/assign @johnbelamaric

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor tweaks needed to PRR but looks good in general.

keps/sig-cli/2551-return-code-normalization/README.md Outdated Show resolved Hide resolved
keps/sig-cli/2551-return-code-normalization/README.md Outdated Show resolved Hide resolved
# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.24"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're missing some fields from the latest template, please update


## Production Readiness Review Questionnaire

### Feature Enablement and Rollback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have an older version of the questions, please review the latest template and include any new questions as well. Thanks.

owning-sig: sig-cli
status: implementable
creation-date: 2021-03-16
reviewers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look thanks!

@rikatz
Copy link
Contributor Author

rikatz commented Feb 1, 2022

@johnbelamaric fixed based on your comments, thanks :)

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits, but this is good, thx!
/lgtm
/approve


Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: some of these checkboxes need to be checked.


| Code | Description |
| ----- | ----------------------------------------------------------------------------------------------------- |
| 1-200 | Reserved for exit codes from exec and external commands |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's "close" this list before going to beta, for alpha let's leave it open.

| 203 | Client configuration error, invalid or missing configuration |
| 204 | Network failure, API could not be reached |
| 205 | Authentication failure, identity could not be determined |
| 206 | Authorization failure, identity was determined, but does not have access to requested resource(s) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly what I said above :)


### Creating new error parser functions

Another design solution is to create helper functions for each steps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action required, this is more like a thought for you and @deejross to consider when implementing 😉

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 1, 2022
@johnbelamaric
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, rikatz, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2022
@soltysh
Copy link
Contributor

soltysh commented Feb 2, 2022

/hold cancel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kubectl return code normalization
9 participants