Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface error results to BuildRun #901

Closed
wants to merge 7 commits into from

Conversation

dalbar
Copy link
Member

@dalbar dalbar commented Oct 11, 2021

Changes

This PR provides a mechanism to surface errors in the build process into BuildRun objects granting more visibility into without being Tekton/Implementation specific. To achieve this goal, we introduce two new generic results shp-error-reason and shp-error-message, that a TaskRun can emit and a BuildRun can pick up.
Additionally, it implements a parser and interpreter for git errors in stdout that is used to deliver a use-case and PoC for the error surfacing.

Initial implementation for ship 24.

/kind feature

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing (docs will be included once the linked ship progresses)
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

Release Note

Added a new field `FailureDetails` to BuildRun's `Status` that has the failure location of a failed build pod and container. Additionally the new field contains a `Reason` and `Message` to communicate error information to users and third parties. Moreover, our git binary analyses git failure messages and surfaces them to the BuildRun's `FailureDetails`. 

@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 11, 2021
@openshift-ci openshift-ci bot requested review from otaviof and qu1queee October 11, 2021 08:06
@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Oct 11, 2021
@openshift-ci openshift-ci bot added release-note Label for when a PR has specified a release note and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 11, 2021
Copy link
Contributor

@coreydaley coreydaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts on the API. Once those are addressed (either accepted or denied), I will make another pass with a complete review.

cmd/git/main.go Outdated Show resolved Hide resolved
cmd/git/main.go Outdated
Comment on lines 52 to 57
resultErrorMessage string
resultErrorReason string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two thoughts here:

  • If these are coming from the taskrun, how about taskRunErrorMessage and taskRunErrorReason.
  • Instead of just surfacing the errors, why not surface a generic taskRunMessage and taskRunReason, maybe an additional taskRunReasonType and then you can surface errors or success messages?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are coming from TaskRun. However, the git command itself is not aware of Tekton. From its point of view it is just supplied with two filesystem paths.
If it is not a concern then I can introduce the name changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second thought. Please see #901 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd name them resultFileErrorMessage and resultFileErrorReason. And +1 that the term Tekton should not appear here.

cmd/git/main.go Outdated

}

return typeUndef, &ExitError{Code: 110, Message: "Unsupported type of credentials provided, either SSH private key or username/password is supported"}
}

func writeErrorResults(failure *git2.ErrorResult) (err error) {
if flagValues.resultErrorReason == "" || flagValues.resultErrorMessage == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be && instead of ||?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having || forces to supply a non-empty reason and message to the user. So we have to ask whether only a reason or a message is sufficient. I felt like both are necessary.

cmd/git/main_test.go Outdated Show resolved Hide resolved
Comment on lines 147 to 150

// Failure contains error details that are collected and surfaced from TaskRun
// +optional
Failure *Failure `json:"failure,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, some kind of generic TaskRunResults instead of just Failure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with a generic TaskRun result. Do you see a specific use-case for the API changes? We have already introduced build related results in a recent PR with strong typing (Output and Source results). Additionally, there is a proposal for strategy authors to provide generic results (see Strategy Results Proposal from ship 0023).

From an API design point of view, I think it is more pleasing to have a dedicated field for errors, because its presence is enough to derive that an error occurred in the build. There would be no need to process generic results and look for hints of failure/error.

@dalbar dalbar force-pushed the add/surface_error_results branch 5 times, most recently from 91ce776 to f63cf05 Compare October 15, 2021 14:16
@SaschaSchwarze0 SaschaSchwarze0 requested review from SaschaSchwarze0 and removed request for qu1queee October 19, 2021 09:44
Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New files are missing copyright headers, please add them.

cmd/git/main.go Outdated
Comment on lines 52 to 57
resultErrorMessage string
resultErrorReason string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd name them resultFileErrorMessage and resultFileErrorReason. And +1 that the term Tekton should not appear here.

cmd/git/main.go Outdated Show resolved Hide resolved
cmd/git/main.go Outdated Show resolved Hide resolved
cmd/git/main.go Outdated Show resolved Hide resolved
pkg/git/git_error_parser.go Outdated Show resolved Hide resolved
pkg/git/git_error_parser.go Outdated Show resolved Hide resolved
pkg/git/git_error_parser.go Outdated Show resolved Hide resolved
pkg/git/git_error_parser.go Outdated Show resolved Hide resolved
pkg/git/git_error_parser.go Outdated Show resolved Hide resolved
pkg/reconciler/buildrun/resources/taskrun.go Outdated Show resolved Hide resolved
@SaschaSchwarze0 SaschaSchwarze0 added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Oct 20, 2021
@SaschaSchwarze0 SaschaSchwarze0 changed the title Surface error results to BuildRun WIP Surface error results to BuildRun Oct 20, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 20, 2021
@SaschaSchwarze0
Copy link
Member

Added WIP, as things like docs are missing but should be included for a complete PR.

@dalbar dalbar force-pushed the add/surface_error_results branch 2 times, most recently from 64a11d1 to 68d2f57 Compare October 20, 2021 10:56
Baran Dalgic added 6 commits October 25, 2021 11:11
According to ship 0024, this commit adds a mechanism to surface errors
from TaskRun steps. The errors are published under `status.failure` in
BuildRun iff underlying TaskRun has failed.

Adds `pkg/reconciler/buildrun/resources/failures.go`, `pkg/reconciler/buildrun/resources/failures_test.go`
that contains logic for extracting error reason and message.

Modifies `pkg/reconciler/buildrun/resources/taskrun.go` by adding two
new generic results for errors.
- Parser that processes git stdout
- Git cmd line tool now uses error results to surface detailed reason and message
This commit extracts the login in `conditions.go` that searches for the failed pod and its container. It merges the information of the error location (pod and container) into the error results.

Fix tests
Reduces cyclic complexity in git_error_parser.

Renames cmd line argument for `cmd/git`.

Adds more error classes to achieve a 1 to 1 mapping from reason to
message.

Shortens long error messages for git errors.

Adds copyright headers to new files.

Extracts failure related task specs into `failureDetails` file.
@dalbar dalbar force-pushed the add/surface_error_results branch from 68d2f57 to 6041ffc Compare October 25, 2021 09:12
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 25, 2021

@dalbar: The label(s) /remove-label do-not-merge/work-in-progress cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, downstream-change-needed, backport-risk-assessed, cherry-pick-approved

In response to this:

/remove-label do-not-merge/work-in-progress

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dalbar dalbar changed the title WIP Surface error results to BuildRun Surface error results to BuildRun Oct 25, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 25, 2021
@dalbar dalbar force-pushed the add/surface_error_results branch from 6041ffc to 6fa23f1 Compare October 25, 2021 11:03
Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few smaller items.

docs/buildrun.md Outdated Show resolved Hide resolved
docs/buildrun.md Outdated Show resolved Hide resolved
test/e2e/e2e_test.go Outdated Show resolved Hide resolved
@dalbar dalbar force-pushed the add/surface_error_results branch 2 times, most recently from 86b4879 to 5a4641c Compare October 25, 2021 13:07
@dalbar dalbar force-pushed the add/surface_error_results branch from 5a4641c to 830d526 Compare October 28, 2021 07:25
@dalbar dalbar force-pushed the add/surface_error_results branch from 830d526 to e9d4b3d Compare October 28, 2021 08:58
Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 28, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 28, 2021
@dalbar dalbar requested a review from coreydaley October 28, 2021 09:02
Copy link
Member

@adambkaplan adambkaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

Thank you @dalbar for this pretty significant contribution. There are a lot of commits here that affect multiple, independent components, which makes it difficult for me personally to review. I have concerns with some particulars - like the git error parser - that will likely bog down this review and prevent other important pieces from merging.

It also looks like we are still actively discussing the implementation in shipwright-io/community#37

Is it possible to split this into multiple pull requests? A pull request with just the API and controller changes would be simpler to review the current PR that also includes changes to our git step runner.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2021
@dalbar dalbar force-pushed the add/surface_error_results branch from e9d4b3d to c817232 Compare October 29, 2021 09:22
@dalbar dalbar force-pushed the add/surface_error_results branch from c817232 to b1499ce Compare October 29, 2021 09:23
@dalbar
Copy link
Member Author

dalbar commented Oct 29, 2021

/hold

Thank you @dalbar for this pretty significant contribution. There are a lot of commits here that affect multiple, independent components, which makes it difficult for me personally to review. I have concerns with some particulars - like the git error parser - that will likely bog down this review and prevent other important pieces from merging.

It also looks like we are still actively discussing the implementation in shipwright-io/community#37

Is it possible to split this into multiple pull requests? A pull request with just the API and controller changes would be simpler to review the current PR that also includes changes to our git step runner.

That is a valid point. I have to rewrite the e2e-test and can then proceed with the split.

@dalbar dalbar closed this Nov 2, 2021
@dalbar dalbar deleted the add/surface_error_results branch November 2, 2021 09:46
@dalbar dalbar mentioned this pull request Nov 2, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. release-note Label for when a PR has specified a release note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants