Skip to content

Commit

Permalink
Added docs for failure details
Browse files Browse the repository at this point in the history
  • Loading branch information
Baran Dalgic committed Nov 15, 2021
1 parent 77b7cf2 commit 0add676
Show file tree
Hide file tree
Showing 10 changed files with 167 additions and 118 deletions.
5 changes: 2 additions & 3 deletions cmd/git/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,14 +44,14 @@ func (e ExitError) Error() string {

type settings struct {
help bool
skipValidation bool
depth uint
url string
revision string
depth uint
target string
resultFileCommitSha string
resultFileCommitAuthor string
secretPath string
skipValidation bool
}

var flagValues settings
Expand Down Expand Up @@ -100,7 +100,6 @@ func main() {
// Execute performs flag parsing, input validation and the Git clone
func Execute(ctx context.Context) error {
flagValues = settings{depth: 1}

pflag.Parse()

if flagValues.help {
Expand Down
2 changes: 1 addition & 1 deletion deploy/crds/shipwright.io_buildruns.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -459,7 +459,7 @@ spec:
type: object
type: array
failedAt:
description: FailedAt points to the resource where the BuildRun failed
description: 'Deprecated: FailedAt points to the resource where the BuildRun failed'
properties:
container:
type: string
Expand Down
21 changes: 20 additions & 1 deletion docs/buildrun.md
Original file line number Diff line number Diff line change
Expand Up @@ -268,10 +268,29 @@ _Note_: We heavily rely on the Tekton TaskRun [Conditions](https://github.com/te

### Understanding failed BuildRuns

To make it easier for users to understand why did a BuildRun failed, users can infer from the `Status.FailedAt` field, the pod and container where the failure took place.
[DEPRECATED] To make it easier for users to understand why did a BuildRun failed, users can infer from the `Status.FailedAt` field, the pod and container where the failure took place.

In addition, the `Status.Conditions` will host under the `Message` field a compacted message containing the `kubectl` command to trigger, in order to retrieve the logs.

Lastly, users can check `Status.FailureDetails` field, which includes the same information available in the `Status.FailedAt` field,
as well as a humanly-readable error message and reason.
The message and reason are only included if the build strategy provides them.

Example of failed BuildRun:

```yaml
# [...]
status:
# [...]
failureDetails:
location:
container: step-source-default
pod: baran-build-buildrun-gzmv5-b7wbf-pod-bbpqr
message: The source repository does not exist, or you have insufficient permission
to access it.
reason: GitRemotePrivate
```

### Step Results in BuildRun Status

After the successful completion of a `BuildRun`, the `.status` field contains the results (`.status.taskResults`) emitted from the `TaskRun` steps generate by the `BuildRun` controller as part of processing the `BuildRun`. These results contain valuable metadata for users, like the _image digest_ or the _commit sha_ of the source code used for building.
Expand Down
28 changes: 27 additions & 1 deletion docs/buildstrategies.md
Original file line number Diff line number Diff line change
Expand Up @@ -296,10 +296,36 @@ status:
# [...]
output:
digest: sha256:07626e3c7fdd28d5328a8d6df8d29cd3da760c7f5e2070b534f9b880ed093a53
size: "1989004"
size: 1989004
# [...]
```

Additionally, you can store error details for debugging purposes when a BuildRun fails using your strategy.

| Result file | Description |
| ---------------------------------- | ----------------------------------------------- |
| `$(results.shp-error-reason.path)` | File to store the error reason. |
| `$(results.shp-error-message.path)` | File to store the error message. |

Reason is intended to be a one-word, CamelCase classification of the error source, with the first letter capitalized.
Error details are only propagated if the build container terminates with a non-zero exit code.
This information will be available in the `.status.failureDetails` field of the BuildRun.

```yaml
apiVersion: shipwright.io/v1alpha1
kind: BuildRun
# [...]
status:
# [...]
failureDetails:
location:
container: step-source-default
pod: baran-build-buildrun-gzmv5-b7wbf-pod-bbpqr
message: The source repository does not exist, or you have insufficient permission
to access it.
reason: GitRemotePrivate
```

## Steps Resource Definition

All strategies steps can include a definition of resources(_limits and requests_) for CPU, memory and disk. For strategies with more than one step, each step(_container_) could require more resources than others. Strategy admins are free to define the values that they consider the best fit for each step. Also, identical strategies with the same steps that are only different in their name and step resources can be installed on the cluster to allow users to create a build with smaller and larger resource requirements.
Expand Down
2 changes: 1 addition & 1 deletion pkg/apis/build/v1alpha1/buildrun_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ type BuildRunStatus struct {
// +optional
BuildSpec *BuildSpec `json:"buildSpec,omitempty"`

// FailedAt points to the resource where the BuildRun failed
// Deprecated: FailedAt points to the resource where the BuildRun failed
// +optional
FailedAt *FailedAt `json:"failedAt,omitempty"`

Expand Down
7 changes: 5 additions & 2 deletions pkg/reconciler/buildrun/resources/conditions.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,12 @@ import (
"knative.dev/pkg/apis"
"sigs.k8s.io/controller-runtime/pkg/client"

buildv1alpha1 "github.com/shipwright-io/build/pkg/apis/build/v1alpha1"
"github.com/shipwright-io/build/pkg/ctxlog"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

buildv1alpha1 "github.com/shipwright-io/build/pkg/apis/build/v1alpha1"
"github.com/shipwright-io/build/pkg/ctxlog"
)

// Common condition strings for reason, kind, etc.
Expand Down Expand Up @@ -81,9 +82,11 @@ func UpdateBuildRunUsingTaskRunCondition(ctx context.Context, client client.Clie
return err
}

//lint:ignore SA1019 we want to give users some time to adopt to failureDetails
buildRun.Status.FailedAt = &buildv1alpha1.FailedAt{Pod: pod.Name}

if failedContainer != nil {
//lint:ignore SA1019 we want to give users some time to adopt to failureDetails
buildRun.Status.FailedAt.Container = failedContainer.Name
message = fmt.Sprintf("buildrun step %s failed in pod %s, for detailed information: kubectl --namespace %s logs %s --container=%s",
failedContainer.Name,
Expand Down
103 changes: 0 additions & 103 deletions pkg/reconciler/buildrun/resources/failureDetails_test.go

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -8,15 +8,18 @@ import (
"context"
"encoding/json"
"fmt"
buildv1alpha1 "github.com/shipwright-io/build/pkg/apis/build/v1alpha1"
"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/types"
"knative.dev/pkg/apis"
"sigs.k8s.io/controller-runtime/pkg/client"

buildv1alpha1 "github.com/shipwright-io/build/pkg/apis/build/v1alpha1"
)

const (
resultErrorMessage = "error-message"
resultErrorReason = "error-reason"
prefixedResultErrorReason = prefixParamsResultsVolumes + "-" + resultErrorReason
prefixedResultErrorMessage = prefixParamsResultsVolumes + "-" + resultErrorMessage
)
Expand Down Expand Up @@ -80,8 +83,8 @@ func extractFailedPodAndContainer(ctx context.Context, client client.Client, tas
}

func extractFailureDetails(ctx context.Context, client client.Client, taskRun *v1beta1.TaskRun) (failure *buildv1alpha1.FailureDetails) {
failure = &buildv1alpha1.FailureDetails{ }
failure.Location = &buildv1alpha1.FailedAt{ Pod: taskRun.Status.PodName }
failure = &buildv1alpha1.FailureDetails{}
failure.Location = &buildv1alpha1.FailedAt{Pod: taskRun.Status.PodName}

if reason, message, hasReasonAndMessage := extractFailureReasonAndMessage(taskRun); hasReasonAndMessage {
failure.Reason = reason
Expand Down
105 changes: 105 additions & 0 deletions pkg/reconciler/buildrun/resources/failure_details_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
// Copyright The Shipwright Contributors
//
// SPDX-License-Identifier: Apache-2.0

package resources

import (
"context"
"encoding/json"
"fmt"

. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"
pipelinev1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
corev1 "k8s.io/api/core/v1"
"knative.dev/pkg/apis"


buildv1alpha1 "github.com/shipwright-io/build/pkg/apis/build/v1alpha1"
buildfakes "github.com/shipwright-io/build/pkg/controller/fakes"
)

var _ = Describe("Surfacing errors", func() {
Context("resources.UpdateBuildRunUsingTaskFailures", func() {
ctx := context.Background()
client := &buildfakes.FakeClient{}

It("surfaces errors to BuildRun in failed TaskRun", func() {
redTaskRun := pipelinev1beta1.TaskRun{}
redTaskRun.Status.Conditions = append(redTaskRun.Status.Conditions,
apis.Condition{Type: apis.ConditionSucceeded, Reason: pipelinev1beta1.TaskRunReasonFailed.String()})
failedStep := pipelinev1beta1.StepState{}

errorReasonValue := "PullBaseImageFailed"
errorMessageValue := "Failed to pull the base image."
errorReasonKey := fmt.Sprintf("%s-%s", prefixParamsResultsVolumes, resultErrorReason)
errorMessageKey := fmt.Sprintf("%s-%s", prefixParamsResultsVolumes, resultErrorMessage)

errorReason := pipelinev1beta1.PipelineResourceResult{Key: errorReasonKey, Value: errorReasonValue}
errorMessage := pipelinev1beta1.PipelineResourceResult{Key: errorMessageKey, Value: errorMessageValue}
unrelated := pipelinev1beta1.PipelineResourceResult{Key: "unrelated-resource-key", Value: "Unrelated resource value"}

message, _ := json.Marshal([]pipelinev1beta1.PipelineResourceResult{errorReason, errorMessage, unrelated})

failedStep.Terminated = &corev1.ContainerStateTerminated{Message: string(message)}

redTaskRun.Status.Steps = append(redTaskRun.Status.Steps, failedStep)
redBuild := buildv1alpha1.BuildRun{}

UpdateBuildRunUsingTaskFailures(ctx, client, &redBuild, &redTaskRun)

Expect(redBuild.Status.FailureDetails.Message).To(Equal(errorMessageValue))
Expect(redBuild.Status.FailureDetails.Reason).To(Equal(errorReasonValue))
})

It("does not surface unrelated Tekton resources if the TaskRun fails", func() {
redTaskRun := pipelinev1beta1.TaskRun{}
redTaskRun.Status.Conditions = append(redTaskRun.Status.Conditions,
apis.Condition{Type: apis.ConditionSucceeded, Reason: pipelinev1beta1.TaskRunReasonFailed.String()})
failedStep := pipelinev1beta1.StepState{}

unrelated := pipelinev1beta1.PipelineResourceResult{Key: "unrelated", Value: "unrelated"}

message, _ := json.Marshal([]pipelinev1beta1.PipelineResourceResult{unrelated})

failedStep.Terminated = &corev1.ContainerStateTerminated{Message: string(message)}

redTaskRun.Status.Steps = append(redTaskRun.Status.Steps, failedStep)
redBuild := buildv1alpha1.BuildRun{}

UpdateBuildRunUsingTaskFailures(ctx, client, &redBuild, &redTaskRun)

Expect(redBuild.Status.FailureDetails.Reason).To(BeEmpty())
Expect(redBuild.Status.FailureDetails.Message).To(BeEmpty())
})

It("should not surface errors for a successful TaskRun", func() {
greenTaskRun := pipelinev1beta1.TaskRun{}
greenTaskRun.Status.Conditions = append(greenTaskRun.Status.Conditions, apis.Condition{Type: apis.ConditionSucceeded})
greenBuildRun := buildv1alpha1.BuildRun{}

UpdateBuildRunUsingTaskFailures(ctx, client, &greenBuildRun, &greenTaskRun)

Expect(greenBuildRun.Status.FailureDetails).To(BeNil())
})

It("should not surface errors if the TaskRun does not have a Succeeded condition", func() {
unfinishedTaskRun := pipelinev1beta1.TaskRun{}
unfinishedTaskRun.Status.Conditions = append(unfinishedTaskRun.Status.Conditions, apis.Condition{Type: apis.ConditionReady})
unfinishedBuildRun := buildv1alpha1.BuildRun{}

UpdateBuildRunUsingTaskFailures(ctx, client, &unfinishedBuildRun, &unfinishedTaskRun)
Expect(unfinishedBuildRun.Status.FailureDetails).To(BeNil())
})

It("should not surface errors if the TaskRun is in progress", func() {
unknownTaskRun := pipelinev1beta1.TaskRun{}
unknownTaskRun.Status.Conditions = append(unknownTaskRun.Status.Conditions, apis.Condition{Type: apis.ConditionSucceeded, Reason: "random"})
unknownBuildRun := buildv1alpha1.BuildRun{}

UpdateBuildRunUsingTaskFailures(ctx, client, &unknownBuildRun, &unknownTaskRun)
Expect(unknownBuildRun.Status.FailureDetails).To(BeNil())
})
})
})
3 changes: 0 additions & 3 deletions pkg/reconciler/buildrun/resources/taskrun.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,6 @@ const (
paramSourceRoot = "source-root"
paramSourceContext = "source-context"

resultErrorMessage = "error-message"
resultErrorReason = "error-reason"

workspaceSource = "source"

inputParamBuilder = "BUILDER_IMAGE"
Expand Down

0 comments on commit 0add676

Please sign in to comment.