Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect PipelineRun status "ResolvingTaskRef" #5673

Closed
lbernick opened this issue Oct 21, 2022 · 13 comments
Closed

Incorrect PipelineRun status "ResolvingTaskRef" #5673

lbernick opened this issue Oct 21, 2022 · 13 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@lbernick
Copy link
Member

lbernick commented Oct 21, 2022

Expected Behavior

If a pipeline task ref is invalid, the Pipeline will be rejected by the admission webhook. In addition, if remote resolution completes successfully, the PipelineRun status should not be "ResolvingTaskRef". Lastly, the validation error expected exactly one, got both: spec.taskRef.name, spec.taskRef.params, spec.taskRef.resolver". is confusing since there are three things, and you should be able to specify resolver + params

Actual Behavior

Remote resolution succeeds. PipelineRun is marked as Running(ResolvingTaskRef), but I see the k8s event " PipelineRun Failed to create TaskRun "clone-build-push-run-fetch-source": admission webhook "validation.webhook.pipeline.tekton.dev" denied the request: validation failed: expected exactly one, got both: spec.taskRef.name, spec.taskRef.params, spec.taskRef.resolver".

Steps to Reproduce the Problem

Pipeline:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: clone-build-push
spec:
  description: |
    This pipeline clones a git repo, builds a Docker image with Kaniko and
    pushes it to Google Artifact repository.
  workspaces:
  - name: source-code
  params:
  - name: image
  - name: repo-url
  - name: chat-webhook-url
  results:
  - name: image-digest
    value: $(tasks.build.results.digest)
  tasks:
  - name: fetch-source
    taskRef:
      name: git-clone
      resolver: hub
      params:
      - name: kind
        value: task
      - name: name
        value: git-clone
      - name: version
        value: "0.7"
    workspaces:
    - name: output
      workspace: source-code
    params:
    - name: url
      value: $(params.repo-url)
  - name: build
    taskRef:
      name: kaniko-build
    params:
    - name: image
      value: $(params.image)
    runAfter:
    - fetch-source

PipelineRun:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: clone-build-push-run
spec:
  pipelineRef:
    name: clone-build-push
  params:
  - name: repo-url
    value: https://github.com/lbernick/web-app-demo
  - name: image
    value: us-east1-docker.pkg.dev/tekton-interns/leebernick/web-app-demo
  - name: chat-webhook-url
    value: https://chat.googleapis.com/v1/spaces/AAAAGDLkJiM/messages?key=AIzaSyDdI0hCZtE6vySjMm-WEfRq3CPzqKqqsHI&token=PJjU6o5bxDsarG_hgl6U6ZqOb8EM_L8DX3Lp5FhIVTQ%3D
  workspaces:
  - name: source-code
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  taskRunSpecs:
  - pipelineTaskName: build
    taskServiceAccountName: builder
    computeResources:
      requests:
        cpu: 1Gi

PR status:

Status:
  Conditions:
    Last Transition Time:  2022-10-21T19:17:52Z
    Message:               PipelineRun gosl/clone-build-push-run awaiting remote resource
    Reason:                ResolvingTaskRef
    Status:                Unknown
    Type:                  Succeeded

The resolutionrequest has successfully resolved:

$ k get resolutionrequest -n gosl
NAME                                   OWNERKIND     OWNER                  SUCCEEDED   REASON   STARTTIME              ENDTIME
hub-7bc267001bbaccdf73355dfc32a42ceb   PipelineRun   clone-build-push-run   True                 2022-10-21T19:17:52Z   2022-10-21T19:17:52Z

Additional Info

  • Kubernetes version:
$ k version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:25:45Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3-gke.200", GitCommit:"58cd50d1c2a9028cfb92baf62cc0b678ef751532", GitTreeState:"clean", BuildDate:"2022-07-20T09:23:47Z", GoVersion:"go1.18.3b7", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:
$ tkn version
Client version: 0.26.0
Pipeline version: devel
Triggers version: devel
@lbernick lbernick added the kind/bug Categorizes issue or PR as related to a bug. label Oct 21, 2022
@yuzp1996
Copy link
Contributor

yuzp1996 commented Nov 9, 2022

/assign

@yuzp1996
Copy link
Contributor

yuzp1996 commented Dec 3, 2022

I apply the following resource to cluster

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: clone-build-push
spec:
  description: |
    This pipeline clones a git repo, builds a Docker image with Kaniko and
    pushes it to Google Artifact repository.
  workspaces:
  - name: source-code
  params:
  - name: image
  - name: repo-url
  - name: chat-webhook-url
  tasks:
  - name: fetch-source
    taskRef:
      name: git-clone
      resolver: hub
      params:
      - name: kind
        value: task
      - name: name
        value: git-clone
      - name: version
        value: "0.7"
    workspaces:
    - name: output
      workspace: source-code
    params:
    - name: url
      value: $(params.repo-url)

---

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: clone-build-push-run
spec:
  pipelineRef:
    name: clone-build-push
  params:
  - name: repo-url
    value: https://github.com/lbernick/web-app-demo
  - name: image
    value: us-east1-docker.pkg.dev/tekton-interns/leebernick/web-app-demo
  - name: chat-webhook-url
    value: https://chat.googleapis.com/v1/spaces/AAAAGDLkJiM/messages?key=AIzaSyDdI0hCZtE6vySjMm-WEfRq3CPzqKqqsHI&token=PJjU6o5bxDsarG_hgl6U6ZqOb8EM_L8DX3Lp5FhIVTQ%3D
  workspaces:
  - name: source-code
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

And maybe I reproduced the problem because I get the following log in pipeline-controller
image

{"severity":"error","timestamp":"2022-12-03T13:42:37.439Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:232","message":"Reconcile error: error creating TaskRun called clone-build-push-run-fetch-source for PipelineTask fetch-source from PipelineRun clone-build-push-run: admission webhook \"validation.webhook.pipeline.tekton.dev\" denied the request: validation failed: expected exactly one, got both: spec.taskRef.name, spec.taskRef.params, spec.taskRef.resolver","commit":"09c7285","knative.dev/controller":"git.luolix.top.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"8e818aca-0cd7-4d89-a12e-3b444c938f62","knative.dev/key":"default/clone-build-push-run","stacktrace":"github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.(*Reconciler).ReconcileKind\n\tgit.luolix.top/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun.go:232\ngit.luolix.top/tektoncd/pipeline/pkg/client/injection/reconciler/pipeline/v1beta1/pipelinerun.(*reconcilerImpl).Reconcile\n\tgit.luolix.top/tektoncd/pipeline/pkg/client/injection/reconciler/pipeline/v1beta1/pipelinerun/reconciler.go:235\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20221011175852-714b7630a836/controller/controller.go:542\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20221011175852-714b7630a836/controller/controller.go:491"}

image

@yuzp1996
Copy link
Contributor

yuzp1996 commented Dec 4, 2022

I think there are two things that need to be resolved.
First, if validation for pipelinerun failed the pipelinerun should be marked as failed and second the error message should be more clear.

@yuzp1996
Copy link
Contributor

yuzp1996 commented Dec 4, 2022

First Problem

It seems that if the Taskrun related to the Pipelinerun is invalid and the creation fails, the status of the Pipelinerun will remain in the following state.
image

Now I guess it's because the related Taskrun is not created so no event will be emitted to trigger Pipelinerun's reconcile and Pipelinerun will be frozen at ResolvingTaskRef.

I will try to find more evidence and possible solutions. Any suggestion is welcome.

Second Problem

I reproduced it with taskrun like this

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: git-clone-demo-tr
spec:
  taskRef:
    resolver: git
    name: test
    params:
    - name: url
      value: https://github.com/tektoncd/catalog.git
    - name: revision
      value: main
    - name: pathInRepo
      value: task/git-clone/0.6/git-clone.yaml
  params:
  - name: url
    value: https://github.com/tektoncd/catalog.git
  workspaces:
    - name: output # this workspace name must be declared in the Pipeline
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce # access mode may affect how you can use this volume in parallel tasks
          resources:
            requests:
              storage: 1Gi

This is the related code.
image

I guess it is because there is no suitable format when printing FieldError so can only print the ErrMultipleOneOf one by one.
image

@yuzp1996
Copy link
Contributor

yuzp1996 commented Dec 18, 2022

To resolve the first problem I think there are two points that we need to think about.

First, the invalid taskref in Pipeline should not be created. So we should add validation for taskRef before creating Pipeline.

Second, when failing to create TaskRun in PipelineRun's reconciler maybe we should mark PipelineRun failed in case it confuses the users.

Here is a simple WIP PR that demonstrates what I want to do. If this is fine I will complete this PR. Looking forward to your suggestions. @lbernick @abayer

https://github.com/tektoncd/pipeline/pull/5887/files

@lbernick
Copy link
Member Author

To resolve the first problem I think there are two points that we need to think about.

First, the invalid taskref in Pipeline should not be created. So we should add validation for taskRef before creating Pipeline.

SGTM! I saw you already have a PR out; would you mind modifying it so it tackles only this issue first, and we can address the second issue separately?

Second, when failing to create TaskRun in PipelineRun's reconciler maybe we should mark PipelineRun failed in case it confuses the users.

I think we need to change the TaskRun creation logic so the PipelineRun controller isn't attempting to create an invalid TaskRun. I'm a bit confused because it looks like the resolution happens before the TaskRun is created, and maybe we should instead create the TaskRun and allow the TaskRun controller to handle resolving the TaskRef. I agree that if the PipelineRun controller fails to create the TaskRun, the PipelineRun should be marked as failed.

@lbernick lbernick mentioned this issue Jan 5, 2023
7 tasks
@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 21, 2023
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 20, 2023
@tekton-robot
Copy link
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@afrittoli
Copy link
Member

@lbernick is this still an issue, should we reopen it? @yuzp1996 are you still working on this?

@lbernick
Copy link
Member Author

Yes, I was able to reproduce this today, thanks for checking!

@lbernick lbernick reopened this Jun 13, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in Tekton Community Roadmap Jun 13, 2023
@lbernick lbernick removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 13, 2023
@lbernick lbernick added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 20, 2023
@lbernick lbernick mentioned this issue Jun 20, 2023
3 tasks
@lbernick lbernick self-assigned this Jun 21, 2023
@lbernick
Copy link
Member Author

Closed by #6854 and #6866.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Tekton Community Roadmap Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
Status: Done
Development

No branches or pull requests

4 participants