Pipelines v0.43.1 installation nondeterministically fails to update PipelineRun status with TaskRun statuses #5964
Comments
Looking at the PipelineRun in question:
The PipelineRun isn't being updated with the statuses of the TaskRuns because the TaskRun ownerReferences are for a v1beta1 PipelineRun, but the PipelineRun is v1. It seems like you originally applied a v1beta1 PipelineRun to the cluster and it got converted to v1 (and same with the TaskRun), but the child TaskRuns' ownerReferences weren't changed, so the controller doesn't know how to update the PipelineRun.
/priority critical-urgent
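To make the mismatch described above concrete, here is a minimal sketch that compares the apiVersion recorded in each TaskRun's ownerReferences with the apiVersion of the PipelineRun object itself. It works on plain dictionaries shaped like `kubectl get -o json` output; the object names (`demo-run`, `demo-run-build`, `demo-run-test`) are made up for illustration, and this is not how the Tekton controller performs the lookup.

```python
def mismatched_owner_refs(pipeline_run, task_runs):
    """Return (TaskRun name, owner apiVersion) pairs where the PipelineRun
    owner reference names a different apiVersion than the PipelineRun has."""
    mismatches = []
    pr_name = pipeline_run["metadata"]["name"]
    for tr in task_runs:
        for ref in tr["metadata"].get("ownerReferences", []):
            # Only look at references that point at this PipelineRun.
            if ref.get("kind") != "PipelineRun" or ref.get("name") != pr_name:
                continue
            if ref.get("apiVersion") != pipeline_run["apiVersion"]:
                mismatches.append((tr["metadata"]["name"], ref.get("apiVersion")))
    return mismatches


# Hypothetical objects: the PipelineRun is returned as v1, while one TaskRun
# still carries an owner reference written for v1beta1.
pipeline_run = {
    "apiVersion": "tekton.dev/v1",
    "kind": "PipelineRun",
    "metadata": {"name": "demo-run"},
}
task_runs = [
    {
        "metadata": {
            "name": "demo-run-build",
            "ownerReferences": [
                {"apiVersion": "tekton.dev/v1beta1", "kind": "PipelineRun", "name": "demo-run"}
            ],
        }
    },
    {
        "metadata": {
            "name": "demo-run-test",
            "ownerReferences": [
                {"apiVersion": "tekton.dev/v1", "kind": "PipelineRun", "name": "demo-run"}
            ],
        }
    },
]

print(mismatched_owner_refs(pipeline_run, task_runs))
```

A non-empty result only shows that the versions differ; it does not by itself prove that the difference is what blocks the status update.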
Another piece of curiosity: the "stuck in I tried the identical setup in a 2nd cluster and the first one
@bendory if you don't mind I'm going to split this into two different bugs; I'll keep this one to track the PipelineRun status updates and open a new one for the issue observed with the CLI.
I certainly don't mind! Thank you for your attention on this!
This PipelineRun was originally created as v1beta1, but it is being converted to v1 and I'm not sure why. Here, the k8s docs say "If you update an existing object, it is rewritten at the version that is currently the storage version. This is the only way that objects can change from one version to another." Since our storage version is v1beta1, I'm surprised anything is converting the PipelineRun to v1. One possibility is that this is due to some interaction with kubectl.
After thinking about this a bit more, it's not clearly problematic that the PipelineRun's childReferences are v1beta1 TaskRuns and the TaskRun's ownerReferences point to a v1beta1 PipelineRun even though the version of these resources returned by
@bendory have you observed this flake more than once?
I think this corresponds with
Yes, when playing earlier today, I observed it in two different clusters.
aha! In the output of
coming from this line of code. I think the problem here is with our conversion logic. With embedded status = full, when we convert v1beta1 to v1, we get a v1 PipelineRun with TaskRuns in annotations and childReferences in status. When converting back to v1beta1, we get a PipelineRun with both childReferences and TaskRuns in the status, which is invalid (see this test case). I'm still not sure why conversion is happening, or what's causing the nondeterministic behavior, but the error from this event should be fixed by changing our conversion logic.
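The round trip described above can be illustrated with a toy model. The sketch below is not Tekton's conversion code: the annotation key, the field layout, and the rule that childReferences are derived from the TaskRun entries are all assumptions made for illustration. It only shows how a naive v1beta1 to v1 to v1beta1 conversion can leave an object with both `taskRuns` and `childReferences` in its status.

```python
import copy
import json

# Hypothetical annotation key; the real key used by Tekton is not shown in this thread.
ANNOTATION_KEY = "example.dev/v1beta1-status-taskruns"


def to_v1(pr_v1beta1):
    """Toy v1beta1 -> v1 conversion: stash status.taskRuns in an annotation
    and expose only childReferences in the status."""
    pr = copy.deepcopy(pr_v1beta1)
    pr["apiVersion"] = "tekton.dev/v1"
    status = pr.setdefault("status", {})
    task_runs = status.pop("taskRuns", None)
    if task_runs:
        annotations = pr["metadata"].setdefault("annotations", {})
        annotations[ANNOTATION_KEY] = json.dumps(task_runs, sort_keys=True)
        status["childReferences"] = [
            {"kind": "TaskRun", "name": name, "pipelineTaskName": tr["pipelineTaskName"]}
            for name, tr in sorted(task_runs.items())
        ]
    return pr


def to_v1beta1(pr_v1):
    """Toy v1 -> v1beta1 conversion: restore status.taskRuns from the
    annotation but, naively, leave childReferences in place."""
    pr = copy.deepcopy(pr_v1)
    pr["apiVersion"] = "tekton.dev/v1beta1"
    saved = pr["metadata"].get("annotations", {}).pop(ANNOTATION_KEY, None)
    if saved is not None:
        pr.setdefault("status", {})["taskRuns"] = json.loads(saved)
    return pr


def has_both(pr):
    """True when the status carries both taskRuns and childReferences."""
    status = pr.get("status", {})
    return bool(status.get("taskRuns")) and bool(status.get("childReferences"))


# Hypothetical PipelineRun as it might look with embedded status = full.
original = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "PipelineRun",
    "metadata": {"name": "demo-run"},
    "status": {"taskRuns": {"demo-run-build": {"pipelineTaskName": "build"}}},
}

round_tripped = to_v1beta1(to_v1(original))
print(has_both(original), has_both(round_tripped))
```

In this model, a round-trip-safe conversion would have to drop `childReferences` again on the way back whenever it restores `taskRuns` from the annotation.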
Thanks Lee, that makes sense! I think this test case is only valid in
One guess is that a v1beta1 PipelineRun got converted to v1, so the TaskRuns were converted to annotations in v1, and it was then converted back to v1beta1 to be stored in etcd. (This is also the only case I can think of where there could be both
I also see this in the controller logs, which makes me more confident that this is causing the problem:
Unfortunately, it doesn't seem like we can reliably reproduce the error, but I'm going to run a few PipelineRuns with #5968 applied to my cluster and see if it recurs.
I think that's exactly the problem: our test cases don't depend on the value of this flag and they need to. Maybe better to move this discussion to #5968 if the test cases need more discussion?
I'm not sure what you mean; you're correct that nobody is going to create a v1 pipelinerun with the relevant annotations but we need to be able to convert round-trip and with the existing logic I don't think we can.
Sorry, I should have been clearer. I'm not sure why the conversion webhook is being called at all. According to the docs this should only happen if a resource is created in a version other than the storage version, so why are we observing a PipelineRun being converted from v1beta1 to v1?
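As a small aid for the storage-version question, the sketch below reads the storage version from a CustomResourceDefinition represented as a dictionary (the `spec.versions[].storage` field of the apiextensions.k8s.io/v1 schema) and reports whether a request at a given version differs from it. The CRD contents shown are a hypothetical stand-in rather than the actual Tekton manifest. For context, the Kubernetes CRD versioning documentation lists reads and watches at a non-storage version, as well as writes, among the cases that invoke a conversion webhook; whether that accounts for the behavior in this issue is not established here.

```python
def storage_version(crd):
    """Return the name of the single version marked storage: true."""
    names = [v["name"] for v in crd["spec"]["versions"] if v.get("storage")]
    if len(names) != 1:
        raise ValueError("expected exactly one storage version, found %r" % (names,))
    return names[0]


def conversion_needed(crd, requested_version):
    """True when the requested API version differs from the storage version."""
    return requested_version != storage_version(crd)


# Hypothetical CRD excerpt: v1beta1 is the storage version, v1 is served too.
crd = {
    "spec": {
        "versions": [
            {"name": "v1beta1", "served": True, "storage": True},
            {"name": "v1", "served": True, "storage": False},
        ]
    }
}

print(storage_version(crd))
print(conversion_needed(crd, "v1"), conversion_needed(crd, "v1beta1"))
```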
Expected Behavior
(Edited: moved error message from tkn to tektoncd/cli#1847)
PipelineRun status reflects status of the child TaskRuns
Actual Behavior
tkn pr list shows the PipelineRun as Running even though tkn tr list shows all related TaskRuns as Succeeded
Steps to Reproduce the Problem
PipelineRun that ran cleanly in v0.42.0
v0.43.1
kubectl create|apply --filename ...
tkn pr logs --last -f
Additional Info
I expected this to be resolved in v0.43.1 by #5945 and #5948, but no joy. 😿