fix(executor): Fix compatibility issue when selfLink is no longer populated for k8s>=1.21. Fixes #6045 #6014

Merged
merged 2 commits into from
Jun 2, 2021

terrytangyuan
Member

@terrytangyuan terrytangyuan commented May 26, 2021

See context in #6008. Also Fixes #6045.
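For readers without the linked context, here is a minimal sketch of the general idea in the title: when `selfLink` is empty (as on k8s>=1.21), the REST path of a resource can be rebuilt from its `apiVersion`, `kind`, namespace, and name. This is an illustration only, not the code in this PR. The `objectRef` type and the `guessResource` and `resourcePath` helpers are hypothetical names, and the naive pluralization is an assumption that real code would replace with API discovery.

```go
package main

import (
	"fmt"
	"strings"
)

// objectRef holds the identifying fields of a Kubernetes object.
// Hypothetical helper type for this sketch; not a type from Argo or client-go.
type objectRef struct {
	APIVersion string // e.g. "v1" or "argoproj.io/v1alpha1"
	Kind       string // e.g. "Pod" or "Workflow"
	Namespace  string // empty for cluster-scoped objects
	Name       string
	SelfLink   string // may be empty on clusters that no longer populate it
}

// guessResource naively lower-cases and pluralizes a kind.
// Assumption: production code should look the resource name up via discovery.
func guessResource(kind string) string {
	r := strings.ToLower(kind)
	switch {
	case strings.HasSuffix(r, "s"):
		return r + "es"
	case strings.HasSuffix(r, "y"):
		return strings.TrimSuffix(r, "y") + "ies"
	default:
		return r + "s"
	}
}

// resourcePath returns SelfLink when it is set; otherwise it builds the
// REST path from apiVersion, kind, namespace, and name.
func resourcePath(o objectRef) string {
	if o.SelfLink != "" {
		return o.SelfLink
	}
	// Core-group objects ("v1") live under /api, grouped APIs under /apis.
	prefix := "/apis/"
	if !strings.Contains(o.APIVersion, "/") {
		prefix = "/api/"
	}
	parts := []string{prefix + o.APIVersion}
	if o.Namespace != "" {
		parts = append(parts, "namespaces", o.Namespace)
	}
	parts = append(parts, guessResource(o.Kind), o.Name)
	return strings.Join(parts, "/")
}

func main() {
	wf := objectRef{
		APIVersion: "argoproj.io/v1alpha1",
		Kind:       "Workflow",
		Namespace:  "argo",
		Name:       "child",
	}
	// Prints: /apis/argoproj.io/v1alpha1/namespaces/argo/workflows/child
	fmt.Println(resourcePath(wf))
}
```

Falling back only when `SelfLink` is empty keeps the behavior unchanged on older clusters that still populate the field.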

Checklist:

Tips:

  • Your PR needs to pass the required checks before it can be approved. If a check is not required (e.g. E2E tests), it does not need to pass.
  • Sign off your commits to pass the DCO check: git commit --signoff.
  • Run make pre-commit -B to fix codegen or lint problems.
  • Say how you tested your changes. If you changed the UI, attach screenshots.

@codecov

codecov bot commented May 26, 2021

Codecov Report

Merging #6014 (1cc1b86) into master (f1fcb43) will increase coverage by 0.01%.
The diff coverage is 81.81%.

@@            Coverage Diff             @@
##           master    #6014      +/-   ##
==========================================
+ Coverage   47.52%   47.53%   +0.01%     
==========================================
  Files         248      248              
  Lines       15721    15730       +9     
==========================================
+ Hits         7472     7478       +6     
- Misses       7312     7316       +4     
+ Partials      937      936       -1     
Impacted Files                     Coverage Δ
workflow/executor/resource.go      28.96% <81.81%> (+3.67%) ⬆️
workflow/metrics/server.go         12.50% <0.00%> (-4.17%) ⬇️
cmd/argo/commands/get.go           56.45% <0.00%> (-0.65%) ⬇️
workflow/controller/operator.go    71.02% <0.00%> (+0.05%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f1fcb43...1cc1b86.

@terrytangyuan
Member Author

terrytangyuan commented May 26, 2021

@alexec Only tests for a few of the executors failed and the errors are all different. Is there anything special about the CI environment for different executors (e.g. permissions) that I should be aware of?

The following log is from the build for the emissary executor:

 logs | k8s-resource-tmpl-with-wf-m6jjx main Error: signal: terminated
 controller | time="2021-05-26T04:37:38.961Z" level=info msg="Create pods/exec 500"
 controller | time="2021-05-26T04:37:38.962Z" level=warning msg="failed to clean-up pod" action=terminateContainers error= key=argo/k8s-resource-tmpl-with-wf-m6jjx/terminateContainers

The PNS executor fails with a different error:

 controller | time="2021-05-26T04:38:32.144Z" level=info msg="Pod failed: Error (exit code 143)" displayName=k8s-resource-tmpl-with-wf-hnk9z namespace=argo pod=k8s-resource-tmpl-with-wf-hnk9z templateName=main workflow=k8s-resource-tmpl-with-wf-hnk9z

@alexec
Contributor

alexec commented May 26, 2021

Step 1. Let's re-run the jobs.

@terrytangyuan
Member Author

terrytangyuan commented May 26, 2021

Okay, I tried a couple of things. Different executors fail on each run, so there may be something unstable in the CI environment itself. Also, the failure is not caused by the change itself but by the new test case: https://github.com/argoproj/argo-workflows/actions/runs/880489742

@terrytangyuan
Member Author

I've located the root cause, which will be fixed in #6033. I'll clean up this PR once #6033 is merged so that this PR can focus on supporting k8s>=1.21.

@terrytangyuan terrytangyuan changed the title fix(executor): Fix compatibility issue when selfLink is no longer populated for k8s>=1.21 with tests fix(executor): Fix compatibility issue when selfLink is no longer populated for k8s>=1.21 with tests. Fixes #6045 May 29, 2021
@terrytangyuan terrytangyuan changed the title fix(executor): Fix compatibility issue when selfLink is no longer populated for k8s>=1.21 with tests. Fixes #6045 fix(executor): Fix compatibility issue when selfLink is no longer populated for k8s>=1.21. Fixes #6045 May 29, 2021
…ulated for k8s>=1.21

Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
@terrytangyuan terrytangyuan marked this pull request as ready for review June 1, 2021 20:36
@terrytangyuan
Member Author

This PR has been rebased after #6033 got merged. @alexec Feel free to review whenever you get a chance.

Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
@alexec alexec added this to the v3.2 milestone Jun 1, 2021
@alexec
Contributor

alexec commented Jun 1, 2021

Let's hold this for v3.2.

@christopheblin
Contributor

@alexec

Let's hold this for v3.2.

Without this fix, "workflow-of-workflows" does not work with the k8sapi executor on "new" k8s versions (basically, I think any resource creation that takes a bit of time fails, which affects Workflows but also Deployments).

Can you explain why the fix is on hold? Will it be available in latest or in a 3.1.1 release soon?

Or is there a workaround to "wait" for the resource in a subsequent step?

@alexec
Contributor

alexec commented Jun 2, 2021

Are we saying that the selfLink issue is already impacting production systems? I.e., is 1.21 now running in production?

@alexec alexec merged commit 803855b into argoproj:master Jun 2, 2021
@alexec alexec modified the milestones: v3.2, v3.1 Jun 2, 2021
@terrytangyuan terrytangyuan deleted the fix-k8s-compa branch June 2, 2021 18:53
@christopheblin
Contributor

Thanks for the merge @alexec 👍

I am running Azure Kubernetes Service (AKS), which, according to @terrytangyuan's analysis of the logs in #6045, has the selfLink issue.

I'll try to test the "latest" image tomorrow (CET) and will open a new issue if needed.

@sarabala1979 sarabala1979 mentioned this pull request Jun 10, 2021
88 tasks
Successfully merging this pull request may close these issues.

workflow-of-workflows : waiting "child" workflow is not working