Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move asoctl:ci target to run sequentially #2744

Closed
wants to merge 2 commits into from
Closed

Conversation

super-harsh
Copy link
Collaborator

What this PR does / why we need it:

Moving asoctl:ci target to run sequentially. As we're experiencing timeouts randomly this could be because of a race in the generator:ci and asoctl:ci trying to modify the files parallelly.

@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2023

Codecov Report

Merging #2744 (b0096b5) into main (a64582c) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #2744   +/-   ##
=======================================
  Coverage   56.01%   56.01%           
=======================================
  Files         970      970           
  Lines      348950   348950           
=======================================
+ Hits       195472   195475    +3     
+ Misses     122666   122664    -2     
+ Partials    30812    30811    -1     
Impacted Files Coverage Δ
...l/armconversion/convert_to_arm_function_builder.go 86.43% <0.00%> (+0.88%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

- crossplane:ci
cmds:
- task: asoctl:ci
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not convinced that this is the issue. If you look at an example stuck run:
https://github.com/Azure/azure-service-operator/actions/runs/4249026647/jobs/7388790440

The final logs are:

[generator:unit-tests-cover] go: downloading golang.org/x/text v0.7.0
[generator:unit-tests-cover] go: downloading github.com/josharian/intern v1.0.0
[generator:unit-tests-cover] go: downloading golang.org/x/sys v0.5.0
task: [controller:generate-genruntime-deepcopy] find ./pkg/genruntime -type f -name "zz_generated.*" -delete
task: [controller:generate-genruntime-deepcopy] cd pkg/genruntime && controller-gen object:headerFile=/workspace/v2/boilerplate.go.txt paths=./...
task: [controller:generate-genruntime-deepcopy] cd pkg/genruntime && gofmt -l -s -w .
[controller:generate-genruntime-deepcopy] conditions/zz_generated.deepcopy.go
[controller:generate-genruntime-deepcopy] zz_generated.deepcopy.go
Error: The operation was canceled.

To me that looks like either controller:generate-genruntime-deepcopy or generator:unit-tests-cover are getting stuck, not asoctl, which can't even be building at the time those steps run as it has a dependency on controller:generate-crds which hasn't happened yet.

I think we should add some logs to the end of the controller:generate-genruntime-deepcopy and generator:unit-tests-cover steps to see if they are actually finishing. Then when things get stuck again we can determine if the issue is either of those commands.

We may also want to stick a log at the beginning of produce-markdown-summary as that task is deferred til after the generator:unit-tests-cover finishes -- so it should run immediately after, but obviously isn't.

@matthchr
Copy link
Member

For reference here's a successful run to compare that stuck run to: https://github.com/Azure/azure-service-operator/actions/runs/4249442170/jobs/7389599661

@matthchr
Copy link
Member

Actually it looks like this might be related to:
keycloak/keycloak-benchmark#113
go-task/task#715

It sounds like it's possible to deadlock go-task if we use run: when_changed or run: once -- some of the issues say when -C is 1, but others seem to suggest any usage of -C can cause it. We use -C 2.

@theunrepentantgeek
Copy link
Member

This has been discussed over on the Task Discord - best hypothesis was that something was trying to acquire the lock to select the next task to run before the lock was released for the previous one.

@matthchr
Copy link
Member

If that's the case, rather than force asoctl to run sequentially maybe we should just drop our usage of -C?

@theunrepentantgeek
Copy link
Member

theunrepentantgeek commented Feb 26, 2023

We added -C flag to limit the concurrency of tasks because we were overloading the build machines (they only have two vCPUs and not that much memory) and things were running really really slowly.

Have a look at #2475 (Limit concurrency of CI builds).

The information I've seen about a potential deadlock in task seemed to indicate the problem was only with -C 1 and we're already using -C 2 - see #2485 (Lift concurrency to 2 simultaneous tasks).

If we want to get rid of the -C flag, we need to go to our own build agents with more vCPU and memory, else we'll remain in Build hell.

@super-harsh super-harsh deleted the test/asoctl-ci branch June 20, 2023 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

4 participants