-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: plan queue functionality #2782
base: main
Are you sure you want to change the base?
Conversation
Due to the related discussion having so many upvotes, it might be worth resurrecting this. @ghaiszaher would you be able to resolve the conflicts and verify that this works in your system? |
@nitrocode I resolved the conflicts but I discovered a bug when trying to replan a PR that already has the lock, because with the current logic there is a call to I am working on a fix. |
@nitrocode I fixed the bug and tried it locally (only with BoltDB), it's working properly now. |
@ghaiszaher that's great to hear. Looking forward to your changes. I changed the pr body into a check list so we can follow your progress. 😄 thank you for your efforts! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nitrocode thanks a lot. I fixed some of the issues and updated the PR description. We're getting closer to a production-ready version but still need some help.
I am moving the initial discussions started in the old PR to this one.
About your questions on the tests - yes we have not implemented all of them, as we first wanted to check with you if this is some functionality that sounds like a good idea, and if how we go about implementing it would make sense to you (as mentioned in the PR description). So my understanding is "yes it does" ? :)
We could continue by applying the suggestions and extending tests. Would that make sense to you? What do you expect from us to call our PR "production ready", and get closer to merging it?
One topic we circled around but never properly got to, is feature flag to enable/disable the feature. If you had any suggestions there, it would be great (meaning how feature flags are usually approached in this codebase, we would not want to break some convention).
What do you think about these points? Just to know how to continue with some parts.
Also feel free to review the PR in its current state 😄
server/events/project_locker.go
Outdated
// TODO monikma #8 this will be called if there was a plan error and the lock was automatically dropped; | ||
// Should we assure dequeuing of the next PR here too? | ||
_, _, err := p.Locker.Unlock(lockAttempt.LockKey, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original discussion: #1923 (comment)
Yes, what do you think, what would make more sense?
- Just drop everything and wait for the lock owner to fix their plan and acquire the lock again, or
- Say "your loss, now you are out of the queue", and dequeue the next PR, or
- Quickly re-acquire the lock for this PR so that the owner can plan as soon as they fixed it
.. we have not really decided. I guess 1 is not a good option as then a third person could accidentally get the lock.. also 3 would not be fair since fixing can take time. So I guess 2. Yes, we have not implemented it yet, we were waiting for some general feedback on the functionality first :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 1: this would mean anyone can obtain the lock and get in front of the queue in the meantime
Option 3: That was our initial option but we were not sure why we release the lock in case of plan error in the first place.
Maybe we can also have Option 4: feature flag to either go with Option 2 (dequeue on plan error) or Option 3 (always keep the lock on plan error).
@nitrocode any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @runatlantis/maintainers and @krrrr38 for visibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like Option 4 that way the behavior is configurable to the user workflow that they prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamengual since this PR is getting very large, do you think it makes sense to introduce such feature flag in a follow-up, and for now just give the lock to the next in queue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can add them in another PR but the feature if approved will not be released if the flags are not added, so we will have to have both PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then I will consider it in this PR if you don't mind the size.
@@ -79,6 +79,88 @@ func TestRedisWithTLS(t *testing.T) { | |||
_ = newTestRedisTLS(s) | |||
} | |||
|
|||
func TestGetQueueByLock(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of repetition between this file and boltdb_test.go
. Would it be possible to combine all these in one generic file, and let the tests run twice using both redis and boltdb (since we expect the exact same behavior)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. Did you want to do this refactoring as a separate pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was answered by @jamengual in this comment: #2782 (comment)
But if this is a valid option, I can try to work on it next.
mmmm I don't know if this could make it confusing for people since it does
not follow the stablished pattern.
…On Sun, Jan 29, 2023, 8:09 a.m. Ghais Zaher ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In server/core/redis/redis_test.go
<#2782 (comment)>
:
> @@ -79,6 +79,88 @@ func TestRedisWithTLS(t *testing.T) {
_ = newTestRedisTLS(s)
}
+func TestGetQueueByLock(t *testing.T) {
There's a lot of repetition between this file and boltdb_test.go. Would
it be possible to combine all these in one generic file, and let them run
using both redis and boltdb (since we expect the exact same behavior)?
—
Reply to this email directly, view it on GitHub
<#2782 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAQ3ERD4HPZQBKXU44ZBWFDWU2I3LANCNFSM6AAAAAAS4KLVGA>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
server/controllers/api_controller.go
Outdated
@@ -92,7 +92,7 @@ func (a *APIController) Plan(w http.ResponseWriter, r *http.Request) { | |||
a.apiReportError(w, http.StatusInternalServerError, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some end to end tests too since this is such a large change?
u.commentOnDequeuedPullRequests(ctx, dequeueStatus) | ||
} | ||
|
||
func (u *UnlockCommandRunner) commentOnDequeuedPullRequests(ctx *command.Context, dequeueStatus models.DequeueStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests for these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nitrocode same issue here, should this function be exported just for the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nitrocode any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghaiszaher Looks like some tests already exist in command_test_runner.go
@@ -114,6 +127,22 @@ var IndexTemplate = template.Must(template.New("index.html.tmpl").Parse(` | |||
<a href="{{ $basePath }}{{.LockPath}}"> | |||
<div class="twelve columns button content lock-row"> | |||
<div class="list-title">{{.RepoFullName}} <span class="heading-font-size">#{{.PullNum}}</span> <code>{{.Path}}</code> <code>{{.Workspace}}</code></div> | |||
{{ if .Queue }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any web_templates
tests that we can add for these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't find any test regarding the rendered content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I also could not find a test for this. I was hoping maybe we could add one in this PR to test verify/validate this change.
30eae01
to
ff9e4a4
Compare
@ghaiszaher do you think you will have time to finish this? Thanks. |
@jamengual Yes. In the meantime, I would appreciate your opinion on the question in this discussion: #2782 (comment) |
// TODO(Ghais): make sure it's fine to remove this line: | ||
//_, _, err = p.lockingLocker.UnlockByPull(baseRepo.FullName, pull.Num) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamengual what do you think about deleting this line? We wouldn't really benefit from removing all the locks here. As long as the plans are deleted with p.deletePlans
, it should be fine to keep the locks assigned to the current pull requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to bring @GenPage into this conversation but I do agree with you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against it, but I have some concerns about the plan deletion code within pending_plan_finder, it's a bit dated, and it doesn't make sense to me while we handle deleting plans in multiple places. Wouldn't it be better to use the deletePlan function in working_dir.go instead?
# Conflicts: # server/events/delete_lock_command.go # server/events/delete_lock_command_test.go
I've started reviewing this but will take some time. It's a pretty big change |
We will place this on hold as we want to introduce more locking around a plan queue when we have a solution for the current regressions concerning #3345. I would like more eyes on this PR before introducing this new feature. |
what
This is a follow-up on feat: plan queue functionality #1923 which was closed because the fork was deleted.
The implementation of the queue functionality is complete and the code is almost production-ready. We are seeking your assistance and would appreciate your feedback if this is the right direction and if this functionality could be considered.
What we have done so far
atlantis unlock
command, from Atlantis UI or by merging/closing the PR).atlantis unlock
or from UI), the delete lock call fails withdeleting lock failed with: remove .../default.tfplan: no such file or directory
-> this is fixed by ignoring non-existing file errors when deleting a planWhat is missing:
Note: This should also be done if the queue is disabled
web_templates
tests
why
Reasons are explained in the discussion #1722
references
contributors
@monikma @amir-elgayed @ghaiszaher