Atlantis fails to complete plan for a large environment. #452
Here is some of the debugging we did; hopefully it helps. Could it be related to this update? #421
@mignaulo it might be related to that. Is there a
Also, what happens when you run the command manually?
I was able to run the command manually without errors. Unfortunately, I did not check for a crash.log entry. :/
Can you reproduce it? Does it happen every time?
Are you running
It happened every time while running
Locally, or in the same directory and on the same server as Atlantis?
I ran the same command Atlantis ran, in the same directory, on the same server as Atlantis.
Is there any way for me to reproduce this locally? Does it happen for all of your Terraform projects, or only for that large one?
Hi, it only happens for our largest project. To give you an idea of its size, the state file is 1.6 MB, and plans take 3 or 4 minutes on a fairly small Fargate instance. (When we ran into issues, we switched to EC2-backed ECS so we could SSH into the container and get more information; we have since switched back.)
Can I build a custom Atlantis image for you with extra logging? Do you need anything special in the image, or do you just use the default image?
We just use the default image 👍 We'll be happy to run some tests on your custom image.
Okay so I've created a Docker image I'd like you to
Thanks so much for helping me debug this.
For anyone reading along, Olivier helped me debug this and even came up with the solution! Basically, I wasn't reading from the OS pipe concurrently, so if there was enough output, Terraform would stall waiting for the pipe to drain, while Atlantis waited for Terraform to finish executing before reading from the pipe, resulting in a deadlock. The fix was to read from the pipe concurrently. While testing the fix, I realized that the Terraform panic my original code changes were meant to solve was actually being caught correctly using
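The deadlock described above can be illustrated with a minimal Go sketch. This is not Atlantis's actual code: the function name `runAndCapture` is hypothetical, and the demo assumes a Unix-like system with `sh`, `head`, and `/dev/zero` available. The key points are that each pipe is drained by its own goroutine while the child runs, and that `cmd.Wait` is only called after all reads finish, because `Wait` closes the pipes.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os/exec"
	"sync"
)

// runAndCapture (hypothetical helper) starts a command and drains its stdout
// and stderr pipes concurrently while it runs. If the parent only read the
// pipes after the child exited, a child writing more than the OS pipe buffer
// holds would block on write while the parent blocked waiting for it to exit.
func runAndCapture(name string, args ...string) (string, string, error) {
	cmd := exec.Command(name, args...)

	stdoutPipe, err := cmd.StdoutPipe()
	if err != nil {
		return "", "", err
	}
	stderrPipe, err := cmd.StderrPipe()
	if err != nil {
		return "", "", err
	}

	if err := cmd.Start(); err != nil {
		return "", "", err
	}

	var stdoutBuf, stderrBuf bytes.Buffer
	var wg sync.WaitGroup
	wg.Add(2)

	// One reader goroutine per pipe, so neither pipe can fill up and stall the child.
	go func() {
		defer wg.Done()
		_, _ = io.Copy(&stdoutBuf, stdoutPipe)
	}()
	go func() {
		defer wg.Done()
		_, _ = io.Copy(&stderrBuf, stderrPipe)
	}()

	// Finish all reads before Wait: Wait closes the pipes once the child exits.
	wg.Wait()
	err = cmd.Wait()
	return stdoutBuf.String(), stderrBuf.String(), err
}

func main() {
	// Writes 200,000 bytes to stdout and 100,000 to stderr, well beyond a
	// typical pipe buffer, to exercise the concurrent draining.
	out, errOut, err := runAndCapture("sh", "-c",
		"head -c 200000 /dev/zero; head -c 100000 /dev/zero 1>&2")
	fmt.Println(len(out), len(errOut), err)
}
```

If you only need the full output after the command finishes, assigning `bytes.Buffer` values to `cmd.Stdout` and `cmd.Stderr` and calling `cmd.Run()` has the same effect, since `os/exec` then copies both streams in its own goroutines. Explicit pipes are useful when output must be processed (for example, logged) while the command is still running.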
Thanks @lkysow!
With the latest Atlantis container, we've run into an issue where our largest TF environment never completes its Atlantis plan. This environment tracks a couple hundred resources. The logs show that the auto-plan reaches the "terraform workspace show" step, but the plan itself never completes:
Beyond that, we get no further log entries for this environment.
If I manually delete the lock on the environment and comment "atlantis plan" again on the PR, I get this error:
the default workspace is currently locked by another command that is running for this pull request–wait until the previous command is complete and try again
Since there are no more locks available to delete from the UI, the only solution is to manually apply the Terraform changes outside of Atlantis and then merge the PR, despite the CI check failure.
We do not see this behavior in our other, smaller environments (approximately a dozen resources each), only in this very large one.
This does not occur in version 0.4.13 -- the plan successfully completes for the same large environment.