policy checks and terraform version inconsistent behavior #1606

Closed
snuggie12 opened this issue May 28, 2021 · 16 comments · Fixed by #1658
Labels
bug Something isn't working

Comments

@snuggie12

We currently set which version of terraform to use inside of the terraform block per project like so:

terraform {
  required_version = "0.13.5"
}

When implementing policy checks, we got some unreadable errors, which we eventually figured out were caused by the default version of Terraform being 0.12.12.

We didn't want to go through a Docker deployment, so I set terraform_version in atlantis.yaml for my project/workspaces. I first tested it with v0.15.1, since I saw that is the latest. However, no plans ran because the required version was 0.13.5, and therefore no policy checks ran, as they need the plan file.

So my take on it is this: when the project has nothing set, Atlantis starts from Terraform 0.12.12 but knows to switch to 0.13.5 for the plan. The terraform show step, however, just reads a plan file and does not make that switch.

Is there any way to read the terraform block, or does the plan file perhaps record the version that created it?
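
On the first half of that question, here is a minimal sketch of reading an exact pin out of a terraform block. It assumes that only exact constraints such as "0.13.5" or "= v0.13.5" should select a version and that range constraints are ignored. The function name exact_required_version is hypothetical and this is not Atlantis's actual implementation; a real tool would use an HCL parser rather than a regular expression.

```python
import re
from typing import Optional

# Hypothetical helper, not part of Atlantis. Accepts exact pins only:
#   required_version = "0.13.5"
#   required_version = "= 0.13.5"
#   required_version = "= v0.13.5"
_EXACT_PIN = re.compile(
    r'required_version\s*=\s*"\s*=?\s*v?(\d+\.\d+\.\d+)\s*"'
)


def exact_required_version(hcl_text: str) -> Optional[str]:
    """Return the pinned version without a leading "v", or None.

    Range constraints such as ">= 0.12" or "~> 0.13.0" return None,
    because they do not identify a single version to download.
    """
    match = _EXACT_PIN.search(hcl_text)
    return match.group(1) if match else None
```

A caller could fall back to the server default whenever this returns None.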

@snuggie12
Author

Oh, I also forgot to ask: will setting the default version environment variable behave differently from setting the version at the project level? Would 0.15.1 plan correctly if using that env var?

@snuggie12
Author

I tested my question above, and it fails too, but this time at terraform show instead of terraform plan.

So it would appear that the same logic that determines the TF version for terraform plan needs to run for terraform show.

@msarvar
Contributor

msarvar commented Jun 3, 2021

I'm not exactly sure what the issue is here. If you're pinning your Terraform to a specific version, you should make sure to have matching terraform_version fields for your projects in atlantis.yaml. Atlantis doesn't dynamically decide which Terraform version to use; it uses either the default binary specified via the env variable or what is defined in atlantis.yaml for your project. If your Terraform code has a different version in the terraform {} block, then you need to update either your terraform block or atlantis.yaml.

@msarvar added the waiting-on-response label (Waiting for a response from the user) on Jun 3, 2021
@snuggie12
Author

The problem is that most combinations result in the terraform plan and terraform show commands using different versions, and therefore failing. I'll list them using env var, atlantis.yaml and terraform block:

Env var plus tf block: the plan runs with the version from the tf block and show runs with the env var. If the env var is higher, the show will error. If it is lower, it might error with the same message, but ours was so low it couldn't read the plan.

atlantis.yaml plus tf block: atlantis.yaml seems to take precedence, and our plan ran with the version stated there. I tried a higher version and the plan failed. The policy check ran despite there being no plan file and silently errored. I believe I also tried a lower version than the tf block and got an error.

The only thing I haven't tried is no env variable plus the tf block. I thought I had tried that, but it turns out we had it set.

For the plan, you clearly determine which version to use, download it and then use it. I think that same scan needs to be done for the show.
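
The suggestion above can be sketched as a single resolver that both steps call, so plan and show can never disagree. This is a sketch under stated assumptions: the precedence (atlantis.yaml, then an exact required_version, then the server default) is the one observed in this thread, and the names resolve_terraform_version and step_versions are hypothetical, not Atlantis functions.

```python
from typing import Dict, Optional


def resolve_terraform_version(
    default_version: str,
    project_version: Optional[str] = None,
    required_version: Optional[str] = None,
) -> str:
    """Pick one Terraform version for a project.

    Assumed precedence, as observed in this thread:
      1. terraform_version from atlantis.yaml (project_version)
      2. an exact required_version from the terraform block
      3. the server default (env var)
    """
    for candidate in (project_version, required_version, default_version):
        if candidate:
            # Normalise "v0.13.5" and "0.13.5" to the same string.
            return candidate.lstrip("v")
    raise ValueError("no Terraform version configured")


def step_versions(
    default_version: str,
    project_version: Optional[str] = None,
    required_version: Optional[str] = None,
) -> Dict[str, str]:
    """Resolve once and reuse the result for both plan and show."""
    version = resolve_terraform_version(
        default_version, project_version, required_version
    )
    return {"plan": version, "show": version}
```

With a default of v0.15.1 and a tf block pinned to 0.13.5, both steps resolve to 0.13.5 instead of splitting between the two.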

@endriu0

endriu0 commented Jun 10, 2021

Just ran into the same issue. My atlantis.yaml uses 0.12.31 by default; the project it failed on instead overrides it to 0.13.5, and the plan runs perfectly with the correct version.

Now, because this is only a cleanup, there were no changes in the plan, which I believe meant that the plan file didn't get created, so conftest tried to run with the default. Here's the failure:

running terraform show: running "/atlantis/bin/terraform0.12.31 show -no-color -json /atlantis/repos/terraform/2980/default/apps/my-app/qa/default.tfplan" in "/atlantis/repos/terraform/2980/default/apps/my-app/qa": exit status 1
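
When triaging errors like the one above, it can help to pull the binary version and plan path out of the message and compare the version with what the plan step used. The following is a small sketch; it assumes the message shape seen in this thread (a binary named terraform followed by its version, then show and a .tfplan path), and parse_show_error is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumes the message format quoted in this thread; other Atlantis
# versions may word it differently.
_SHOW_ERROR = re.compile(
    r'running "\S*terraform(?P<version>\d+\.\d+\.\d+) show\b'
    r'[^"]*?(?P<plan>\S+\.tfplan)"'
)


def parse_show_error(message: str) -> Optional[Tuple[str, str]]:
    """Return (terraform_version, plan_path) from a show error, or None."""
    match = _SHOW_ERROR.search(message)
    if match is None:
        return None
    return match.group("version"), match.group("plan")
```

If the extracted version differs from the one the plan output reports, the failure is likely the mismatch discussed in this issue.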

@endriu0

endriu0 commented Jun 10, 2021

Correction: it's consistent in its failures on conftest whenever the tf version doesn't match the default.

@uLan08

uLan08 commented Jun 11, 2021

I just want to add that I am encountering the same error even though the Terraform version used for the plan and show is the same. I have also tried updating the default TF version on atlantis.

running terraform show: running "/root/.atlantis/bin/terraform0.13.5 show -no-color -json /root/.atlantis/repos/foo/foo-infrastructure/1493/default/terraform/aws/s3/opa-test-bucket/default.tfplan" in "/root/.atlantis/repos/foo/foo-infrastructure/1493/default/terraform/aws/s3/opa-test-bucket": exit status 1

# atlantis.yaml in repo
version: 3
projects:
  - dir: terraform/aws/s3/opa-test-bucket
    autoplan:
      enabled: true
    terraform_version: v0.13.5

# provider.tf
terraform {
  required_version = "= v0.13.5"
}

@snuggie12
Author

I presume I cannot change the waiting-on-response label myself, but didn't want to be blocking progress on this.

Do my response and the two other users' comments (thanks for confirming the issue!) confirm an issue?

@snuggie12
Author

> I just want to add that I am encountering the same error even though the Terraform version used for the plan and show is the same. I have also tried updating the default TF version on atlantis.
>
> running terraform show: running "/root/.atlantis/bin/terraform0.13.5 show -no-color -json /root/.atlantis/repos/foo/foo-infrastructure/1493/default/terraform/aws/s3/opa-test-bucket/default.tfplan" in "/root/.atlantis/repos/foo/foo-infrastructure/1493/default/terraform/aws/s3/opa-test-bucket": exit status 1

@uLan08 I needed to hop on the container and manually run terraform show as the atlantis user in order to see the actual error message. If your versions are the same and you got no errors during the plan phase it would seem like a different issue.
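
The manual rerun described above can be scripted so that stderr, which the Atlantis message hides behind "exit status 1", is captured. This is only a sketch: the flags are the ones visible in the error messages in this thread, the helper names are hypothetical, and it has to run inside the container as the atlantis user against paths that exist there.

```python
import subprocess
from typing import List, Tuple


def build_show_command(terraform_bin: str, plan_path: str) -> List[str]:
    """Rebuild the command from the error message (same flags)."""
    return [terraform_bin, "show", "-no-color", "-json", plan_path]


def run_show(terraform_bin: str, plan_path: str, workdir: str) -> Tuple[int, str]:
    """Run terraform show and return (exit_code, stderr).

    stderr carries the real Terraform error that the Atlantis
    message does not include.
    """
    proc = subprocess.run(
        build_show_command(terraform_bin, plan_path),
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is the case being investigated
    )
    return proc.returncode, proc.stderr
```

Both the binary path and the plan path can be copied straight from the quoted error line.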

@msarvar
Contributor

msarvar commented Jun 22, 2021

@snuggie12 sorry for the long response time. Let me try to summarize. There are 2 example workflows:

  1. When the DEFAULT_TERRAFORM_VERSION env variable is set and required_version is defined in the Terraform root, atlantis plan dynamically uses the version from required_version, but the policy check uses the env variable. This fails if required_version and DEFAULT_TERRAFORM_VERSION are not the same.
  2. When terraform_version is set in atlantis.yaml, the plan fails due to a mismatch between required_version and terraform_version in atlantis.yaml. However, the policy check runs but silently fails?

The first case is a bit odd to me; it should fail the same way the second case does. Would you be able to share a config that reproduces the issue? I need to debug it to understand the failure. Atlantis should always use either the env variable or the explicit Terraform version defined in the repo config.

Regarding the second case, the policy check doesn't run if any of your plans fail. So it is not silently failing; it is not running at all.

Let me know if I missed any detail.
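
The two workflows summarized above can be modelled as a small function, which makes the mismatch easy to see. This is a simplified sketch of the behaviour reported in this thread, not of Atlantis's code. It assumes any plan/show version difference makes show fail (the thread reports that a higher default always fails and a lower one may), and it assumes show reuses the atlantis.yaml version when one is set and the plan succeeded. The names Outcome and simulate are hypothetical.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    plan_version: str
    plan_ok: bool
    show_version: Optional[str]  # None: policy check never ran
    show_ok: bool


def simulate(
    default_version: str,
    project_version: Optional[str] = None,
    required_version: Optional[str] = None,
) -> Outcome:
    """Model the reported (buggy) behaviour for one project."""
    if project_version is not None:
        # Workflow 2: atlantis.yaml wins; Terraform itself rejects the
        # plan when the binary does not satisfy required_version.
        plan_ok = required_version in (None, project_version)
        if not plan_ok:
            # No plan file, so the policy check does not run at all.
            return Outcome(project_version, False, None, False)
        # Assumption: show reuses the atlantis.yaml version here.
        return Outcome(project_version, True, project_version, True)

    # Workflow 1: plan follows required_version, show uses the default.
    plan_version = required_version or default_version
    show_version = default_version
    # Simplification: any mismatch is treated as a show failure.
    return Outcome(plan_version, True, show_version, show_version == plan_version)
```

For example, simulate("0.15.1", required_version="0.13.5") plans with 0.13.5 but shows with 0.15.1, which is the failing combination.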

@snuggie12
Author

Yeah, I think we can ignore number 2, as Atlantis prefers atlantis.yaml over required_version in the tf root. And I agree that if the plan fails there's no plan file, so terraform show can't run.

For number 1, it seems like you summarized it correctly. Which files do you need? Is the wrapper that determines the version to use in plan identical to the one for show?

Either way, the setup should just be setting an env variable default that differs from required_version in the tf root. The only detail that might matter is how different the versions are. With a lower env var the result might vary depending on the gap, but a higher one will always break:

> Env var plus tf block: the plan runs the version from tf block and show is run with env var. If env var is higher the show will error. If lower it might error with the same message but ours was so low it couldn't read the plan.

@msarvar
Contributor

msarvar commented Jun 22, 2021

The more config you can share, the better; server config and repo config would be a good start. Both plan and show use the same wrapper that determines the version, or at least they both should.
You could also test the workflow with policy check disabled, I'm curious if terraform apply will work.

@mikedougherty

hi @msarvar i have been working with @snuggie12 on this and i have fetched the following content from our atlantis container showing this issue:


Env var:

bash-5.0# echo $ATLANTIS_DEFAULT_TF_VERSION
v0.15.1

(we have also experienced this issue with 0.12.12 as mentioned previously)


Repository's atlantis.yaml:

bash-5.0# cat /home/atlantis/.atlantis/repos/missionlane/infra-terraform/1040/dev/atlantis.yaml

version: 3
automerge: true

projects:
  ## >> Other projects removed for brevity <<

  # monitoring
  - dir: workspaced/gcp/system-projects/projects/monitoring
    workspace: dev
  - dir: workspaced/gcp/system-projects/projects/monitoring
    workspace: infra-dev
  - dir: workspaced/gcp/system-projects/projects/monitoring
    workspace: infra-prod
  - dir: workspaced/gcp/system-projects/projects/monitoring
    workspace: staging
  - dir: workspaced/gcp/system-projects/projects/monitoring
    workspace: prod
  # monitoring

project's terraform block:

bash-5.0# cat /home/atlantis/.atlantis/repos/missionlane/infra-terraform/1040/dev/workspaced/gcp/system-projects/projects/monitoring/versions.tf

terraform {
  required_version = "0.13.5"

  backend "s3" {
    bucket               = "missionlane-terraform-state"
    dynamodb_table       = "missionlane-terraform-locks"
    encrypt              = "true"
    key                  = "terraform.tfstate"
    profile              = "missionlane-infra"
    region               = "us-east-1"
    workspace_key_prefix = "gcp-monitoring-project"
  }
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    google-beta = {
      source = "hashicorp/google-beta"
    }
    gsuite = {
      source = "deviavir/gsuite"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

we do not have a server config.yaml, instead we configure the application through environment variables:

bash-5.0# env | sort | grep ATLANTIS_ | grep -Eiv 'secret|token'

ATLANTIS_ATLANTIS_URL=https://atlantis.internal.corp-prod-us-west-2.aws.missionlane.com
ATLANTIS_CHECKOUT_STRATEGY=merge
ATLANTIS_DEFAULT_TF_VERSION=v0.15.1
ATLANTIS_ENABLE_POLICY_CHECKS=false
ATLANTIS_GH_USER=missionlane-atlantis
ATLANTIS_GID=1841
ATLANTIS_HIDE_PREV_PLAN_COMMENTS=true
ATLANTIS_REPO_CONFIG=/config/repo-config.yaml
ATLANTIS_REPO_WHITELIST=github.com/missionlane/*
ATLANTIS_SILENCE_FORK_PR_ERRORS=true
ATLANTIS_SILENCE_VCS_STATUS_NO_PLANS=false
ATLANTIS_SILENCE_WHITELIST_ERRORS=true
ATLANTIS_UID=1841

@msarvar
Contributor

msarvar commented Jun 22, 2021

@mikedougherty Thanks! This is helpful. Can you also share a sanitized version of the repo-config? Specifically I'm interested if you're using any custom workflows?
Also, it looks like ATLANTIS_ENABLE_POLICY_CHECKS is set to false. Do you see a similar issue with atlantis apply as well, or is it specific to policy checks?

@msarvar
Contributor

msarvar commented Jun 22, 2021

Actually, never mind, I was able to reproduce the issue. I will mark it as a bug.

To repro:

  1. Set up Atlantis for development following the instructions.
  2. Set the default Terraform version to anything >0.13 and enable policy checking.
  3. Create a new project in your target repository with a main.tf containing:
terraform {
  required_version = "0.13.5"
}
  4. Create a PR and see policy checking fail.

@msarvar added the bug label (Something isn't working) and removed the waiting-on-response label (Waiting for a response from the user) on Jun 22, 2021
@mikedougherty

> Specifically I'm interested if you're using any custom workflows?

in fact we have modified the default workflow, which the project in question is using. i have to admit, i don't recall why we amended the default workflow, especially the TF_WORKSPACE var... hopefully that is not relevant to this issue though. you can also see the policy config in this file.

repos:
  - id: github.com/missionlane/infra-terraform
    apply_requirements:
      - approved
    allow_custom_workflows: true
    allowed_overrides:
      - apply_requirements
      - workflow
workflows:
  default:
    apply:
      steps:
        - apply
    plan:
      steps:
        - env:
            name: TF_WORKSPACE
            value: ""
        - init
        - plan:
            extra_args:
              - -lock=false
policies:
  owners:
    users:
      - mikedougherty
      - snuggie12
  policy_sets:
    - name: deny_authoritative_iam_resources
      path: /policies/deny_authoritative_iam_resources.rego
      source: local

hopefully this will also help in tracking down this bug. cheers!
