Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit Reference Ordering with Data Sources in 0.13 #25961

Closed
bflad opened this issue Aug 21, 2020 · 4 comments · Fixed by #26284
Closed

Implicit Reference Ordering with Data Sources in 0.13 #25961

bflad opened this issue Aug 21, 2020 · 4 comments · Fixed by #26284
Labels
bug config confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code v0.13 Issues (primarily bugs) reported against v0.13 releases
Milestone

Comments

@bflad
Copy link
Contributor

bflad commented Aug 21, 2020

Terraform Version

Terraform v0.13.0
+ provider registry.terraform.io/hashicorp/aws v3.3.0

Terraform Configuration Files

# PLEASE NOTE:
#
# The below Terraform configuration represents an anti-pattern in typical usage.
# Practitioners should normally just reference the resource attributes
# or submit feature requests for any missing resource attributes.
# However, this pattern is extensively used in provider acceptance testing of
# data sources and combined with resource.TestCheckResourceAttrPair() checks.
#
# This is also presented here just in case there are valid configuration usage patterns
# for this, where this potentially represents a real-world bug.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.3.0"
    }
  }

  required_version = "0.13.0"
}

provider "aws" {
  region = "us-east-2"
}

resource "aws_ssm_document" "test" {
  name          = "bflad-testing"
  document_type = "Command"

  content = <<DOC
{
  "schemaVersion": "1.2",
  "description": "Check ip configuration of a Linux instance.",
  "parameters": {},
  "runtimeConfig": {
    "aws:runShellScript": {
      "properties": [
        {
          "id": "0.aws:runShellScript",
          "runCommand": [
            "ifconfig"
          ]
        }
      ]
    }
  }
}
DOC
}

data "aws_ssm_document" "test" {
  # depends_on = [aws_ssm_document.test]
  name = aws_ssm_document.test.name
}

output "content" {
  value = data.aws_ssm_document.test.content
}

output "document_format" {
  value = data.aws_ssm_document.test.document_format
}

Debug Output

Please ask if you need this.

Expected Behavior

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

content = {
  "schemaVersion": "1.2",
  "description": "Check ip configuration of a Linux instance.",
  "parameters": {},
  "runtimeConfig": {
    "aws:runShellScript": {
      "properties": [
        {
          "id": "0.aws:runShellScript",
          "runCommand": [
            "ifconfig"
          ]
        }
      ]
    }
  }
}

document_format = JSON

Actual Behavior

On terraform apply with no existing state/resources:

Error: Error reading SSM Document: InvalidDocument: Document with name bflad-testing does not exist.

Using implicit attribute references ordered operations correctly in Terraform 0.12 and seems to imply that Terraform is now ordering the data source read before the resource creation. If we uncomment the depends_on configuration, it works properly:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.aws_ssm_document.test will be read during apply
  # (config refers to values not yet known)
 <= data "aws_ssm_document" "test"  {
      + arn           = (known after apply)
      + content       = (known after apply)
      + document_type = (known after apply)
      + id            = (known after apply)
      + name          = "bflad-testing"
    }

  # aws_ssm_document.test will be created
  + resource "aws_ssm_document" "test" {
      + arn              = (known after apply)
      + content          = jsonencode(
            {
              + description   = "Check ip configuration of a Linux instance."
              + parameters    = {}
              + runtimeConfig = {
                  + aws:runShellScript = {
                      + properties = [
                          + {
                              + id         = "0.aws:runShellScript"
                              + runCommand = [
                                  + "ifconfig",
                                ]
                            },
                        ]
                    }
                }
              + schemaVersion = "1.2"
            }
        )
      + created_date     = (known after apply)
      + default_version  = (known after apply)
      + description      = (known after apply)
      + document_format  = "JSON"
      + document_type    = "Command"
      + document_version = (known after apply)
      + hash             = (known after apply)
      + hash_type        = (known after apply)
      + id               = (known after apply)
      + latest_version   = (known after apply)
      + name             = "bflad-testing"
      + owner            = (known after apply)
      + parameter        = (known after apply)
      + platform_types   = (known after apply)
      + schema_version   = (known after apply)
      + status           = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_ssm_document.test: Creating...
aws_ssm_document.test: Creation complete after 0s [id=bflad-testing]
data.aws_ssm_document.test: Reading...
data.aws_ssm_document.test: Read complete after 0s [id=bflad-testing]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

content = {
  "schemaVersion": "1.2",
  "description": "Check ip configuration of a Linux instance.",
  "parameters": {},
  "runtimeConfig": {
    "aws:runShellScript": {
      "properties": [
        {
          "id": "0.aws:runShellScript",
          "runCommand": [
            "ifconfig"
          ]
        }
      ]
    }
  }
}

document_format = JSON

Steps to Reproduce

In an account without the named SSM Document:

  1. terraform init
  2. terraform apply

Additional Context

  • For a long while now, implicit attribute reference ordering has meant that various configuration components do not require depends_on explicit ordering. This change would be an anti-pattern comparatively.
  • If the new behavior is intended and required, providers will not be able to acceptance test against both Terraform 0.12 and 0.13 easily since the configuration will need to be different for each version or every test will need to be split into multiple configurations and applied separately.
@bflad bflad added bug new new issue not yet triaged labels Aug 21, 2020
@jbardin
Copy link
Member

jbardin commented Aug 21, 2020

Thanks @bflad!

I don't think this was documented as working in 0.12, since aws_ssm_document.test.name is statically defined and "known" within the config when we initially need to read the data source. It however did work because data sources were only read during Refresh, and due to the behavior of the evaluator and the Refresh walk, the reference to aws_ssm_document.test.name appears to be unknown at that time.

References in core are always evaluated through the state. During Apply this is obvious, since resources are written to the state as they are created. During Plan we have a temporary working state, in which partial resources are written in order to be evaluated. During Refresh, only resources that exist already are written to the state, which means that even if new resource configurations contain static values, they cannot be evaluated at that time. This is what causes the aws_ssm_document config to appear as unknown, deferring the initial read. This Refresh behavior also currently still exists in 0.13.

In 0.13 we now have the ability to plan data sources (in preparation for the future handling of Refresh during plan as well). The above failure in 0.13 is happening during Plan, where we do have the static config values available in the state for evaluation. This means that when checking the config of aws_ssm_document during plan it is wholly-known, and it appears we can proceed with reading the data source.

Since this is also the behavior of resources within the configuration, I'm not sure yet how we want to go about handling the change. This better aligns the behavior with the interpolation in other contexts (resources, providers, locals, etc), where the config can be treated as if it were evaluated as a whole, so I'm not sure about creating an exception for data sources during Plan (or if that's possible yet). For now we can consider the use of depends_on as a workaround, though that will not be possible when the value is passed in through a module input, since there would be no resource in scope to reference.

@jbardin jbardin added confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code config v0.13 Issues (primarily bugs) reported against v0.13 releases and removed new new issue not yet triaged labels Aug 21, 2020
@apparentlymart
Copy link
Contributor

apparentlymart commented Aug 21, 2020

There's definitely some mulling to be done here, because we have both some historical tradeoffs and some forward-looking tradeoffs to consider.

On a purely historical note: this idea of using the "known-ness" of the data resource configuration to decide whether to defer it was always a "best we can do" tradeoff back in the original data source implementation in 0.7, because our design principles at the time said that we would not do any network requests during planning. That forced moving the data source reads into the refresh phase, which in turn meant relying on this imprecise heuristic to decide whether to defer to the apply phase.

We've since changed that "no network requests during planning" principle, in recognition of the fact that accurate planning sometimes just requires making network requests, and our forthcoming planned merger of the refresh and plan walks brings that to its logical conclusion. That means we have an opportunity to revisit the original design tradeoffs for data sources, with some different (looser?) design constraints.

I can imagine a few sorts of static analysis we could do against the diff once we have the refresh and plan walks merged -- in principle during the plan walk we could see that aws_ssm_document.test.name has a + diff on it and defer it even though the name is known. However, that sort of analysis would only work if the reference were totally static, and that tends to not be the case in common cases because folks (reasonably) want to do their lookups dynamically based on each.key and count.index, for example.

Therefore that sort of static analysis would, in a lot of cases, just degenerate to something simpler to implement and simpler to explain: treating any expression reference to aws_ssm_document.test in exactly the same way as an explicit depends_on = [aws_ssm_document.test], deferring the data source to apply time if there's any pending change to that resource at all, even if it doesn't include a change to the name attribute.

That would upset some other use-cases that are possible today by relying on certain resource arguments being known at plan time, but I think those cases are less common and users can work around them by factoring out the known-value expression into a local value and referring to that in both places, rather than having the data resource refer to the managed resource:

locals {
  ssm_document_name = "bflad-testing"
}

resource "aws_ssm_document" "test" {
  name = local.ssm_document_name
  # etc...
}

data "aws_ssm_document" "test" {
  # Refer to the local value if for some reason you _don't_ want
  # this data resource to be deferred until after the managed
  # resource is applied.
  name = local.ssm_document_name
}

Such a change won't be possible in the 0.13.x series because it would be breaking for some less-common cases, but perhaps we can do something like this in 0.14.

@bflad
Copy link
Contributor Author

bflad commented Aug 27, 2020

Just got asked internally about this behavior for provider testing so commenting here to highlight the challenges on our side.

treating any expression reference to aws_ssm_document.test in exactly the same way as an explicit depends_on = [aws_ssm_document.test], deferring the data source to apply time if there's any pending change to that resource at all, even if it doesn't include a change to the name attribute.

From the provider testing perspective, this represents the behavior of Terraform core prior to 0.13 (at least from my knowledge/testing in 0.10, 0.11, and 0.12). Effectively without doing anything, Terraform Providers cannot use the new Terraform Plugin SDK binary testing framework against 0.13.0 and later without significant code changes, which puts us in a bind between these choices:

  • Spent immediate time now to refactor significant portions of data source testing (e.g. splitting up TestStep so only the second test configuration contains data sources) only to have that effort potentially be unnecessary for Terraform 0.14 and later again
  • Not perform real-world testing against Terraform 0.13.x releases and potentially not be able to reproduce certain behaviors

I'll work with the other official provider teams to gather the number of affected tests/configurations for those who can test with the binary testing framework.

Understandably, changing this behavior potentially represents a breaking change. I'm just worried we may want to expedite that to remove this confusion in the Terraform Provider ecosystem, unless we want to say this is a regression and treat it as a bug fix.

@ghost
Copy link

ghost commented Oct 22, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug config confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code v0.13 Issues (primarily bugs) reported against v0.13 releases
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants