Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow provider to define uniqueness of a resource #224

Open
radeksimko opened this issue Oct 30, 2019 · 7 comments
Open

Allow provider to define uniqueness of a resource #224

radeksimko opened this issue Oct 30, 2019 · 7 comments
Labels
enhancement New feature or request upstream-protocol Requires change of protocol specification, i.e. can't be done under the current protocol upstream-terraform

Comments

@radeksimko
Copy link
Member

radeksimko commented Oct 30, 2019

Problem Statement

ID field

Each resource today has an ability to set an ID.

This was and to some extent still is treated as a "meta-field" which is special-cased throughout Terraform and SDK, e.g. it has its own getter/setter (ResourceData.GetId(), ResourceData.SetId()), existence of ID was (in Terraform <0.12) internally used as indicator whether a resource exists or not and it is currently the easiest way to import resources (via their ID) and id is always available in CRUD but must not be overridden in the schema.

ID is however assigned when the resource has either started the process of creation or finished creating (in CRUD). Terraform currently has no way of constructing unique identifier beforehand and therefore no way of predicting whether it will collide with any other unique identifier (it doesn't even attempt to do so for that reason).

Example

DNS is a globally shared namespace and it's possible that in some DNS providers you are able to create duplicate domains for valid reasons (e.g. migration), but often times this is not possible. Route53 will return error for one of the following zones - depending on which one is scheduled for creation 1st while Terraform parallelises requests.

resource "aws_route53_zone" "primary" {
  name = "example.com"
}

resource "aws_route53_zone" "secondary" {
  name = "example.com"
}

Proposal

Implementation is likely to be subject to RFC & further discussion.

The schema could allow marking certain fields as "identifier", then core would be able to query provider for such enhanced schema and combined with values from the config it should be able to tell if there are any duplicate resources.

(hypothetical) Example

func resourceAwsRoute53Zone() *schema.Resource {
	return &schema.Resource{
		// ...
		Schema: map[string]*schema.Schema{
			"name": {
				Type:         schema.TypeString,
				Required:     true,
				ForceNew:     true,
				IsIdentifier: true,
			},

then assuming the HCL config from above

$ terraform plan

Error: 2 duplicate resources found.

  on main.tf line 2, in resource "aws_route53_zone" "primary":
   1: resource "aws_route53_zone" "primary" {
   2:     name = "example.com"

Better and more detailed diagnostic messages are subject to our ability to either display & highlight multiple fields which can make up the identifier or be able to identify the most important one in case there's more than one.

References

@bflad
Copy link
Contributor

bflad commented Nov 6, 2019

(Updated the issue description to include a references section 👍 )

In the Terraform AWS Provider, this functionality would become immensely more helpful if given the availability of both resource configuration value(s) (historically via ResourceData) and provider configuration values (historically via meta). As a small sample, the following resources have uniqueness constraints on both dimensions of AWS Region and a name-like identifier:

  • aws_backup_vault (unique name per AWS Region per AWS Account ID)
  • aws_dynamodb_table (unique name per AWS Region per AWS Account ID)
  • aws_sns_topic (unique name per AWS Region per AWS Account ID)

Or 1 per provider configuration, such as:

  • aws_ebs_default_kms_key (1 per AWS Region per AWS Account ID)
  • aws_ebs_encryption_by_default (1 per AWS Region per AWS Account ID)
  • aws_iam_account_alias (1 per AWS Account ID, naming doesn't matter)
  • aws_iam_account_password_policy (1 per AWS Account ID, no naming)

Another consideration may be resources where there can only be 1 per some resource. Sample resources:

  • aws_s3_bucket_notification (1 per S3 Bucket name)
  • aws_s3_bucket_policy (1 per S3 Bucket name) -- same applies across most, if not all aws_*_policy resources except IAM

Or as another example, 1 per containing resource and name-like identifier, such as:

  • aws_iam_group_policy (unique name per IAM Group name)
  • aws_iam_role_policy (unique name per IAM Role name)
  • aws_iam_user_policy (unique name per IAM User name)

Hopefully the examples above highlight that determining the uniqueness constraints might be a little more nuanced than a simple field on a single attribute. To put some of the examples into "how these values would be determined in a Create/Read/Update/Delete function manner with ResourceData and the provider interface":

  • aws_backup_vault: fmt.Sprintf("%s:%s:%s", meta.(*AWSClient).accountid, meta.(*AWSClient).region, d.Get("name").(string)
  • aws_ebs_default_kms_key: fmt.Sprintf("%s:%s", meta.(*AWSClient).accountid, meta.(*AWSClient).region
  • aws_s3_bucket_notification: d.Get("bucket").(string)
  • aws_iam_group_policy: fmt.Sprintf("%s:%s:%s", meta.(*AWSClient).accountid, d.Get("group").(string), d.Get("name").(string))

Maybe something conceptually like the below at the resource level could do this?

type UniquenessIdentifierFunc func(ResourceData, interface{}) string

type Resource struct {
  // UniquenessIdentifier is a string identifier to determine duplicate resource configurations.
  // This is globally checked against all other Terraform resources in a single run.
  // Any values that cannot be determined at plan time, such as those not known until an upstream
  // resource has been applied, this check to be skipped. If the return value is an empty
  // string (""), this check will be skipped.
  UniquenessIdentifier UniquenessIdentifierFunc
}

// Example in a Terraform AWS Provider resource, such as aws_backup_vault
func PerRegionalNameUniquenessIdentifier(d schema.ResourceData, meta interface{}) string {
  return fmt.Sprintf("%s:%s:%s", meta.(*AWSClient).accountid, meta.(*AWSClient).region, d.Get("name").(string))
}

UniquenessIdentifier: PerRegionalNameUniquenessIdentifier,

// Another example in Terraform AWS Provider, such as aws_ebs_default_kms_key
func PerRegionUniquenessIdentifier(d schema.ResourceData, meta interface{}) string {
  return fmt.Sprintf("%s:%s", meta.(*AWSClient).accountid, meta.(*AWSClient).region)
}

UniquenessIdentifier: PerRegionUniquenessIdentifier,

The nice part here is that it gives resources the freedom to fully synthesize the criteria for uniqueness although it comes at the expense of being a little verbose. The logic checking these would need bypass any with computed attribute values.

When writing up the above examples, I found it may make more sense to have the return value be a *string to allow for nil values or treat "" as a special value to skip the check. A problem that we have in the Terraform AWS Provider is that while the AWS Account ID is generally available, there are reasons to disable the provider initialization for it (skip_requesting_account_id) which leaves that crucial meta.(*AWSClient).accountid empty for us. We wouldn't want a configuration that happens to be against two AWS Accounts correctly to throw a uniqueness error (the result being ":us-east-1" for both resources in that case). Here's how the logic would more likely have to look to appropriately handle that situation:

// Example in a Terraform AWS Provider resource, such as aws_backup_vault
func PerRegionalNameUniquenessIdentifier(d schema.ResourceData, meta interface{}) string {
  accountID := meta.(*AWSClient).accountid

  if accountID == "" {
    return "" // or nil if *string
  }

  return fmt.Sprintf("%s:%s:%s", accountID, meta.(*AWSClient).region, d.Get("name").(string))
}

UniquenessIdentifier: PerRegionalNameUniquenessIdentifier,

// Another example in Terraform AWS Provider, such as aws_ebs_default_kms_key
func PerRegionUniquenessIdentifier(d schema.ResourceData, meta interface{}) string {
  accountID := meta.(*AWSClient).accountid

  if accountID == "" {
    return "" // or nil if *string
  }

  return fmt.Sprintf("%s:%s", accountID, meta.(*AWSClient).region)
}

UniquenessIdentifier: PerRegionUniquenessIdentifier,

I also wonder if maybe the return value could be something like []string, where if any of the values are "" then it automatically skips the check. This would cover the AWS Account ID scenario above as well as computed values.

type UniquenessIdentifierFunc func(ResourceData, interface{}) []string

type Resource struct {
  // UniquenessIdentifier is an identifier to determine duplicate resource configurations.
  // This is globally checked against all other Terraform resources in a single run.
  // If the return value is an empty string ("") for any elements, this check will be skipped.
  // Any values that cannot be determined at plan time, such as those not known until an
  // upstream resource has been applied, will also cause this check to be skipped.
  UniquenessIdentifier UniquenessIdentifierFunc
}

// Example in a Terraform AWS Provider resource, such as aws_backup_vault
func PerRegionalNameUniquenessIdentifier(d schema.ResourceData, meta interface{}) []string {
  return []string{
    meta.(*AWSClient).accountid,
    meta.(*AWSClient).region,
    d.Get("name").(string),
  }
}

UniquenessIdentifier: PerRegionalNameUniquenessIdentifier,

// Another example in Terraform AWS Provider, such as aws_ebs_default_kms_key
func PerRegionUniquenessIdentifier(d schema.ResourceData, meta interface{}) []string {
  return []string{
    meta.(*AWSClient).accountid,
    meta.(*AWSClient).region,
  }
}

UniquenessIdentifier: PerRegionUniquenessIdentifier,

Thanks for the consideration! 😄

@radeksimko
Copy link
Member Author

@paddycarver described to me earlier in a separate conversation that it would be helpful in GCP provider to have the ability to validate all occurrences of a given resource type. The main reason was also uniqueness - so I believe the problem definition matches one of this issue. We couldn't really come up with any other reason to run such validation other than to guarantee/validate some form of uniqueness.

Paddy mentioned two GCP specific examples:

  • google_app_engine_application is unique per Google Project
  • a disk may only be attached to one instance at a time -> while a provider may have _attachment resource which manages the disk/instance relationship and that resource may have unique ID, not all providers have _attachment resources for various reasons

@paddycarver feel free to add/correct anything above.

@bflad
Copy link
Contributor

bflad commented Nov 6, 2019

There is a related upstream Terraform Core issue here: hashicorp/terraform#22094

@bflad
Copy link
Contributor

bflad commented Nov 6, 2019

a disk may only be attached to one instance at a time -> while a provider may have _attachment resource which manages the disk/instance relationship and that resource may have unique ID, not all providers have _attachment resources for various reasons

Oh! This reminds me of very similar problems we have in the Terraform AWS Provider where there may be two methods for managing a specific piece of infrastructure such as EC2 Security Groups, but using both methods can causes perpetual differences:

  • Inline aws_security_group resource rules via ingress and egress configuration blocks
  • Separate aws_security_group_rule resources with type argument set to ingress or egress

In this case the "uniqueness" is even more nuanced since its involves the values of "sub" resources, but the "sub" resources have no identifiers of their own and it actually has nothing to do with overlap either (its matching EC2 Security Group IDs and the aws_security_group resource having any inline blocks of the same aws_security_group_rule resource type argument 😬).

I'm wondering if this class of problem should be considered outside the scope of this feature request due to its inherent additional complexity and maybe treated with a different type of solution that can take specific other resource types and their attribute keys/values into consideration.

@tombuildsstuff
Copy link
Contributor

Presumably having this information available would make caching of duplicate Data Sources possible too? hashicorp/terraform#19942

@radeksimko
Copy link
Member Author

radeksimko commented Dec 9, 2019

@tombuildsstuff I would say easier.

Due to their nature, data sources may require slightly different approach though. Unlike resources which tend to be managed (and changed) entirely via Terraform, data sources may change outside of Terraform more often and what we consider ID today is far less likely to represent unique state and data sources are expected to always have fresh state.

For example you wouldn't want aws_ami to keep stale AMI ID, when you have a fresh one, nor you'd want to implement something like fresh = true/false that will just confuse less skilled users and make them flip it back and forth. I think I'd just expect this to work out of the box and not make unnecessary requests, but always get fresh data.

#7 describes an example of how we may address that. Perhaps a combination of these two suggestions will work - labelling fields as "unique combination" (e.g. name+etag) and make it easier to send more "lightweight" HEAD requests to check whether anything changed, before falling back to Read() and make it easier to just attach headers such as If-None-Match to all requests.

We'll probably need to check what major APIs support today but I'm guessing some combination of the above will satisfy the majority.

@bflad
Copy link
Contributor

bflad commented Jul 30, 2020

I no longer have rights in this repository to edit the issue description references again, but please note I've consolidated the various Terraform AWS Provider use cases into hashicorp/terraform-provider-aws#14394.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request upstream-protocol Requires change of protocol specification, i.e. can't be done under the current protocol upstream-terraform
Projects
None yet
Development

No branches or pull requests

3 participants