Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws_storagegateway_upload_buffer produces inconsistent result after apply when created on an Outpost #17809

Closed
adam-imeson opened this issue Feb 25, 2021 · 7 comments · Fixed by #18313
Assignees
Labels
service/ec2 Issues and PRs that pertain to the ec2 service. service/iam Issues and PRs that pertain to the iam service. service/outposts Issues and PRs that pertain to the outposts service. service/s3 Issues and PRs that pertain to the s3 service. service/storagegateway Issues and PRs that pertain to the storagegateway service.
Milestone

Comments

@adam-imeson
Copy link

adam-imeson commented Feb 25, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform -v
Terraform v0.14.7
+ provider registry.terraform.io/hashicorp/aws v3.29.1
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/tls v3.1.0

Affected Resource(s)

  • aws_storagegateway_upload_buffer

Terraform Configuration Files

This is part of a larger config, but this is the relevant section. You need an Outpost to use it.

# -----------------------------------------------------------------------------
# Common module variables
# -----------------------------------------------------------------------------
variable "username" {
  type        = string
  description = "Your username - will be prepended to most resource names to track what's yours."
}

variable "tags" {
  type        = map(any)
  default     = {}
  description = "Common tags to apply to all taggable resources."
}

# -----------------------------------------------------------------------------
# Storage Gateway variables
# -----------------------------------------------------------------------------

variable "region_prefixlist_mapping" {
  description = "mapping for finding correct prefix list for corporate network"
  default = {
    "us-east-1" = "pl-60b85b09",
    "us-west-2" = "pl-f8a64391"
  }
}

variable "main_vpc_id" {
  type = string
}

variable "subnet_id" {
  type = string
}

variable "op_id" {
  type = string
}

variable "region" {
  type = string
}

variable "gateway_name" {
  type = string
}

variable "gateway_type" {
  type = string
}

variable "instance_type" {
  type = string
}

# -----------------------------------------------------------------------------
# Storage Gateway - EC2 Setup
# -----------------------------------------------------------------------------
# Creates EC2 instance that will host the gateway and required resources

data "aws_outposts_outpost" "op" {
  id = var.op_id
}

# create RDS key for AWS key pair
resource "tls_private_key" "sgw_tls" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# create AWS key pair
resource "aws_key_pair" "storagegateway" {
  key_name = join("-",[var.username, "storage-gateway-key"])
  public_key = tls_private_key.sgw_tls.public_key_openssh
}

# get the most recent storage gateway AMI
data "aws_ami" "sg_ami" {
  most_recent      = true
  owners           = ["amazon"]
  filter {
    name   = "name"
    values = ["aws-storage-gateway*"]
  }
  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }
}

# create the instance
resource "aws_instance" "storage_gateway_server" {
  ami = data.aws_ami.sg_ami.image_id
  instance_type = var.instance_type
  associate_public_ip_address = true
  key_name = aws_key_pair.storagegateway.key_name
  subnet_id = var.subnet_id
  vpc_security_group_ids = [aws_security_group.storage_gateway_sg.id]
  root_block_device {
            volume_size = 80
            volume_type = "gp2"
  }
  tags = {
    Name = join("-",[var.username, "storage-gateway"])
  }
}

resource "aws_security_group" "storage_gateway_sg" {
  name = join("-",[var.username, "storage-gateway-activation-sg"])
  vpc_id = var.main_vpc_id
  ingress {
    prefix_list_ids = [lookup(var.region_prefixlist_mapping, var.region)]
    protocol = "-1"
    from_port = 0
    to_port = 0
  }
  egress {
    protocol = "-1"
    from_port = 0
    to_port = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_ebs_volume" "storage_gateway_server_cache_disk" {
    availability_zone = data.aws_outposts_outpost.op.availability_zone
    outpost_arn = data.aws_outposts_outpost.op.arn
    size = 150
    encrypted = true
    type = "gp2"
}

resource "aws_volume_attachment" "storage_gateway_attach" {
  device_name = "/dev/sdf"
  volume_id   = aws_ebs_volume.storage_gateway_server_cache_disk.id
  instance_id = aws_instance.storage_gateway_server.id
}

resource "aws_ebs_volume" "storage_gateway_server_buffer_disk" {
    availability_zone = data.aws_outposts_outpost.op.availability_zone
    outpost_arn = data.aws_outposts_outpost.op.arn
    size = 150
    encrypted = true
    type = "gp2"
}

resource "aws_volume_attachment" "storage_gateway_attach_buffer" {
  device_name = "/dev/sdh"
  volume_id   = aws_ebs_volume.storage_gateway_server_buffer_disk.id
  instance_id = aws_instance.storage_gateway_server.id
}

resource "aws_storagegateway_gateway" "storage_gateway" {
  gateway_ip_address = aws_instance.storage_gateway_server.public_ip
  gateway_name       = var.gateway_name
  gateway_timezone   = "GMT"
  gateway_type       = var.gateway_type
}

data "aws_storagegateway_local_disk" "storage_gateway_data" {
  disk_node = aws_volume_attachment.storage_gateway_attach.device_name
  gateway_arn = aws_storagegateway_gateway.storage_gateway.arn
}

resource "aws_storagegateway_cache" "storage_gateway_cache" {
  disk_id     = data.aws_storagegateway_local_disk.storage_gateway_data.id
  gateway_arn = aws_storagegateway_gateway.storage_gateway.arn
}

data "aws_storagegateway_local_disk" "storage_gateway_buffer" {
  disk_node   = aws_volume_attachment.storage_gateway_attach_buffer.device_name
  gateway_arn = aws_storagegateway_gateway.storage_gateway.arn
}

resource "aws_storagegateway_upload_buffer" "buffer" {
  disk_id = data.aws_storagegateway_local_disk.storage_gateway_buffer.id
  gateway_arn = aws_storagegateway_gateway.storage_gateway.arn
}

resource "aws_storagegateway_cached_iscsi_volume" "example" {
  count = var.gateway_type == "CACHED" ? 1 : 0
  gateway_arn          = aws_storagegateway_cache.storage_gateway_cache.gateway_arn
  network_interface_id = aws_instance.storage_gateway_server.private_ip
  target_name          = join("-",[var.username, "target-volume"])
  volume_size_in_bytes = 5368709120 # 5 GB
}

resource "aws_iam_role" "transfer_role" { 
  name = join("-",[var.username, var.gateway_name, "role"])
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "storagegateway.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_policy" "transfer_policy_sg" {
  name = join("-",[var.username, var.gateway_name, "policy"])
  description = "Allows access to storage gateway"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
  {
    "Action": [
      "s3:GetAccelerateConfiguration",
      "s3:GetBucketLocation",
      "s3:GetBucketVersioning",
      "s3:ListBucket",
      "s3:ListBucketVersions",
      "s3:ListBucketMultipartUploads"
    ],
    "Resource": "arn:aws:s3:::${aws_s3_bucket.backup_test_bucket.bucket}",
    "Effect": "Allow"
  },
  {
    "Action": [
      "s3:AbortMultipartUpload",
      "s3:DeleteObject",
      "s3:DeleteObjectVersion",
      "s3:GetObject",
      "s3:GetObjectAcl",
      "s3:GetObjectVersion",
      "s3:ListMultipartUploadParts",
      "s3:PutObject",
      "s3:PutObjectAcl"
    ],
    "Resource": "arn:aws:s3:::${aws_s3_bucket.backup_test_bucket.bucket}/*",
    "Effect": "Allow"
  } ]
}
EOF
}

resource "aws_iam_policy_attachment" "transfer_attach" {
  name = join("-",[var.username, var.gateway_name, "attach"])
  roles = [aws_iam_role.transfer_role.name]
  policy_arn = aws_iam_policy.transfer_policy_sg.arn
}

resource "aws_s3_bucket" "backup_test_bucket" {
  acl    = "private"
  tags = {
    Name = join("-",[var.username, var.gateway_name, "bucket"])
    Environment = "Dev"
  }
}

Debug Output

Last ~500 lines of the trace debug are here: https://gist.github.com/adam-imeson/71757299e95cc2a46be180804e4ea71b

This debug output begins right before Terraform attempts to create the offending aws_storagegateway_upload_buffer resource. I would include more, but the total trace output for my entire stack is over six megabytes. If we need to log dive all of it then I can find a way to host it somewhere convenient.

Expected Behavior

A storage gateway (volume gateway, in this case) stands up on my Outpost.

Actual Behavior

This error appears toward the end of the deployment and terraform halts:

Error: Provider produced inconsistent result after apply

When applying changes to
module.volume_gateway[0].aws_storagegateway_upload_buffer.buffer, provider
"registry.terraform.io/hashicorp/aws" produced an unexpected new value: Root
resource was present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

Steps to Reproduce

  1. Get an Outpost to test against.
  2. terraform apply

Important Factoids

This storage gateway is being launched on an Outpost.

References

  • #0000
@ghost ghost added the service/storagegateway Issues and PRs that pertain to the storagegateway service. label Feb 25, 2021
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Feb 25, 2021
@ghost ghost added service/ec2 Issues and PRs that pertain to the ec2 service. service/iam Issues and PRs that pertain to the iam service. service/outposts Issues and PRs that pertain to the outposts service. service/s3 Issues and PRs that pertain to the s3 service. labels Feb 25, 2021
@bflad
Copy link
Contributor

bflad commented Mar 1, 2021

This type of Terraform error occurs when the resource is intended for creation, the API returns back a "not found" type error or the lookup value changes, and the resource returns back to Terraform that the resource does not exist without error (generally used to trigger resource recreation automatically). The handling of this situation is certainly a bug in the resource implementation, which is being more globally tracked with #16796 so that resources will at least return a more helpful error than the generic Terraform CLI error seen here.

That being said, it appears the Storage Gateway DescribeUploadBuffer API accepts using a friendly device path in the request, but returns results using the canonicalized disk identifier in its response. Normally, the aws_storagegateway_local_disk data source should be handling this conversion for you so the disk identifier is already converted by the time it reaches resource creation, but it is unclear why that did not happen in this situation. The Storage Gateway ListLocalDisks API operation specifies the DiskId value as "The unique device ID or other distinguishing data that identifies a local disk.".

Here are the relevant snippets from the debug logging:

2021-02-24T22:14:01.135-0500 [INFO]  plugin.terraform-provider-aws_v3.29.1_x5: 2021/02/24 22:14:01 [DEBUG] [aws-sdk-go] DEBUG: Response storagegateway/ListLocalDisks Details:
...
2021-02-24T22:14:01.136-0500 [INFO]  plugin.terraform-provider-aws_v3.29.1_x5: 2021/02/24 22:14:01 [DEBUG] [aws-sdk-go] {"Disks":[{"DiskAllocationType":"AVAILABLE","DiskAttributeList":[],"DiskId":"/dev/nvme2n1","DiskNode":"/dev/sdf","DiskPath":"/dev/nvme2n1","DiskSizeInBytes":161061273600,"DiskStatus":"present"},{"DiskAllocationType":"AVAILABLE","DiskAttributeList":[],"DiskId":"/dev/nvme1n1","DiskNode":"/dev/sdh","DiskPath":"/dev/nvme1n1","DiskSizeInBytes":161061273600,"DiskStatus":"present"}],"GatewayARN":"arn:aws:storagegateway:us-west-2:OMITTED:gateway/sgw-FB719892"}: timestamp=2021-02-24T22:14:01.135-0500
...
2021-02-24T22:14:01.135-0500 [INFO]  plugin.terraform-provider-aws_v3.29.1_x5: 2021/02/24 22:14:01 [DEBUG] [aws-sdk-go] DEBUG: Request storagegateway/AddUploadBuffer Details:
...
{"DiskIds":["/dev/nvme1n1"],"GatewayARN":"arn:aws:storagegateway:us-west-2:OMITTED:gateway/sgw-FB719892"}
...
2021-02-24T22:14:05.800-0500 [INFO]  plugin.terraform-provider-aws_v3.29.1_x5: 2021/02/24 22:14:05 [DEBUG] [aws-sdk-go] DEBUG: Response storagegateway/AddUploadBuffer Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 83
Content-Type: application/x-amz-json-1.1
Date: Thu, 25 Feb 2021 03:14:05 GMT
X-Amzn-Requestid: 080df764-0024-46af-96c6-6a5df625bf01


-----------------------------------------------------
2021-02-24T22:14:05.801-0500 [INFO]  plugin.terraform-provider-aws_v3.29.1_x5: 2021/02/24 22:14:05 [DEBUG] [aws-sdk-go] DEBUG: Request storagegateway/DescribeUploadBuffer Details:
...
{"GatewayARN":"arn:aws:storagegateway:us-west-2:799838960553:gateway/sgw-FB719892"}
...
2021-02-24T22:14:06.249-0500 [INFO]  plugin.terraform-provider-aws_v3.29.1_x5: 2021/02/24 22:14:06 [DEBUG] [aws-sdk-go] DEBUG: Response storagegateway/DescribeUploadBuffer Details:
...
2021-02-24T22:14:06.249-0500 [INFO]  plugin.terraform-provider-aws_v3.29.1_x5: 2021/02/24 22:14:06 [DEBUG] [aws-sdk-go] {"DiskIds":["2255f3b7-3ff1-4121-b2d9-4f548e8f848e"],"GatewayARN":"arn:aws:storagegateway:us-west-2:OMITTED:gateway/sgw-FB719892","UploadBufferAllocatedInBytes":161061273600,"UploadBufferUsedInBytes":0}

It might be worth checking with the Storage Gateway service team why this value returned as the disk path in the DiskId field of the ListLocalDisks API operation instead of the canonical disk identifier.

@adam-imeson
Copy link
Author

Thanks for the diagnosis. I've contacted them to see what they want that API to return. Will the bug in the resource configuration be fixed as part of #16796, or is it an individual effort to be completed after #16796 is done?

@bflad
Copy link
Contributor

bflad commented Mar 1, 2021

I will submit the fix for the aws_storagegateway_upload_buffer resource to return a friendlier error shortly (satisfying the specific goal of #16796), which should make this replace the error with messaging such as:

Error: error reading Storage Gateway Upload Buffer (arn:aws:storagegateway:us-west-2:123456789012:gateway/sgw-FB719892:/dev/nvme1n1): not found

The underlying issue with getting the proper disk identifier before it reaches the aws_storagegateway_upload_buffer resource (e.g. ListLocalDisks returning the identifier instead of the path) or having the DescribeUploadBuffer API response be able to match the given request (e.g. return both path and identifier) will be necessary to remove the error completely.

@bflad bflad removed the needs-triage Waiting for first response or review from a maintainer. label Mar 1, 2021
bflad added a commit that referenced this issue Mar 2, 2021
… inconsistent result after apply` with actual error message

Reference: #16796
Reference: #17809

Output from acceptance testing:

```
--- PASS: TestAccAWSStorageGatewayUploadBuffer_basic (455.25s)
```
bflad added a commit that referenced this issue Mar 5, 2021
… inconsistent result after apply` with actual error message (#17880)

Reference: #16796
Reference: #17809

Output from acceptance testing:

```
--- PASS: TestAccAWSStorageGatewayUploadBuffer_basic (455.25s)
```
@adam-imeson
Copy link
Author

Got some information on this. There is different behavior in the ListLocalDisks DiskId response depending on whether the disk has been added as a cache or upload buffer yet. This is only true for CACHED and VTL type gateways. There are some legacy reasons as to why it works this way. See below:

For CACHED/VTL
The diskId starts unallocated, and uses the canonical path.

Once disk is added as a cache or upload buffer this gets switched to its ID label example: "dec4c6bc-757c-4912-8d0e-a59f82c861d1"

For STORED

This distinguishing identifier shouldn't change regardless of its add disk state

Is this sufficient to solve the problem?

I tried to work around the issue by setting a manual dependency on the disk attachment (as below), but it didn't seem to change anything. I'm guessing that it didn't work because this state change is related to storage gateway registering the disk after it's attached rather than the disk being attached to the EC2 instance.

resource "aws_storagegateway_upload_buffer" "buffer" {
  disk_id = data.aws_storagegateway_local_disk.storage_gateway_buffer.id
  gateway_arn = aws_storagegateway_gateway.storage_gateway.arn

  depends_on = [aws_volume_attachment.storage_gateway_attach_buffer]
}

bflad added a commit that referenced this issue Mar 19, 2021
Reference: #17809

The referenced issue contains the context, but the gist of this issue is that the Storage Gateway API canonicalizes the Cached and VTL disk identifiers only after first cache or upload buffer use, which means that this resource must accept the path first, then perform its own lookup after creation.

This also fixes the `aws_storagegateway_local_disk` data source, which was missing `Computed` on the `disk_node` and `disk_path` attributes, which prevented it for being used to lookup one for the value of the other.

Previously:

```
=== CONT  TestAccAWSStorageGatewayUploadBuffer_DiskPath
    resource_aws_storagegateway_upload_buffer_test.go:107: Step 1/2 error: Error running apply: exit status 1

        Error: error reading Storage Gateway Upload Buffer (arn:aws:storagegateway:us-west-2:187416307283:gateway/sgw-D0A941B9:/dev/nvme1n1): not found

          on terraform_plugin_test.tf line 129, in resource "aws_storagegateway_upload_buffer" "test":
         129: resource "aws_storagegateway_upload_buffer" "test" {

--- FAIL: TestAccAWSStorageGatewayUploadBuffer_DiskPath (418.35s)
```

Output from acceptance testing:

```
--- PASS: TestAccAWSStorageGatewayLocalDiskDataSource_DiskNode (288.52s)
--- PASS: TestAccAWSStorageGatewayLocalDiskDataSource_DiskPath (300.60s)

--- PASS: TestAccAWSStorageGatewayUploadBuffer_basic (418.26s)
--- PASS: TestAccAWSStorageGatewayUploadBuffer_DiskPath (444.33s)
```
@bflad
Copy link
Contributor

bflad commented Mar 19, 2021

Hi @adam-imeson 👋 Thank you for the additional details. This unfortunately means that we must implement additional changes in the Terraform resource logic to support the mismatched behavior between gateway types. To that end, I have added disk_path as a supported argument in this pull request: #18313

After this change, cached gateway upload buffers can be implemented with:

data "aws_storagegateway_local_disk" "test" {
  disk_node   = aws_volume_attachment.test.device_name
  gateway_arn = aws_storagegateway_gateway.test.arn
}

resource "aws_storagegateway_upload_buffer" "test" {
  disk_path   = data.aws_storagegateway_local_disk.test.disk_path
  gateway_arn = aws_storagegateway_gateway.test.arn
}

Under the hood, this adds the UploadBuffer by DiskPath then reads the ListLocalDisks API to find the (now correct) DiskId. 👍

@bflad bflad self-assigned this Mar 19, 2021
bflad added a commit that referenced this issue Mar 26, 2021
…#18313)

* service/storagegateway: Support Cached and VTL gateway upload buffers

Reference: #17809

The referenced issue contains the context, but the gist of this issue is that the Storage Gateway API canonicalizes the Cached and VTL disk identifiers only after first cache or upload buffer use, which means that this resource must accept the path first, then perform its own lookup after creation.

This also fixes the `aws_storagegateway_local_disk` data source, which was missing `Computed` on the `disk_node` and `disk_path` attributes, which prevented it for being used to lookup one for the value of the other.

Previously:

```
=== CONT  TestAccAWSStorageGatewayUploadBuffer_DiskPath
    resource_aws_storagegateway_upload_buffer_test.go:107: Step 1/2 error: Error running apply: exit status 1

        Error: error reading Storage Gateway Upload Buffer (arn:aws:storagegateway:us-west-2:187416307283:gateway/sgw-D0A941B9:/dev/nvme1n1): not found

          on terraform_plugin_test.tf line 129, in resource "aws_storagegateway_upload_buffer" "test":
         129: resource "aws_storagegateway_upload_buffer" "test" {

--- FAIL: TestAccAWSStorageGatewayUploadBuffer_DiskPath (418.35s)
```

Output from acceptance testing:

```
--- PASS: TestAccAWSStorageGatewayLocalDiskDataSource_DiskNode (288.52s)
--- PASS: TestAccAWSStorageGatewayLocalDiskDataSource_DiskPath (300.60s)

--- PASS: TestAccAWSStorageGatewayUploadBuffer_basic (418.26s)
--- PASS: TestAccAWSStorageGatewayUploadBuffer_DiskPath (444.33s)
```

* Update CHANGELOG for #18313
@github-actions github-actions bot added this to the v3.34.0 milestone Mar 26, 2021
@ghost
Copy link

ghost commented Mar 26, 2021

This has been released in version 3.34.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost
Copy link

ghost commented Apr 25, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Apr 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
service/ec2 Issues and PRs that pertain to the ec2 service. service/iam Issues and PRs that pertain to the iam service. service/outposts Issues and PRs that pertain to the outposts service. service/s3 Issues and PRs that pertain to the s3 service. service/storagegateway Issues and PRs that pertain to the storagegateway service.
Projects
None yet
2 participants