AWS Neptune Cluster from snapshot #23601

Closed
kastlbo opened this issue Mar 9, 2022 · 18 comments · Fixed by #28051
Labels
bug Addresses a defect in current functionality. service/neptune Issues and PRs that pertain to the neptune service.

Comments

@kastlbo

kastlbo commented Mar 9, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Version 1.1.6
Provider 3.74.4 (we use this provider version because we are having issues with S3 buckets on the newer provider)

Affected Resource(s)

  • aws_neptune_cluster

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

I can't include the configuration, as I work for a company that would consider sharing it a security violation.
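Since the real code can't be shared, here is a minimal, illustrative stand-in with the same shape (all names, the region, the instance class, and the snapshot ARN are placeholders, and the module wrapper is omitted); the error text suggests an engine_version is set alongside snapshot_identifier:

resource "aws_neptune_cluster" "neptune_cluster" {
  cluster_identifier  = "example-graph-database"   # placeholder name
  engine              = "neptune"
  engine_version      = "1.0.5.1"                  # example version, intended to match the snapshot
  snapshot_identifier = "arn:aws:rds:us-east-1:123456789012:cluster-snapshot:example-snapshot" # placeholder ARN
  skip_final_snapshot = true
}

resource "aws_neptune_cluster_instance" "primary" {
  cluster_identifier = aws_neptune_cluster.neptune_cluster.id
  instance_class     = "db.t3.medium"              # placeholder instance class
  engine             = "neptune"
}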

Debug Output

Panic Output

Expected Behavior

A Neptune cluster should be created from a snapshot when the snapshot_identifier argument is set.

Actual Behavior

The apply errors out and won't create the cluster:

Cannot modify engine version without a healthy primary instance in DB cluster: sf-conncen-test-c3-graph-database
│ 	status code: 400, request id: 577cca5d-39af-4748-88eb-35980f6c0fa5
│ 
│   with module.c3_graph_database_neptune_cluster.aws_neptune_cluster.neptune_cluster[0],
│   on .terraform/modules/c3_graph_database_neptune_cluster/modules/neptune/cluster/main.tf line 3, in resource "aws_neptune_cluster" "neptune_cluster":
│    3: resource "aws_neptune_cluster" "neptune_cluster" {

Steps to Reproduce

We use a custom module built on the resource listed above. I am having trouble creating a Neptune cluster from a snapshot. Each time I run with the snapshot identifier set, it fails:

Cannot modify engine version without a healthy primary instance in DB cluster: sf-conncen-test-c3-graph-database
│ 	status code: 400, request id: 577cca5d-39af-4748-88eb-35980f6c0fa5
│ 
│   with module.c3_graph_database_neptune_cluster.aws_neptune_cluster.neptune_cluster[0],
│   on .terraform/modules/c3_graph_database_neptune_cluster/modules/neptune/cluster/main.tf line 3, in resource "aws_neptune_cluster" "neptune_cluster":
│    3: resource "aws_neptune_cluster" "neptune_cluster" {

The documentation for this resource says that you can create the cluster from a snapshot by setting the snapshot_identifier argument, but it doesn't work as advertised. I have tried applying the Terraform without the identifier and the cluster is created without any problems. I have read the issue board and I don't see anyone else with this problem. I did see some discussion about encryption being the problem, but it didn't really apply to Neptune.
To reproduce, create a snapshot and try to get Terraform to create a cluster from it (an illustrative snapshot resource is sketched after the steps below).

  1. terraform apply
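For the snapshot itself, a sketch along these lines is enough (identifiers are placeholders; the snapshot can equally be taken in the console):

resource "aws_neptune_cluster_snapshot" "example" {
  db_cluster_identifier          = aws_neptune_cluster.source.id   # hypothetical existing cluster
  db_cluster_snapshot_identifier = "example-snapshot"
}

Passing that snapshot's identifier (or ARN) as snapshot_identifier on a new aws_neptune_cluster is what triggers the failure.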

Important Factoids

All Terraform is run in a CI/CD pipeline using Terraform Enterprise (TFE).

References

  • #0000
@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/neptune Issues and PRs that pertain to the neptune service. labels Mar 9, 2022
@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Mar 15, 2022
@bschaatsbergen
Member

Going for a short holiday, back on Monday, but I'll happily look into this.

@georgedivya

I'm also having issues creating a cluster from a snapshot. When recreating the cluster from a snapshot, it fails on cluster creation with the following error:

DBClusterRoleAlreadyExists: Role ARN <role arn> is already associated with DB Cluster: Verify your role ARN and try again.

The same code works without specifying a snapshot identifier.
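For context, the error suggests the role is being attached via the cluster's iam_roles argument; a simplified, hypothetical shape (placeholder names and ARN, not the real configuration) would be:

resource "aws_neptune_cluster" "restored" {
  cluster_identifier  = "restored-cluster"                                       # placeholder
  engine              = "neptune"
  snapshot_identifier = "example-snapshot"                                       # placeholder
  iam_roles           = ["arn:aws:iam::123456789012:role/example-neptune-role"]  # placeholder role ARN
  skip_final_snapshot = true
}

Presumably the restore from the snapshot already carries that role association, and the subsequent attempt to attach the same role again is what raises DBClusterRoleAlreadyExists.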

@jaw111

jaw111 commented Jun 14, 2022

@kastlbo, given the error you reported, it would be worthwhile to check the Neptune engine version used for the snapshot and try using the same engine version in the new cluster.
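For example, something along these lines (placeholder identifiers; 1.0.5.1 is only an example) pins the restored cluster to the snapshot's engine version, which can be read in the console or via aws neptune describe-db-cluster-snapshots:

resource "aws_neptune_cluster" "restored" {
  cluster_identifier  = "restored-cluster"   # placeholder
  engine              = "neptune"
  engine_version      = "1.0.5.1"            # should match the snapshot's engine version
  snapshot_identifier = "example-snapshot"   # placeholder
  skip_final_snapshot = true
}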

@wolli-lenzen

Same problem here: when I try to create a Neptune cluster from an existing snapshot, it runs into the error

"InvalidDBClusterStateFault: Cannot modify engine version without a healthy primary instance in DB cluster:"

To be clear, the engine version of the snapshot and the engine version of the newly created cluster are the same.

@wolli-lenzen

@bschaatsbergen, @justinretzolk do you see any chance of getting this fixed soon? It has been identified as a bug since March and is blocking us from writing and testing disaster recovery code for the DB.

@justinretzolk
Member

Hey @wolli-lenzen 👋 Thank you for checking in on this. Unfortunately, I'm not able to provide an estimate on when this will be looked into due to the potential of shifting priorities (we prioritize work by the count of 👍 reactions, as well as a few other things). For more information on how we prioritize, check out our prioritization guide.

@nikunjundhad

nikunjundhad commented Aug 18, 2022

Terraform version: 1.2.3
AWS provider version: 4.26.0
This is still reproducible and is a deadlock for our disaster recovery: with this behaviour we can't recover our DB cluster if something goes wrong, and in that case we can't rely on Terraform for our DB management.
FYI, a similar issue is observed for RDS (aws_db_cluster_snapshot) as well; there, too, cluster creation from a snapshot does not come up healthy.

aws_neptune_cluster.neptune-db-n: Still creating... [16m41s elapsed]
aws_neptune_cluster.neptune-db-n: Still creating... [16m51s elapsed]
╷
│ Error: Failed to modify Neptune Cluster (ae-sbx-neptune-cluster-new): InvalidDBClusterStateFault: Cannot modify engine version without a healthy primary instance in DB cluster: ae-sbx-neptune-cluster-new
│ 	status code: 400, request id: 89261a6a-3ee7-4406-807e-24e0e02b4523
│
│   with aws_neptune_cluster.neptune-db-n,
│   on neptune-cluster.tf line 37, in resource "aws_neptune_cluster" "neptune-db-n":
│   37: resource "aws_neptune_cluster" "neptune-db-n" {
│
╵

---- update ----
When I removed engine_version from the resource, it successfully created the new cluster with the latest available version; however, our old DB cluster is running an older version and the snapshot also shows that older version. When I added engine_version back, it started failing with the same error, so the issue is definitely triggered when we specify a version.
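For reference, a simplified sketch of the variant that does create successfully (not our exact module) simply leaves engine_version out:

resource "aws_neptune_cluster" "neptune-db-n" {
  cluster_identifier  = "ae-sbx-neptune-cluster-new"
  engine              = "neptune"
  # engine_version omitted: specifying it together with snapshot_identifier triggers the error above
  snapshot_identifier = aws_neptune_cluster_snapshot.snapshot-18Aug.id
  skip_final_snapshot = true
  apply_immediately   = true
}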

After the above error, the cluster shows in the AWS console with status Available.
Also very weird is the state after the above terraform apply; below is the state for the resource named neptune-db-n, where many expected field values are null, for example arn:

{
      "mode": "managed",
      "type": "aws_neptune_cluster",
      "name": "neptune-db-n",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "status": "tainted",
          "schema_version": 0,
          "attributes": {
            "allow_major_version_upgrade": null,
            "apply_immediately": true,
            "arn": null,
            "availability_zones": [
              "us-east-1a",
              "us-east-1b",
              "us-east-1c"
            ],
            "backup_retention_period": 5,
            "cluster_identifier": "ae-sbx-neptune-cluster-new",
            "cluster_identifier_prefix": null,
            "cluster_members": [],
            "cluster_resource_id": null,
            "copy_tags_to_snapshot": false,
            "deletion_protection": null,
            "enable_cloudwatch_logs_exports": null,
            "endpoint": null,
            "engine": "neptune",
            "engine_version": "1.0.5.1",
            "final_snapshot_identifier": null,
            "hosted_zone_id": null,
            "iam_database_authentication_enabled": false,
            "iam_roles": null,
            "id": "ae-sbx-neptune-cluster-new",
            "kms_key_arn": null,
            "neptune_cluster_parameter_group_name": "default.neptune1",
            "neptune_subnet_group_name": null,
            "port": 8182,
            "preferred_backup_window": "07:00-09:00",
            "preferred_maintenance_window": null,
            "reader_endpoint": null,
            "replication_source_identifier": null,
            "skip_final_snapshot": true,
            "snapshot_identifier": "arn:aws:rds:us-east-1:503330882943:cluster-snapshot:ae-sbx-neptune-db-snap-18aug",
            "storage_encrypted": false,
            "tags": null,
            "tags_all": null,
            "timeouts": null,
            "vpc_security_group_ids": []
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjo3MjAwMDAwMDAwMDAwLCJkZWxldGUiOjcyMDAwMDAwMDAwMDAsInVwZGF0ZSI6NzIwMDAwMDAwMDAwMH19",
          "dependencies": [
            "aws_neptune_cluster_snapshot.snapshot-18Aug"
          ]
        }
      ]
    }

Also, when I run a plan after the last apply without any change, it always wants to replace the cluster, and we are in an endless loop of cluster re-creation. See the plan below.

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_neptune_cluster.neptune-db-n is tainted, so must be replaced
-/+ resource "aws_neptune_cluster" "neptune-db-n" {
      + allow_major_version_upgrade          = (known after apply)
      ~ arn                                  = "arn:aws:rds:us-east-1:503330882943:cluster:ae-sbx-neptune-cluster-new" -> (known after apply)
      + cluster_identifier_prefix            = (known after apply)
      ~ cluster_members                      = [] -> (known after apply)
      ~ cluster_resource_id                  = "cluster-MGAITJE2YI5J7P4KRPX6LG7YAY" -> (known after apply)
      - deletion_protection                  = false -> null
      - enable_cloudwatch_logs_exports       = [] -> null
      ~ endpoint                             = "ae-sbx-neptune-cluster-new.cluster-czhruub712uf.us-east-1.neptune.amazonaws.com" -> (known after apply)
      ~ hosted_zone_id                       = "ZUFXD4SLT2LS7" -> (known after apply)
      - iam_roles                            = [] -> null
      ~ id                                   = "ae-sbx-neptune-cluster-new" -> (known after apply)
      + kms_key_arn                          = (known after apply)
      ~ neptune_subnet_group_name            = "default" -> (known after apply)
      ~ preferred_maintenance_window         = "mon:05:39-mon:06:09" -> (known after apply)
      ~ reader_endpoint                      = "ae-sbx-neptune-cluster-new.cluster-ro-czhruub712uf.us-east-1.neptune.amazonaws.com" -> (known after apply)
      - tags                                 = {} -> null
      ~ tags_all                             = {} -> (known after apply)
      ~ vpc_security_group_ids               = [
          - "sg-03f5bd30",
        ] -> (known after apply)
        # (14 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.
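As a stopgap, since the cluster itself reports Available, running terraform untaint aws_neptune_cluster.neptune-db-n should clear the taint and stop the destroy/create loop, although that does not address the underlying engine_version problem.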

@bschaatsbergen @justinretzolk can you please look at this with priority? It is a showstopper for us and for many others. If there is a way to raise the priority, let us know and we would be happy to do so. Thanks in advance for your guidance.
If the issue is already identified and there is any workaround until it is properly fixed, please let us know so we can unblock ourselves and move ahead. Thanks.
@danielcweber has already raised a pull request; when can we expect it to be merged into main? #25982

@nikunjundhad

Any update on this issue, guys?

@roshanjoseph23

Trying to restore a snapshot using Terraform and always ending up with a DBClusterRoleAlreadyExists error.
I tried with the same engine version used in the snapshot, and even not specifying the engine version isn't helping.

@tgourley01

tgourley01 commented Nov 1, 2022

Wish this would get some attention. This is a non-starter for production environments.
Are there any known workarounds? Older provider versions, maybe?

I see @danielcweber has raised pull request #25982; can it be merged?

@slatsinoglou

We are experiencing the same issue. Is there any update on this?

@pluksha

pluksha commented Nov 23, 2022

The same issue. Are there any updates?

@vgarkusha

Same issue with provisioning Neptune from a snapshot, like @georgedivya has. Any updates on it?

@danielcweber
Contributor

Make sure you upvote the proposed PR #25982 if it works for you.

@LennyCastaneda

Having the same issue here... what is the status of the fix?

@Tom-Carpenter

This, coupled with #15563, makes the user experience for Terraform pretty poor.

@joeynaor

Any updates on this? This issue completely breaks our DR pipeline.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2023