AWS Neptune Cluster from snapshot #23601

Closed
kastlbo opened this issue Mar 9, 2022 · 18 comments · Fixed by #28051
Labels
bug Addresses a defect in current functionality. service/neptune Issues and PRs that pertain to the neptune service.

Comments

@kastlbo

kastlbo commented Mar 9, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Version 1.1.6
Provider 3.74.4 (we use this provider version because we are having issues with S3 buckets on the newer provider)

Affected Resource(s)

  • aws_neptune_cluster

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

I can't include the configuration, as I work for a company that would consider sharing it a security violation.
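Since the real code can't be shared, here is a minimal, illustrative stand-in with the same shape (all names, the region, the instance class, and the snapshot ARN are placeholders, and the module wrapper is omitted); the error text suggests an engine_version is set alongside snapshot_identifier:

resource "aws_neptune_cluster" "neptune_cluster" {
  cluster_identifier  = "example-graph-database"   # placeholder name
  engine              = "neptune"
  engine_version      = "1.0.5.1"                  # example version, intended to match the snapshot
  snapshot_identifier = "arn:aws:rds:us-east-1:123456789012:cluster-snapshot:example-snapshot" # placeholder ARN
  skip_final_snapshot = true
}

resource "aws_neptune_cluster_instance" "primary" {
  cluster_identifier = aws_neptune_cluster.neptune_cluster.id
  instance_class     = "db.t3.medium"              # placeholder instance class
  engine             = "neptune"
}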

Debug Output

Panic Output

Expected Behavior

A Neptune cluster should be created from a snapshot when the snapshot_identifier argument is set.

Actual Behavior

The apply errors out and won't create the cluster:

Cannot modify engine version without a healthy primary instance in DB cluster: sf-conncen-test-c3-graph-database
│ 	status code: 400, request id: 577cca5d-39af-4748-88eb-35980f6c0fa5
│ 
│   with module.c3_graph_database_neptune_cluster.aws_neptune_cluster.neptune_cluster[0],
│   on .terraform/modules/c3_graph_database_neptune_cluster/modules/neptune/cluster/main.tf line 3, in resource "aws_neptune_cluster" "neptune_cluster":
│    3: resource "aws_neptune_cluster" "neptune_cluster" {

Steps to Reproduce

We use a custom module built on the resource listed above. I am having trouble creating a Neptune cluster from a snapshot. Each time I run with the snapshot identifier set, it fails:

Cannot modify engine version without a healthy primary instance in DB cluster: sf-conncen-test-c3-graph-database
│ 	status code: 400, request id: 577cca5d-39af-4748-88eb-35980f6c0fa5
│ 
│   with module.c3_graph_database_neptune_cluster.aws_neptune_cluster.neptune_cluster[0],
│   on .terraform/modules/c3_graph_database_neptune_cluster/modules/neptune/cluster/main.tf line 3, in resource "aws_neptune_cluster" "neptune_cluster":
│    3: resource "aws_neptune_cluster" "neptune_cluster" {

The documentation for this resource says that you can create the cluster from a snapshot by setting the snapshot_identifier argument, but it doesn't work as advertised. I have tried applying the Terraform without the identifier and the cluster is created without any problems. I have read the issue board and I don't see anyone else with this problem. I did see some discussion about encryption being the problem, but it didn't really apply to Neptune.
To reproduce, create a snapshot and try to get Terraform to create a cluster from it (an illustrative snapshot resource is sketched after the steps below).

  1. terraform apply
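For the snapshot itself, a sketch along these lines is enough (identifiers are placeholders; the snapshot can equally be taken in the console):

resource "aws_neptune_cluster_snapshot" "example" {
  db_cluster_identifier          = aws_neptune_cluster.source.id   # hypothetical existing cluster
  db_cluster_snapshot_identifier = "example-snapshot"
}

Passing that snapshot's identifier (or ARN) as snapshot_identifier on a new aws_neptune_cluster is what triggers the failure.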

Important Factoids

All Terraform is run in a CI/CD pipeline using Terraform Enterprise (TFE).

References

  • #0000
@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/neptune Issues and PRs that pertain to the neptune service. labels Mar 9, 2022
@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Mar 15, 2022
@bschaatsbergen
Member

Going for a short holiday, back on Monday, but I'll happily look into this.

@georgedivya

I'm also having issues creating a cluster from a snapshot. When recreating the cluster from a snapshot, it fails on cluster creation with the following error:

DBClusterRoleAlreadyExists: Role ARN <role arn> is already associated with DB Cluster: Verify your role ARN and try again.

The same code works without specifying a snapshot identifier.
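For context, the error suggests the role is being attached via the cluster's iam_roles argument; a simplified, hypothetical shape (placeholder names and ARN, not the real configuration) would be:

resource "aws_neptune_cluster" "restored" {
  cluster_identifier  = "restored-cluster"                                       # placeholder
  engine              = "neptune"
  snapshot_identifier = "example-snapshot"                                       # placeholder
  iam_roles           = ["arn:aws:iam::123456789012:role/example-neptune-role"]  # placeholder role ARN
  skip_final_snapshot = true
}

Presumably the restore from the snapshot already carries that role association, and the subsequent attempt to attach the same role again is what raises DBClusterRoleAlreadyExists.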

@jaw111

jaw111 commented Jun 14, 2022

@kastlbo, given the error you reported, it would be worthwhile to check the Neptune engine version used for the snapshot and try using the same engine version in the new cluster.
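For example, something along these lines (placeholder identifiers; 1.0.5.1 is only an example) pins the restored cluster to the snapshot's engine version, which can be read in the console or via aws neptune describe-db-cluster-snapshots:

resource "aws_neptune_cluster" "restored" {
  cluster_identifier  = "restored-cluster"   # placeholder
  engine              = "neptune"
  engine_version      = "1.0.5.1"            # should match the snapshot's engine version
  snapshot_identifier = "example-snapshot"   # placeholder
  skip_final_snapshot = true
}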

@wolli-lenzen

Same problem here: when I try to create a Neptune cluster from an existing snapshot, it runs into the error

"InvalidDBClusterStateFault: Cannot modify engine version without a healthy primary instance in DB cluster:"

To be clear, the engine version of the snapshot and the engine version of the newly created cluster are the same.

@wolli-lenzen

@bschaatsbergen, @justinretzolk do you see any chance of getting this fixed soon? It has been identified as a bug since March and is blocking us from writing and testing disaster recovery code for the DB.

@justinretzolk
Member

Hey @wolli-lenzen 👋 Thank you for checking in on this. Unfortunately, I'm not able to provide an estimate on when this will be looked into due to the potential of shifting priorities (we prioritize work by the count of 👍 reactions, as well as a few other things). For more information on how we prioritize, check out our prioritization guide.

@nikunjundhad

nikunjundhad commented Aug 18, 2022

Terraform version: 1.2.3
AWS provider version: 4.26.0
This is still reproducible and is a deadlock for our disaster recovery: with this behaviour we can't recover our DB cluster if something goes wrong, and in that case we can't rely on Terraform for our DB management.
FYI, a similar issue is observed for RDS (aws_db_cluster_snapshot) as well; there, too, cluster creation from a snapshot does not come up healthy.

aws_neptune_cluster.neptune-db-n: Still creating... [16m41s elapsed]
aws_neptune_cluster.neptune-db-n: Still creating... [16m51s elapsed]
╷
│ Error: Failed to modify Neptune Cluster (ae-sbx-neptune-cluster-new): InvalidDBClusterStateFault: Cannot modify engine version without a healthy primary instance in DB cluster: ae-sbx-neptune-cluster-new
│ 	status code: 400, request id: 89261a6a-3ee7-4406-807e-24e0e02b4523
│
│   with aws_neptune_cluster.neptune-db-n,
│   on neptune-cluster.tf line 37, in resource "aws_neptune_cluster" "neptune-db-n":
│   37: resource "aws_neptune_cluster" "neptune-db-n" {
│
╵

---- update ----
When I removed engine_version from the resource, it successfully created the new cluster with the latest available version; however, our old DB cluster is running an older version and the snapshot also shows that older version. When I added engine_version back, it started failing with the same error, so the issue is definitely triggered when we specify a version.
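For reference, a simplified sketch of the variant that does create successfully (not our exact module) simply leaves engine_version out:

resource "aws_neptune_cluster" "neptune-db-n" {
  cluster_identifier  = "ae-sbx-neptune-cluster-new"
  engine              = "neptune"
  # engine_version omitted: specifying it together with snapshot_identifier triggers the error above
  snapshot_identifier = aws_neptune_cluster_snapshot.snapshot-18Aug.id
  skip_final_snapshot = true
  apply_immediately   = true
}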

After the above error, the cluster shows in the AWS console with status Available.
Also very weird is the state after the above terraform apply; below is the state for the resource named neptune-db-n, where many expected field values are null, for example arn:

{
      "mode": "managed",
      "type": "aws_neptune_cluster",
      "name": "neptune-db-n",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "status": "tainted",
          "schema_version": 0,
          "attributes": {
            "allow_major_version_upgrade": null,
            "apply_immediately": true,
            "arn": null,
            "availability_zones": [
              "us-east-1a",
              "us-east-1b",
              "us-east-1c"
            ],
            "backup_retention_period": 5,
            "cluster_identifier": "ae-sbx-neptune-cluster-new",
            "cluster_identifier_prefix": null,
            "cluster_members": [],
            "cluster_resource_id": null,
            "copy_tags_to_snapshot": false,
            "deletion_protection": null,
            "enable_cloudwatch_logs_exports": null,
            "endpoint": null,
            "engine": "neptune",
            "engine_version": "1.0.5.1",
            "final_snapshot_identifier": null,
            "hosted_zone_id": null,
            "iam_database_authentication_enabled": false,
            "iam_roles": null,
            "id": "ae-sbx-neptune-cluster-new",
            "kms_key_arn": null,
            "neptune_cluster_parameter_group_name": "default.neptune1",
            "neptune_subnet_group_name": null,
            "port": 8182,
            "preferred_backup_window": "07:00-09:00",
            "preferred_maintenance_window": null,
            "reader_endpoint": null,
            "replication_source_identifier": null,
            "skip_final_snapshot": true,
            "snapshot_identifier": "arn:aws:rds:us-east-1:503330882943:cluster-snapshot:ae-sbx-neptune-db-snap-18aug",
            "storage_encrypted": false,
            "tags": null,
            "tags_all": null,
            "timeouts": null,
            "vpc_security_group_ids": []
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjo3MjAwMDAwMDAwMDAwLCJkZWxldGUiOjcyMDAwMDAwMDAwMDAsInVwZGF0ZSI6NzIwMDAwMDAwMDAwMH19",
          "dependencies": [
            "aws_neptune_cluster_snapshot.snapshot-18Aug"
          ]
        }
      ]
    }

Also, when I run a plan after the last apply without any change, it always wants to replace the cluster, and we are in an endless loop of cluster re-creation. See the plan below.

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_neptune_cluster.neptune-db-n is tainted, so must be replaced
-/+ resource "aws_neptune_cluster" "neptune-db-n" {
      + allow_major_version_upgrade          = (known after apply)
      ~ arn                                  = "arn:aws:rds:us-east-1:503330882943:cluster:ae-sbx-neptune-cluster-new" -> (known after apply)
      + cluster_identifier_prefix            = (known after apply)
      ~ cluster_members                      = [] -> (known after apply)
      ~ cluster_resource_id                  = "cluster-MGAITJE2YI5J7P4KRPX6LG7YAY" -> (known after apply)
      - deletion_protection                  = false -> null
      - enable_cloudwatch_logs_exports       = [] -> null
      ~ endpoint                             = "ae-sbx-neptune-cluster-new.cluster-czhruub712uf.us-east-1.neptune.amazonaws.com" -> (known after apply)
      ~ hosted_zone_id                       = "ZUFXD4SLT2LS7" -> (known after apply)
      - iam_roles                            = [] -> null
      ~ id                                   = "ae-sbx-neptune-cluster-new" -> (known after apply)
      + kms_key_arn                          = (known after apply)
      ~ neptune_subnet_group_name            = "default" -> (known after apply)
      ~ preferred_maintenance_window         = "mon:05:39-mon:06:09" -> (known after apply)
      ~ reader_endpoint                      = "ae-sbx-neptune-cluster-new.cluster-ro-czhruub712uf.us-east-1.neptune.amazonaws.com" -> (known after apply)
      - tags                                 = {} -> null
      ~ tags_all                             = {} -> (known after apply)
      ~ vpc_security_group_ids               = [
          - "sg-03f5bd30",
        ] -> (known after apply)
        # (14 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.
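As a stopgap, since the cluster itself reports Available, running terraform untaint aws_neptune_cluster.neptune-db-n should clear the taint and stop the destroy/create loop, although that does not address the underlying engine_version problem.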

@bschaatsbergen @justinretzolk can you please look at this with priority? It is a showstopper for us and for many others. If there is a way to raise the priority, let us know and we would be happy to do so. Thanks in advance for your guidance.
If the issue is already identified and there is any workaround until it is properly fixed, please let us know so we can unblock ourselves and move ahead. Thanks.
@danielcweber has already raised a pull request; when can we expect it to be merged into main? #25982

@nikunjundhad

Any update on this issue, guys?

@roshanjoseph23

Trying to restore a snapshot using Terraform and always ending up with a DBClusterRoleAlreadyExists error.
I tried with the same engine version used in the snapshot, and even not specifying the engine version isn't helping.

@tgourley01

tgourley01 commented Nov 1, 2022

Wish this would get some attention. This is a non-starter for production environments.
Are there any known workarounds? Older provider versions, maybe?

I see @danielcweber has raised pull request #25982; can it be merged?

@slatsinoglou

We are experiencing the same issue. Is there any update on this?

@pluksha

pluksha commented Nov 23, 2022

The same issue. Are there any updates?

@vgarkusha

Same issue with provisioning Neptune from a snapshot, like @georgedivya has. Any updates on it?

@danielcweber
Contributor

Make sure you upvote the proposed PR #25982 if it works for you.

@LennyCastaneda

Having the same issue here... what is the status of the fix?

@Tom-Carpenter

This, coupled with #15563, makes the user experience for Terraform pretty poor.

@joeynaor

Any updates on this? This issue completely breaks our DR pipeline.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2023