azurerm_network_security_group_association - prevent deadlock between association and network interface creation #4501

nitzanm · 2019-10-03T17:17:24Z

I believe we have found a bug in the Azure Resource Manager provider for Terraform, having to do with the azurerm_subnet_network_security_group_association resource and its interaction with azurerm_network_interface.

The issue is that the azurerm_subnet_network_security_group_association resource, when being created, locks the following resources, in the following order:

1. Network security group
2. Virtual network
3. Subnet

However, the azurerm_network_interface resource, when being created, locks the following resources, in the following order:

1. Network security group
2. Subnet
3. Virtual network

You will notice that both lock the virtual network and subnet, but in the opposite order. This means that if both resources happen to be created at the same time in two threads, the following could happen:

1. The association successfully locks the virtual network.
2. The network interface successfully locks the subnet.
3. The association now tries to lock the subnet, but it can't, because the network interface already locked it.
4. The network interface now tries to lock the virtual network, but it can't, because the association already locked it.

In this situation, the two resources wait for each other indefinitely, until Terraform is terminated: a deadlock. We found this bug because Terraform would keep trying to create interfaces and associations forever. Of course, this only happens intermittently, because it depends on the exact interleaving above. For example (resource names redacted):

azurerm_network_interface.interface1: Still creating... [58m30s elapsed]
azurerm_network_interface.interface2: Still creating... [58m30s elapsed]
azurerm_network_interface.interface3: Still creating... [58m30s elapsed]
azurerm_subnet_network_security_group_association.association1: Still creating... [58m20s elapsed]

To fix this, we simply need to lock the three resources in the same order across both resources. As a workaround, we are currently having all network interfaces depend on our NSG association, since that will ensure both are never created at the same time.
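A minimal Go sketch of the fix described above, assuming a per-name mutex map. `keyedLocks`, `lockAll`, and `sortedUnique` are hypothetical names used for illustration, not the provider's actual locking helpers. Sorting the names before acquiring them means both code paths take the shared locks in the same order, whatever order they list them in.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// keyedLocks is a hypothetical helper holding one mutex per resource name.
type keyedLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyedLocks() *keyedLocks {
	return &keyedLocks{locks: make(map[string]*sync.Mutex)}
}

// get returns the mutex for a name, creating it on first use.
func (k *keyedLocks) get(name string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	m, ok := k.locks[name]
	if !ok {
		m = &sync.Mutex{}
		k.locks[name] = m
	}
	return m
}

// sortedUnique returns a sorted copy of names with duplicates dropped,
// so the same mutex is never locked twice by one caller.
func sortedUnique(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	n := 0
	for _, s := range out {
		if n == 0 || s != out[n-1] {
			out[n] = s
			n++
		}
	}
	return out[:n]
}

// lockAll acquires every named lock in sorted order, regardless of the
// order the caller listed them in, and returns a function that releases
// them in reverse order.
func (k *keyedLocks) lockAll(names ...string) (unlock func()) {
	var held []*sync.Mutex
	for _, n := range sortedUnique(names) {
		m := k.get(n)
		m.Lock()
		held = append(held, m)
	}
	return func() {
		for i := len(held) - 1; i >= 0; i-- {
			held[i].Unlock()
		}
	}
}

func main() {
	locks := newKeyedLocks()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		// The association lists its locks in one order...
		go func() {
			defer wg.Done()
			unlock := locks.lockAll("nsg", "vnet", "subnet")
			defer unlock()
		}()
		// ...and the network interface in another; lockAll sorts both.
		go func() {
			defer wg.Done()
			unlock := locks.lockAll("nsg", "subnet", "vnet")
			defer unlock()
		}()
	}
	wg.Wait()
	fmt.Println("all goroutines finished")
}
```

With the sort removed and each goroutine locking in its listed order, the same program can hang intermittently in the way described above.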

As an aside, is there any way to review all locks across the provider, to ensure this situation doesn't happen elsewhere? Perhaps there could be a policy to always lock resources in alphabetical order so that in the future this doesn't happen again?
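One rough way to review lock orders is to compare the acquisition sequences of two resources pairwise and flag any pair of locks taken in opposite orders. The Go sketch below does that for the two sequences listed earlier. `conflictingPairs` is a hypothetical helper, and it assumes each name appears at most once per sequence.

```go
package main

import "fmt"

// conflictingPairs reports each pair of lock names that sequence a
// acquires in one order and sequence b acquires in the opposite order.
// Names that appear in only one of the sequences are ignored.
func conflictingPairs(a, b []string) [][2]string {
	posB := make(map[string]int, len(b))
	for i, n := range b {
		posB[n] = i
	}
	var out [][2]string
	for i := 0; i < len(a); i++ {
		for j := i + 1; j < len(a); j++ {
			pi, okI := posB[a[i]]
			pj, okJ := posB[a[j]]
			// a takes a[i] before a[j]; flag it if b takes them the other way.
			if okI && okJ && pi > pj {
				out = append(out, [2]string{a[i], a[j]})
			}
		}
	}
	return out
}

func main() {
	association := []string{"network security group", "virtual network", "subnet"}
	networkInterface := []string{"network security group", "subnet", "virtual network"}
	for _, p := range conflictingPairs(association, networkInterface) {
		// prints "opposite order: virtual network / subnet"
		fmt.Printf("opposite order: %s / %s\n", p[0], p[1])
	}
}
```

A check like this only compares static lists of names; it says nothing about locks taken conditionally at runtime.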


katbyte

Fantastic spot @nitzanm!

Thank you very much for this fix, LGTM and I'll review all other lock orders in the provider now.

…ndle private_dns zone

Using the upstream azurerm provider is not possible for now, for the following reasons:

1) There is no SRV record resource for private DNS zones.
2) The version of the provider that has the private DNS zone resources, `1.34.0`, has a lot of bugs, such as:
* hashicorp/terraform-provider-azurerm#4452
* hashicorp/terraform-provider-azurerm#4453
* hashicorp/terraform-provider-azurerm#4501

Some of these bugs are fixed, and some are in flight.

Another reason is that `1.36.0`, which might have all the fixes we need, has moved to using `standalone terraform plugin SDK v1.1.1` [1]. Because we vendor both terraform and providers, this causes errors like

`panic: gob: registering duplicate types for "github.com/zclconf/go-cty/cty.primitiveType": cty.primitiveType != cty.primitiveType`

Therefore, we would have to move towards a single vendor for terraform and plugins for correct inter-operation, which is trickier due to conflicts elsewhere.

A simple 4 resource plugin that re-uses the already vendored azurerm provider as a library and carries over the required resources seems like an easy fix for now.

[1]: hashicorp/terraform-provider-azurerm#4474

ghost · 2019-10-29T10:34:21Z

This has been released in version 1.36.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.36.0"
}
# ... other configuration ...

ghost · 2019-11-04T14:50:41Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

nitzanm added 2 commits October 3, 2019 09:31

Merge pull request #1 from terraform-providers/master

9123139

Update from master TF repo

Fix a deadlock between network_security_group_association and network_interface, caused by them locking the same resources, but in a different order.

e9aa07a

ghost added the size/XS label Oct 3, 2019

Removed extra linebreak.

3fd89e9

katbyte added this to the v1.35.0 milestone Oct 3, 2019

katbyte added the service/network-security label Oct 3, 2019

abhinavdahiya mentioned this pull request Oct 3, 2019

terraform: update aws, azure, google providers to latest openshift/installer#2396

Closed

tombuildsstuff modified the milestones: v1.35.0, v1.36.0 Oct 4, 2019

katbyte approved these changes Oct 5, 2019


katbyte changed the title ~~Fix deadlock between NSG association and network interface creation~~ azurerm_network_security_group_association - prevent deadlock between association and network interface creation Oct 5, 2019

katbyte merged commit 4c93e42 into hashicorp:master Oct 5, 2019

katbyte added a commit that referenced this pull request Oct 5, 2019

Update CHANGELOG.md to include #4501

7d06127

katbyte mentioned this pull request Oct 5, 2019

provider: ensure all locks are done in the same order #4527

Merged

abhinavdahiya mentioned this pull request Oct 8, 2019

Use PrivateDNSZones instead of DNSZones type Private for clusters openshift/installer#2470

Merged

ghost locked and limited conversation to collaborators Nov 4, 2019
