Management Locks need a better lifecycle support #23768

Open
1 task done
tim-chaffin opened this issue Nov 2, 2023 · 1 comment

tim-chaffin commented Nov 2, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Description

As the GitHub issues noted below show, the community has often found azurerm_management_lock a challenge to use.

Here's the context:
We want to apply management locks through Terraform, so that users in Azure Portal do not accidentally delete a resource, causing a disruption. Or in some cases, we don't want people to even change a resource because of security implications, like a network security group, a firewall, or other related security settings.

Problem statement:
When we apply a lock to a resource group, Terraform doesn't always have a reliable dependency-graph relationship between that lock and the resources it protects. For example, assume I have a network security group within a resource group called "rg-network-security", and I have applied a resource group lock to protect those resources.

But when our SecOps team opens a new PR against the Terraform code, dropping an NSG rule or something similar, Terraform errors out on apply because the management lock on that scope is not included in the plan.
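To make the scenario concrete, here is a minimal sketch of such a configuration. All resource names, the region, and the rule itself are hypothetical; only the resource types and arguments come from the azurerm provider.

```hcl
# Hypothetical names and values throughout; this only illustrates the shape of the problem.
resource "azurerm_resource_group" "network_security" {
  name     = "rg-network-security"
  location = "westeurope" # assumed region
}

resource "azurerm_network_security_group" "example" {
  name                = "nsg-example"
  location            = azurerm_resource_group.network_security.location
  resource_group_name = azurerm_resource_group.network_security.name
}

resource "azurerm_network_security_rule" "allow_ssh" {
  name                        = "allow-ssh"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.network_security.name
  network_security_group_name = azurerm_network_security_group.example.name
}

# The lock references only the resource group. The graph has an edge from the
# lock to the resource group, but nothing links the lock to the NSG or its rule.
resource "azurerm_management_lock" "network_security" {
  name       = "rg-network-security-lock"
  scope      = azurerm_resource_group.network_security.id
  lock_level = "ReadOnly"
  notes      = "Protects network security resources from manual changes."
}
```

If the `allow_ssh` rule is later removed from the code, the plan contains only the rule deletion. The lock is untouched and still in place when the apply runs.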

Workarounds:
Now, you can create a crazy mess of "depends_on" statements to try and force this behavior, but it's not guaranteed, and it's a lot of work.
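A minimal sketch of what that workaround tends to look like, with hypothetical resource names and an assumed region. As far as I understand it, `depends_on` only orders creation and destruction, so the lock is created after the NSG and destroyed before it. It does not lift the lock for an in-place change to the NSG.

```hcl
# Hypothetical names; a sketch of the depends_on workaround, not a recommended pattern.
resource "azurerm_resource_group" "network_security" {
  name     = "rg-network-security"
  location = "westeurope" # assumed region
}

resource "azurerm_network_security_group" "example" {
  name                = "nsg-example"
  location            = azurerm_resource_group.network_security.location
  resource_group_name = azurerm_resource_group.network_security.name
}

resource "azurerm_management_lock" "network_security" {
  name       = "rg-network-security-lock"
  scope      = azurerm_resource_group.network_security.id
  lock_level = "ReadOnly"

  # Every protected resource has to be listed by hand and kept up to date
  # as resources are added to or removed from the resource group.
  depends_on = [
    azurerm_network_security_group.example,
  ]
}
```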

Proposed solution / feature:
Much like Azure Blueprints or Azure Stacks, allow certain managed identities or service principals to circumvent the resource lock. This way, you don't have to worry about the dependency graph, and you can allow certain elevated identities to do the work where it's needed.

After reviewing the code, I considered suggesting that azurerm_client_config.current.object_id be assumed as an ID that can override the lock. However, it also occurred to me that very sensitive resources, like SSH keys or Key Vaults, should not be deleted or modified by TF either, unless manual intervention occurs.

New or Affected Resource(s)/Data Source(s)

azurerm_management_lock

Potential Terraform Configuration

# Suggestion for a subscription lock:
resource "azurerm_management_lock" "subscription-level" {
  name             = "subscription-level"
  scope            = data.azurerm_subscription.current.id
  lock_level       = "ReadOnly"
  lock_override_id = [
    azuread_service_principal.example.object_id,
    data.azurerm_client_config.current.object_id
  ]
  notes            = "This is a production subscription, and cannot be modified manually. Please use the Terraform workflows in GitHub."
}

# Suggestion for a resource group lock:
resource "azurerm_management_lock" "resource-group-level" {
  name             = "resource-group-level"
  scope            = azurerm_resource_group.network_security.id
  lock_level       = "ReadOnly"
  lock_override_id = [
    data.azurerm_client_config.current.object_id,
    data.azuread_group.security_group.object_id
  ]
  notes            = "This Resource Group is Read-Only. Only changes through Terraform, or through the Cybersecurity team may be made."
}

# Suggestion for an individual resource:
resource "azurerm_management_lock" "cosmosdb" {
  name             = "cosmosdb"
  scope            = azurerm_cosmosdb_account.example.id
  lock_level       = "CanNotDelete"
  lock_override_id = [
    data.azurerm_client_config.current.object_id
  ]
  notes            = "CosmosDB is locked because it's needed by a third-party. Only the Terraform workflows may destroy this account."
}
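
The suggestions above reference data sources that are not declared in the snippets. A minimal sketch of those declarations follows; the group display name is a hypothetical placeholder. The `azuread_service_principal.example` resource is left out because its definition depends on the application it belongs to.

```hcl
# Subscription and identity of the credentials Terraform is running with.
data "azurerm_subscription" "current" {}

data "azurerm_client_config" "current" {}

# Hypothetical group name; replace with the real security group.
data "azuread_group" "security_group" {
  display_name = "Cybersecurity"
}
```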

References

Related GH issues:

cveld commented Nov 11, 2023

It would be cool if the Terraform CLI introduced new semantics for this, so that the azurerm provider could do the following:

  1. Check if a lock is part of the state
  2. If so, plan to temporarily remove it
  3. Plan the main operations
  4. Plan the lock recovery
    Provide this plan to the user for review.
    Maybe introduce policies that control when a temporary lock removal may be suggested, e.g. to rule out Key Vault operations.
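
Until something like that exists, the steps above can be roughly approximated by hand with a toggle variable. This is only a sketch under assumptions: the variable name `enable_locks` and the resource names are hypothetical, and it takes several manual applies rather than one reviewed plan.

```hcl
# Hypothetical toggle; not part of the azurerm provider.
variable "enable_locks" {
  type        = bool
  default     = true
  description = "Set to false to temporarily remove management locks."
}

resource "azurerm_resource_group" "network_security" {
  name     = "rg-network-security"
  location = "westeurope" # assumed region
}

# With enable_locks = false, count becomes 0 and the lock is planned for removal.
resource "azurerm_management_lock" "network_security" {
  count      = var.enable_locks ? 1 : 0
  name       = "rg-network-security-lock"
  scope      = azurerm_resource_group.network_security.id
  lock_level = "ReadOnly"
}
```

Running `terraform apply -var="enable_locks=false"` removes the lock, and a later apply with the default value restores it. This sketch does not guarantee the order of the lock removal relative to other changes in the same apply, so it is probably safest to remove the lock in its own apply before the main changes.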

The same would be cool for firewall rules, allowing temporary access to the data plane, but this would involve refactoring the refresh stage as well.
