Management Locks need a better lifecycle support #23768

tim-chaffin · 2023-11-02T17:42:42Z

Is there an existing issue for this?

I have searched the existing issues

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Description

As a community, we have often found azurerm_management_lock challenging to use, as the GitHub issues noted below show.

Here's the context:
We want to apply management locks through Terraform so that users in the Azure Portal do not accidentally delete a resource and cause a disruption. In some cases, we don't want people to change a resource at all because of the security implications, for example a network security group, a firewall, or other security-related settings.

Problem statement:
When we apply a lock to a resource group, Terraform doesn't always have a reliable dependency-graph relationship between that lock and the resources it protects. For example, assume I have a network security group within a resource group called "rg-network-security", and I have applied a resource group lock to protect those resources.

But when our SecOps team opens a new PR against the Terraform code, dropping an NSG rule or something similar, Terraform errors out on apply because the management lock is not included in the plan and therefore stays in place.
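
To make the scenario concrete, here is a minimal sketch of such a configuration. All resource names, the location, and the rule values are hypothetical; only the resource types and arguments come from the azurerm provider. The lock refers only to the resource group, so nothing in the graph ties it to the NSG or its rule.

```hcl
# Hypothetical example of the scenario described above.
resource "azurerm_resource_group" "network_security" {
  name     = "rg-network-security"
  location = "westeurope" # assumed region
}

resource "azurerm_network_security_group" "example" {
  name                = "nsg-example"
  location            = azurerm_resource_group.network_security.location
  resource_group_name = azurerm_resource_group.network_security.name
}

# The rule the SecOps team later drops in a PR.
resource "azurerm_network_security_rule" "allow_ssh" {
  name                        = "allow-ssh"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.network_security.name
  network_security_group_name = azurerm_network_security_group.example.name
}

# The lock only references the resource group ID, so Terraform sees no
# dependency between the lock and the NSG or the rule inside the group.
resource "azurerm_management_lock" "resource_group" {
  name       = "resource-group-level"
  scope      = azurerm_resource_group.network_security.id
  lock_level = "ReadOnly"
  notes      = "Protects the network security resources in this group."
}
```

Removing the `allow_ssh` block in a later change produces a plan that touches only the rule. The lock is unchanged, so it is still in force when the delete is attempted.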

Workarounds:
You can create a crazy mess of "depends_on" statements to try to force this behavior, but it's not guaranteed to work, and it's a lot of effort.
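
As an illustration of that workaround, a sketch along these lines is possible. The referenced resources (`azurerm_resource_group.network_security`, `azurerm_network_security_group.example`, `azurerm_network_security_rule.allow_ssh`) are hypothetical and assumed to be defined elsewhere in the configuration.

```hcl
# Hypothetical sketch of the depends_on workaround.
resource "azurerm_management_lock" "resource_group" {
  name       = "resource-group-level"
  scope      = azurerm_resource_group.network_security.id
  lock_level = "ReadOnly"
  notes      = "Protects the network security resources in this group."

  # Every protected resource has to be listed by hand and kept in sync
  # as resources are added to or removed from the group.
  depends_on = [
    azurerm_network_security_group.example,
    azurerm_network_security_rule.allow_ssh,
  ]
}
```

This orders operations so that the lock is created after the listed resources and destroyed before them. As far as I can tell, it does not help when one of those resources is updated or removed while the lock itself is unchanged, which is the case described in the problem statement.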

Proposed solution / feature:
Much like Azure Blueprints or Azure Stacks, allow certain managed identities or service principals to bypass the resource lock. This way, you don't have to worry about the dependency graph, and certain elevated identities can do the work where it's needed.

After reviewing the code, I considered suggesting that data.azurerm_client_config.current.object_id be treated by default as an ID that can override the lock. However, it also occurred to me that very sensitive resources, such as SSH keys or Key Vaults, should not be deleted or modified by Terraform either, unless manual intervention occurs.

New or Affected Resource(s)/Data Source(s)

azurerm_management_lock

Potential Terraform Configuration

# Note: lock_override_id is the proposed new argument; it does not exist in the provider today.
# Suggestion for a subscription lock:
resource "azurerm_management_lock" "subscription-level" {
  name             = "subscription-level"
  scope            = data.azurerm_subscription.current.id
  lock_level       = "ReadOnly"
  lock_override_id = [
    azuread_service_principal.example.object_id,
    data.azurerm_client_config.current.object_id
  ]
  notes            = "This is a production subscription, and cannot be modified manually. Please use the Terraform workflows in GitHub."
}

# Suggestion for a resource group lock:
resource "azurerm_management_lock" "resource-group-level" {
  name             = "resource-group-level"
  scope            = azurerm_resource_group.network_security.id
  lock_level       = "ReadOnly"
  lock_override_id = [
    data.azurerm_client_config.current.object_id,
    data.azuread_group.security_group.object_id
  ]
  notes            = "This Resource Group is Read-Only. Only changes through Terraform, or through the Cybersecurity team may be made."
}

# Suggestion for an individual resource:
resource "azurerm_management_lock" "cosmosdb" {
  name             = "cosmosdb"
  scope            = azurerm_cosmosdb_account.example.id
  lock_level       = "CanNotDelete"
  lock_override_id = [
    data.azurerm_client_config.current.object_id
  ]
  notes            = "CosmosDB is locked because it's needed by a third-party. Only the Terraform workflows may destroy this account."
}

References

Related GH issues:

cveld · 2023-11-11T07:54:37Z

It would be cool if the Terraform CLI introduced new semantics for this, so that the azurerm provider could do the following:

1. Check if a lock is part of the state.
2. If so, plan to temporarily remove it.
3. Plan the main operations.
4. Plan the lock recovery.
5. Provide this plan to the user for review.
Maybe introduce policies that control when a temporary lock removal may be suggested in the plan, e.g. to rule out Key Vault operations.

The same would be cool for firewall rules, allowing temporary access to the data plane. But this would involve refactoring the refresh stage as well.

rcskosir added thinking service/management-groups labels Nov 2, 2023
