Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keyvault: populating the Key Vaults cache using the KeyVault Resource Provider #24019

Conversation

tombuildsstuff
Copy link
Contributor

@tombuildsstuff tombuildsstuff commented Nov 24, 2023

There's been a plethora of attempts to fix this issue over the years, so this one requires a little background to under the choices involved here.

Background

When provisioning Nested Items within a Key Vault, that is:

  • Certificates (using the unversioned URI https://myvault.vault.azure.net/certificates/mycert1 - or the versioned URI https://myvault.vault.azure.net/certificates/mycert1/abc123).
  • Keys (using the unversioned URI https://myvault.vault.azure.net/keys/mykey1 - or the versioned URI https://myvault.vault.azure.net/keys/mykey1/abc123).
  • Secrets (using the unversioned URI https://myvault.vault.azure.net/secrets/mysecret1 - or the versioned URI https://myvault.vault.azure.net/secrets/mysecret1/abc123).
  • Storage Accounts (using the URI https://myvault.vault.azure.net/storage/account1).

These have to be provisioned using the Data Plane endpoint (that is, the API hosted at https://{account}.vault.azure.net) rather than the Resource Manager API due to a disparity in the available functionality.

This means that Terraform needs to be able to access the Key Vault Data Plane API in order to be able to both provision items in it, but also to refresh it's state.

It's worth noting that it's possible for a Key Vault to be available but not accessible - for example when a key Vault has Public Network Access disabled, or where the IP Address where Terraform is being run doesn't have access to the Key Vault.

In addition it's possible for the Key Vault to be available but (temporarily) inaccessible - which are typically DNS issues (particularly around Private Link where the DNS replication can take some time to update).

As such Terraform needs a reliable manner of determining when a Key Vault is available and accessible - and when it's available but inaccessible - and when it's unavailable (i.e. deleted) in order to be able to surface the correct behaviour.

Ultimately this allows us to support:

  1. When a Key Vault has been deleted outside of Terraform, the next Plan will detect that both the Key Vault and it's Nested Items require recreation/should become untracked.
  2. When a Key Vault is available but inaccessible - we should raise an error stating we can't connect to the Key Vault, rather than marking the Nested Items as untracked - since these items aren't deleted, removing from the state and attempting to recreate them will lead to an error during recreation, due to the requires import check.

Problem

When refreshing the Terraform State we have access to only the Resource ID - which for a Nested Item is in the format:

https://vault1.vaults.azure.com/secrets/example1/abc123

Which can get parsed into:

  • Vault Name: vault1
  • Type: secrets
  • Name: example1
  • Version (optional): abc123

However in order to be able to access the Key Vault via the Resource Manager API we need to know both the Subscription ID and Resource Group name where the Key Vault exists.

There's three ways of doing this:

  1. Using the Resources API within the Microsoft.Resources Resource Provider (which we're doing today).
  2. Using the Resources API within the Microsoft.ResourceGraph Resource Provider.
  • Whilst suggested this removes the caching related issues, unfortunately we've encountered outdated information through this endpoint - so we're not convinced this would improve matters.
  1. Using the ListBySubscriptions API within the Key Vault Resource Provider.

Options 1 and 3 have shipped in the provider, however each has been investigated - ultimately these endpoints are used to populate a cache of the Key Vault Name (the key) to MetaData about the Key Vault (e.g. KeyVaultId, DataPlaneBaseUri, ResourceGroup).

This means that once the Key Vault has been obtained once per Provider Instantiation (which happens once at Plan and once at Apply time) - it's cached within the Provider.

However both implementations have a caching issue where when a Key Vault is not present within the cache - it's obtained from the Azure API one at a time. Whilst this does have a minor performance benefit when provisioning a single Key Vault within a Subscription - it means that users provisioning against multiple Key Vaults can hit up against the Timeout for a given Resource (context deadline exceeded).

Solution

This PR changes how the cache is populated, such that when looking up a Key Vault we now:

  1. Determine the cache Cache Key for this Key Vault
  2. Check if this Key Vault is present in the Cache, returning it if so.
  3. If not, we retrieve ALL of the Key Vaults within the specified Subscription and populate ALL of them into the Cache.
  4. Re-check the cache for the Key Vault by it's Cache Key - returning it if so.
  5. If not we return nil so that Resources and Data Sources can surface as needed.

It's worth worth noting that the Cache is populated/evicted as Key Vaults as Provisioned/Deleted - as such whilst we are hitting the ListBySubscription endpoint which (from memory) has a lower rate-limit - since we're fully populating the cache in this implementation means this should only happen one-time.


FWIW this is almost the same approach that we use for Storage Accounts too, now that the caching issue has been resolved.

Notes

osfidelity pushed a commit to fidelity-contributions/hashicorp-terraform-provider-azurerm-archive that referenced this pull request Nov 27, 2023
…her than the Resources API

Similar to hashicorp#24023 and hashicorp#24019 this PR refactors the Private DNS Zone data source to use the ListBySubscription API
rather than listing on the Resources API.

This should both improve reliability, since we're no longer hitting the Resources API (which can serve stale data)
but also removes another usage of `Azure/azure-sdk-for-go`
Copy link
Member

@jackofallops jackofallops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tombuildsstuff - Just a couple goimports to fix-up, but otherwise LGTM 👍

…against rather than the Resources Client

This is for two-reasons:

1. The `Azure/azure-sdk-for-go` clients are scoped to a single Subscription, meaning we (currently) have a limitation of the
   Key Vault being within the same subscription.
2. This allows us to change this as needed in the future to query against other Subscriptions - handling cross-Subscription
   items within a Key Vault - and provides a path to supporting this for Subscriptions in other Tenants too (since we should
   be able to query the list of current Tenants to find the Subscription where a given Key Vault exists. That obviously requires
   additional thought/investigation - but it sets us up nicely for the future.
```
 $ TF_ACC=1 go test -v ./internal/services/keyvault -run=TestAccKeyVaultCertificateIssuer_ -timeout=60m
=== RUN   TestAccKeyVaultCertificateIssuer_basic
=== PAUSE TestAccKeyVaultCertificateIssuer_basic
=== RUN   TestAccKeyVaultCertificateIssuer_requiresImport
=== PAUSE TestAccKeyVaultCertificateIssuer_requiresImport
=== RUN   TestAccKeyVaultCertificateIssuer_complete
=== PAUSE TestAccKeyVaultCertificateIssuer_complete
=== RUN   TestAccKeyVaultCertificateIssuer_update
=== PAUSE TestAccKeyVaultCertificateIssuer_update
=== RUN   TestAccKeyVaultCertificateIssuer_disappears
=== PAUSE TestAccKeyVaultCertificateIssuer_disappears
=== CONT  TestAccKeyVaultCertificateIssuer_basic
=== CONT  TestAccKeyVaultCertificateIssuer_update
=== CONT  TestAccKeyVaultCertificateIssuer_complete
=== CONT  TestAccKeyVaultCertificateIssuer_requiresImport
=== CONT  TestAccKeyVaultCertificateIssuer_disappears
--- PASS: TestAccKeyVaultCertificateIssuer_disappears (204.28s)

--- PASS: TestAccKeyVaultCertificateIssuer_complete (472.18s)
--- PASS: TestAccKeyVaultCertificateIssuer_basic (483.74s)
--- PASS: TestAccKeyVaultCertificateIssuer_requiresImport (491.51s)
--- PASS: TestAccKeyVaultCertificateIssuer_update (527.73s)
PASS
ok  	github.com/hashicorp/terraform-provider-azurerm/internal/services/keyvault	528.544s
```
@tombuildsstuff tombuildsstuff force-pushed the f/keyvault-populating-cache-from-the-keyvault-listbysubscription-endpoint branch from 354a712 to c51d6b7 Compare November 28, 2023 16:38
@tombuildsstuff tombuildsstuff added this to the v3.83.0 milestone Nov 28, 2023
@tombuildsstuff tombuildsstuff merged commit 4e7c2c4 into main Nov 28, 2023
21 checks passed
@tombuildsstuff tombuildsstuff deleted the f/keyvault-populating-cache-from-the-keyvault-listbysubscription-endpoint branch November 28, 2023 18:04
tombuildsstuff added a commit that referenced this pull request Nov 28, 2023
@asifkd012020
Copy link

Does this feature solve #9738 ?

@phoehnel
Copy link

@AndreasMWalter guess you'll like this one ;-)

dduportal added a commit to jenkins-infra/azure that referenced this pull request Dec 12, 2023
<Actions>
<action
id="f410411e63aff4bb73a81c2aec1d373cf8a903e63b30dee2006b0030d8a94cc8">
        <h3>Bump Terraform `azurerm` provider version</h3>
<details
id="1d9343c012f5434ac9fe8a98135bae3667b399259be16d9b14302ea3bd424a24">
            <summary>Update Terraform lock file</summary>
<p>&#34;hashicorp/azurerm&#34; updated from &#34;3.82.0&#34; to
&#34;3.83.0&#34; in file &#34;.terraform.lock.hcl&#34;</p>
            <details>
                <summary>3.83.0</summary>
<pre>Changelog retrieved
from:&#xA;&#x9;https://github.com/hashicorp/terraform-provider-azurerm/releases/tag/v3.83.0&#xA;UPGRADE
NOTES&#xA;&#xA;* Key Vaults are now loaded using [the
`ListBySubscription` API within the Key Vault Resource
Provider](https://learn.microsoft.com/en-us/rest/api/keyvault/keyvault/vaults/list-by-subscription?view=rest-keyvault-keyvault-2022-07-01&amp;tabs=HTTP)
rather than [the Resources
API](https://learn.microsoft.com/en-us/rest/api/keyvault/keyvault/vaults/list?view=rest-keyvault-keyvault-2022-07-01&amp;tabs=HTTP).
This change means that the Provider now caches the list of Key Vaults
available within a Subscription, rather than loading these piecemeal to
workaround stale data returned from the Resources API
([#24019](https://github.com/hashicorp/terraform-provider-azurerm/issues/24019))&#xA;&#xA;FEATURES:&#xA;&#xA;*
New Data Source: `azurerm_stack_hci_cluster`
([#24032](https://github.com/hashicorp/terraform-provider-azurerm/issues/24032))&#xA;&#xA;ENHANCEMENTS:&#xA;&#xA;*
dependencies: updating to `v0.20231129.1103252` of
`github.com/hashicorp/go-azure-sdk`
([#24063](hashicorp/terraform-provider-azurerm#24063
`automation`: updating to API Version `2023-11-01`
([#24017](hashicorp/terraform-provider-azurerm#24017
`keyvault`: the cache is now populated using the `ListBySubscription`
endpoint on the KeyVault Resource Provider rather than via the
`Resources` API
([#24019](hashicorp/terraform-provider-azurerm#24019
`keyvault`: updating the cache to populate all Key Vaults available
within the Subscription to reduce the number of API calls
([#24019](hashicorp/terraform-provider-azurerm#24019
Data Source `azurerm_private_dns_zone`: refactoring to use the
`ListBySubscription` API rather than the Resources API when
`resource_group_name` is omitted
([#24024](hashicorp/terraform-provider-azurerm#24024
`azurerm_dashboard_grafana` - support for `grafana_major_version`
([#24014](hashicorp/terraform-provider-azurerm#24014
`azurerm_linux_web_app` - add support for dotnet 8
([#23893](hashicorp/terraform-provider-azurerm#23893
`azurerm_linux_web_app_slot` - add support for dotnet 8
([#23893](hashicorp/terraform-provider-azurerm#23893
`azurerm_media_transform` - deprecate `face_detector_preset` and
`video_analyzer_preset`
([#24002](hashicorp/terraform-provider-azurerm#24002
`azurerm_postgresql_database` - update the validation of `collation` to
include `Norwegian_Norway.1252`
([#24070](hashicorp/terraform-provider-azurerm#24070
`azurerm_postgresql_flexible_server` - updating to API Version
`2023-06-01-preview`
([#24016](hashicorp/terraform-provider-azurerm#24016
`azurerm_redis_cache` - support for the
`active_directory_authentication_enabled` property
([#23976](hashicorp/terraform-provider-azurerm#23976
`azurerm_windows_web_app` - add support for dotnet 8
([#23893](hashicorp/terraform-provider-azurerm#23893
`azurerm_windows_web_app_slot` - add support for dotnet 8
([#23893](hashicorp/terraform-provider-azurerm#23893
`azurerm_storage_account` - add `name` validation in custom diff
([#23799](https://github.com/hashicorp/terraform-provider-azurerm/issues/23799))&#xA;&#xA;BUG
FIXES:&#xA;&#xA;* authentication: fix a bug where auxiliary tenants were
not correctly authorized
([#24063](hashicorp/terraform-provider-azurerm#24063
`azurerm_app_configuration` - normalize location in `replica` block
([#24074](hashicorp/terraform-provider-azurerm#24074
`azurerm_cosmosdb_account` - cosmosdb version and capabilities can now
be updated at the same time
([#24029](hashicorp/terraform-provider-azurerm#24029
`azurerm_data_factory_flowlet_data_flow` - `source` and `sink`
properties are now optional
([#23987](hashicorp/terraform-provider-azurerm#23987
`azurerm_datadog_monitor_tag_rule` - correctly handle default rule
([#22806](hashicorp/terraform-provider-azurerm#22806
`azurerm_ip_group`: fixing a crash when `firewall_ids` and
`firewall_policy_ids` weren&#39;t parsed correctly from the API Response
([#24031](hashicorp/terraform-provider-azurerm#24031
`azurerm_nginx_deployment` - add default value of `20` for `capacity`
([#24033](https://github.com/hashicorp/terraform-provider-azurerm/issues/24033))&#xA;&#xA;&#xA;</pre>
            </details>
            <details>
                <summary>3.84.0</summary>
<pre>Changelog retrieved
from:&#xA;&#x9;https://github.com/hashicorp/terraform-provider-azurerm/releases/tag/v3.84.0&#xA;FEATURES:&#xA;&#xA;*
**New Data Source:** `azurerm_storage_containers`
([#24061](hashicorp/terraform-provider-azurerm#24061
**New Resource:** `azurerm_elastic_san`
([#23619](hashicorp/terraform-provider-azurerm#23619
**New Resource:**
`azurerm_key_vault_managed_hardware_security_module_role_assignment`
([#22332](hashicorp/terraform-provider-azurerm#22332
**New Resource:**
`azurerm_key_vault_managed_hardware_security_module_role_definition`
([#22332](https://github.com/hashicorp/terraform-provider-azurerm/issues/22332))&#xA;&#xA;ENHANCEMENTS:&#xA;&#xA;*
dependencies: updating mssql elasticpools from `v5.0` to
`2023-05-01-preview`&#xA;* dependencies: updating to
`v0.20231207.1122031` of `github.com/hashicorp/go-azure-sdk`
([#24149](hashicorp/terraform-provider-azurerm#24149
Data Source: `azurerm_storage_account` - export the primary and
secondary internet and microsoft hostnames for blobs, dfs, files,
queues, tables and web
([#23517](hashicorp/terraform-provider-azurerm#23517
Data Source: `azurerm_cosmosdb_account` - export the
`connection_strings`, `primary_sql_connection_string`,
`secondary_sql_connection_string`,
`primary_readonly_sql_connection_string`,
`secondary_readonly_sql_connection_string`,
`primary_mongodb_connection_string`,
`secondary_mongodb_connection_string`,
`primary_readonly_mongodb_connection_string`, and
`secondary_readonly_mongodb_connection_string` attributes
([#24129](hashicorp/terraform-provider-azurerm#24129
`azurerm_bot_service_azure_bot` - support for the
`public_network_access_enabled` property
([#24125](hashicorp/terraform-provider-azurerm#24125
`azurerm_container_app_environment` - support for the `workload_profile`
property
([#23478](hashicorp/terraform-provider-azurerm#23478
`azurerm_cosmosdb_cassandra_datacenter` - support for the
`seed_node_ip_addresses` property
([#24076](hashicorp/terraform-provider-azurerm#24076
`azurerm_firewall` - support for the `dns_proxy_enabled` property
([#20519](hashicorp/terraform-provider-azurerm#20519
`azurerm_kubernetes_cluster` - support for the `support_plan` property
and the `sku_tier` `Premium`
([#23970](hashicorp/terraform-provider-azurerm#23970
`azurerm_mssql_database` - support for `enclave_type` field
([#24054](hashicorp/terraform-provider-azurerm#24054
`azurerm_mssql_elasticpool` - support for `enclave_type` field
([#24054](hashicorp/terraform-provider-azurerm#24054
`azurerm_mssql_managed_instance` - support for more `vcores`: `6`, `10`,
`12`, `20`, `48`, `56`, `96`, `128`
([#24085](hashicorp/terraform-provider-azurerm#24085
`azurerm_redis_linked_server` - support for the property
`geo_replicated_primary_host_name`
([#23984](hashicorp/terraform-provider-azurerm#23984
`azurerm_storage_account` - expose the primary and secondary internet
and microsoft hostnames for blobs, dfs, files, queues, tables and web
([#23517](hashicorp/terraform-provider-azurerm#23517
`azurerm_synapse_role_assignment` - support for the `principal_type`
property
([#24089](hashicorp/terraform-provider-azurerm#24089
`azurerm_spring_cloud_build_deployment` - support for the
`application_performance_monitoring_ids` property
([#23969](hashicorp/terraform-provider-azurerm#23969
`azurerm_virtual_network_gateway` - support for the
`bgp_route_translation_for_nat_enabled`, `dns_forwarding_enabled`,
`ip_sec_replay_protection_enabled`, `remote_vnet_traffic_enabled`,
`virtual_wan_traffic_enabled`, `radius_server`,
`virtual_network_gateway_client_connection`, `policy_group`, and
`ipsec_policy` property
([#23220](https://github.com/hashicorp/terraform-provider-azurerm/issues/23220))&#xA;&#xA;BUG
FIXES:&#xA;&#xA;* `azurerm_application_insights_api_key` - prevent a bug
where multiple keys couldn&#39;t be created for an Application Insights
instance
([#23463](hashicorp/terraform-provider-azurerm#23463
`azurerm_container_registry` - the `network_rule_set.virtual_network`
property has been deprecated
([#24140](hashicorp/terraform-provider-azurerm#24140
`azurerm_hdinsight_hadoop_cluster` - set
`roles.edge_node.install_script_action.parameters` into state by
retrieving the value provided in the user config since this property
isn&#39;t returned by the API
([#23971](hashicorp/terraform-provider-azurerm#23971
`azurerm_kubernetes_cluster` - prevent a bug where maintenance window
start date was always recalculated and sent to the API
([#23985](hashicorp/terraform-provider-azurerm#23985
`azurerm_mssql_database` - will no longer send all long retention values
in payload unless set
([#24124](hashicorp/terraform-provider-azurerm#24124
`azurerm_mssql_managed_database` - will no longer send all long
retention values in payload unless set
([#24124](hashicorp/terraform-provider-azurerm#24124
`azurerm_mssql_server_microsoft_support_auditing_policy` - only include
storage endpoint in payload if set
([#24122](hashicorp/terraform-provider-azurerm#24122
`azurerm_mobile_network_packet_core_control_plane` - prevent a panic if
the HTTP Response is nil
([#24083](hashicorp/terraform-provider-azurerm#24083
`azurerm_storage_account` - revert plan time name validation `(#23799)`
([#24142](hashicorp/terraform-provider-azurerm#24142
`azurerm_web_application_firewall_policy` - split create and update
function to fix lifecycle - ignore changes
([#23412](https://github.com/hashicorp/terraform-provider-azurerm/issues/23412))&#xA;&#xA;&#xA;</pre>
            </details>
        </details>
<a
href="https://infra.ci.jenkins.io/job/terraform-jobs/job/azure/job/main/931/">Jenkins
pipeline link</a>
    </action>
</Actions>

---

<table>
  <tr>
    <td width="77">
<img src="https://www.updatecli.io/images/updatecli.png" alt="Updatecli
logo" width="50" height="50">
    </td>
    <td>
      <p>
Created automatically by <a
href="https://www.updatecli.io/">Updatecli</a>
      </p>
      <details><summary>Options:</summary>
        <br />
<p>Most of Updatecli configuration is done via <a
href="https://www.updatecli.io/docs/prologue/quick-start/">its
manifest(s)</a>.</p>
        <ul>
<li>If you close this pull request, Updatecli will automatically reopen
it, the next time it runs.</li>
<li>If you close this pull request and delete the base branch, Updatecli
will automatically recreate it, erasing all previous commits made.</li>
        </ul>
        <p>
Feel free to report any issues at <a
href="https://github.com/updatecli/updatecli/issues">github.com/updatecli/updatecli</a>.<br
/>
If you find this tool useful, do not hesitate to star <a
href="https://github.com/updatecli/updatecli/stargazers">our GitHub
repository</a> as a sign of appreciation, and/or to tell us directly on
our <a
href="https://matrix.to/#/#Updatecli_community:gitter.im">chat</a>!
        </p>
      </details>
    </td>
  </tr>
</table>

---------

Co-authored-by: Jenkins Infra Bot (updatecli) <60776566+jenkins-infra-bot@users.noreply.github.com>
Co-authored-by: Damien Duportal <damien.duportal@gmail.com>
Copy link

github-actions bot commented May 5, 2024

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.