
Every terraform import command triggers reading all data sources – leads to module refactoring taking an inordinately long time #32385

Closed
andersthorbeck opened this issue Dec 13, 2022 · 25 comments
Labels: enhancement, import (Importing resources)

Comments

@andersthorbeck

Terraform Version

Terraform v1.3.6
on darwin_amd64
+ provider registry.terraform.io/hashicorp/azuread v2.31.0
+ provider registry.terraform.io/hashicorp/azurerm v3.34.0
+ provider registry.terraform.io/integrations/github v5.11.0

Use Cases

As Terraform root modules/states grow in size, they get to a point where they become too unwieldy and need to be split up into several smaller modules. In order to split up such modules, you need to run the terraform import command on hundreds of resources in the new module, and terraform state rm on the corresponding resources in the old module. Moreover, when working as part of a larger team/organization, you want this migration to be completed as quickly as possible, to minimize the chances that others are running interfering terraform apply commands against these two modules at the same time, which could lead to deletion of the resources you were trying to move.

Some weeks or months ago, a change seems to have been introduced whereby the terraform import command automatically refreshes the entire state before it actually performs the requested import. This did not use to be the case. When splitting very large Terraform states, this is very detrimental, as it triggers reads against remote state backends and providers for hundreds of resources, for every single resource to be imported. In other words, importing N resources has become an O(N^2) operation. We recently attempted to split out a part of a large module into a smaller cohesive module, but aborted the attempt after the import script had run for 6 hours without an end in sight.

From my recollection, this automatic refresh as part of terraform import has been observed with the github, azurerm, and azuread providers, so the behaviour seems to come from Terraform Core. I have not found any mention of this change in the Terraform Core CHANGELOG, so I cannot pinpoint exactly when and how the change was introduced, but I believe it happened at some point between 2022-05-03 and 2022-11-25 (probably closer to the latter end).

Attempted Solutions

I haven't really found any good workarounds for this, short of accepting that splitting up and migrating parts of Terraform states will take ages, and that you're susceptible to ruining your resources in the meantime. There doesn't seem to be any option to opt out of the automatic refresh on every single call to terraform import.

Proposal

I envision two possible solutions to this (sketched after the list below):

  1. Introduce a flag -no-refresh (name up for debate) to the terraform import command to disable the automatic refresh before attempting the import, and leave it up to the caller to have manually performed a refresh before the call to terraform import.
  2. Allow the terraform import command to take a list of resources to be imported, not just one at a time, and perform the automatic refresh only once per call to the command, not before each supplied resource. The resources could be supplied via accepting multiple (ADDR ID) argument pairs, or (perhaps more sensibly) via an input file where each line contains one ADDR ID argument pair.
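
A rough command-line sketch of the two proposals. Both the -no-refresh flag and the batch-file form are hypothetical: neither exists in Terraform today, and the names are placeholders only.

# Proposal 1: refresh once manually, then skip the automatic refresh on each import.
terraform refresh
terraform import -no-refresh ADDR ID

# Proposal 2: import many resources in one invocation, refreshing only once.
# imports.txt would contain one "ADDR ID" pair per line.
terraform import -batch-file=imports.txt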

References

I have not found any related GitHub issues or pull requests.

@andersthorbeck added the enhancement and new (new issue not yet triaged) labels Dec 13, 2022
@kmoe added the import (Importing resources) label and removed the new (new issue not yet triaged) label Dec 13, 2022
@kmoe (Member) commented Dec 13, 2022

Thanks for the issue. In the v1.3 release we made some internal changes to import that make it share more code with plan and fix some bugs. This was not expected to have any negative user-facing consequences, since import is usually a one-off operation. Now that import is constructing the plan graph, I think adding a -refresh=false option, like the corresponding option for plan, is perfectly reasonable. Thanks!
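
For reference, the corresponding flag that already exists on terraform plan today, which the proposed import option would mirror:

terraform plan -refresh=false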

The second of your proposals is covered by #22219.

@kmoe kmoe self-assigned this Dec 13, 2022
@kmoe (Member) commented Dec 14, 2022

Having now actually checked the code, it seems import is not running a full refresh:
https://github.com/hashicorp/terraform/blob/main/internal/terraform/graph_builder_plan.go#L302

@andersthorbeck why do you believe a full refresh is being run? Can you share trace logs?

It's possible that another v1.3-related change is the cause of the behaviour you are observing. Could it be that reading all of the data sources takes a long time?
Possibly related: #27934
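
For anyone wanting to capture such logs, a minimal invocation using Terraform's standard TF_LOG and TF_LOG_PATH environment variables (ADDR and ID are placeholders for a real resource address and import ID):

TF_LOG=trace TF_LOG_PATH=import-trace.log terraform import ADDR ID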

@gregmoy commented Dec 14, 2022

I'm seeing this in the scripts I use to bulk-update task definitions on my ECS clusters. I do a terraform state rm, then a terraform import, and it seems to be pulling the current state of everything on each invocation of import. This started happening after upgrading from ~1.2 to 1.3.3.

@gregmoy commented Dec 14, 2022

@kmoe it's a lot of Reading... for every asset, then Still reading..., then finally Read complete after 16s, and it gets worse the more resources you have.

@andersthorbeck (Author) commented Dec 15, 2022

> Having now actually checked the code, it seems import is not running a full refresh: https://github.com/hashicorp/terraform/blob/main/internal/terraform/graph_builder_plan.go#L302
>
> @andersthorbeck why do you believe a full refresh is being run? Can you share trace logs?
>
> It's possible that another v1.3-related change is the cause of the behaviour you are observing. Could it be that reading all of the data sources takes a long time?
> Possibly related: #27934

@kmoe Upon closer inspection, you're right: terraform import seems to only trigger reading of data blocks, not resources. I interpreted this as a refresh, but maybe that was inaccurate. Nevertheless, the problem remains: in Terraform modules with a great number of data blocks, and especially transitive data blocks from repeatedly used child modules, the sum of all this reading takes a very long time.

I created a dummy example with the following Terraform configuration (with the Azure subscription ID pseudonymized):

terraform {
  required_version = "1.3.6"

  backend "azurerm" {
    subscription_id      = "12345678-1234-1234-1234-1234567890ab"
    resource_group_name  = "thorbecka-clickops"
    storage_account_name = "thorbeckasandbox"
    container_name       = "tfstate"
    key                  = "import-refresh.tfstate"
  }

  required_providers {
    random = {
      source = "hashicorp/random"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.35.0"
    }
  }
}

provider "azurerm" {
  subscription_id = "12345678-1234-1234-1234-1234567890ab"
  features {}
}

data "azurerm_resource_group" "sandbox" {
  name = "thorbecka-clickops"
}

data "azurerm_storage_account" "sandbox" {
  name                = "thorbeckasandbox"
  resource_group_name = data.azurerm_resource_group.sandbox.name
}

resource "azurerm_storage_container" "dummy_data" {
  name                  = "dummydata"
  storage_account_name  = data.azurerm_storage_account.sandbox.name
  container_access_type = "private"
}

resource "random_integer" "foo" {
  min = 1
  max = 1000
}

resource "random_integer" "bar" {
  min = 1
  max = 1000
}

resource "random_integer" "baz" {
  min = 1
  max = 1000
}

resource "azurerm_storage_blob" "foo" {
  name                   = "foo.txt"
  storage_account_name   = data.azurerm_storage_account.sandbox.name
  storage_container_name = azurerm_storage_container.dummy_data.name
  type                   = "Block"
  source_content         = random_integer.foo.result
}

resource "azurerm_storage_blob" "bar" {
  name                   = "bar.txt"
  storage_account_name   = data.azurerm_storage_account.sandbox.name
  storage_container_name = azurerm_storage_container.dummy_data.name
  type                   = "Block"
  source_content         = random_integer.bar.result
}

resource "azurerm_storage_blob" "baz" {
  name                   = "baz.txt"
  storage_account_name   = data.azurerm_storage_account.sandbox.name
  storage_container_name = azurerm_storage_container.dummy_data.name
  type                   = "Block"
  source_content         = random_integer.baz.result
}

With random_integer.baz and azurerm_storage_blob.baz not present in the remote Terraform state (but a file manually uploaded to the storage container at https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt), I ran the following terraform import commands:

~/c/s/t/import-refresh> terraform import random_integer.baz 123,1,1000
random_integer.baz: Importing from ID "123,1,1000"...
random_integer.baz: Import prepared!
  Prepared random_integer for import
random_integer.baz: Refreshing state... [id=123]
data.azurerm_resource_group.sandbox: Reading...
data.azurerm_resource_group.sandbox: Read complete after 1s [id=/subscriptions/9fc00512-5796-4772-b2a3-ded958e8064c/resourceGroups/thorbecka-clickops]
data.azurerm_storage_account.sandbox: Reading...
data.azurerm_storage_account.sandbox: Read complete after 1s [id=/subscriptions/9fc00512-5796-4772-b2a3-ded958e8064c/resourceGroups/thorbecka-clickops/providers/Microsoft.Storage/storageAccounts/thorbeckasandbox]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

~/c/s/t/import-refresh> echo "123" > baz.txt
~/c/s/t/import-refresh> # Upload baz.txt to the dummydata storage container at https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt
~/c/s/t/import-refresh> terraform import azurerm_storage_blob.baz https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt
data.azurerm_resource_group.sandbox: Reading...
data.azurerm_resource_group.sandbox: Read complete after 2s [id=/subscriptions/9fc00512-5796-4772-b2a3-ded958e8064c/resourceGroups/thorbecka-clickops]
data.azurerm_storage_account.sandbox: Reading...
data.azurerm_storage_account.sandbox: Read complete after 0s [id=/subscriptions/9fc00512-5796-4772-b2a3-ded958e8064c/resourceGroups/thorbecka-clickops/providers/Microsoft.Storage/storageAccounts/thorbeckasandbox]
azurerm_storage_blob.baz: Importing from ID "https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt"...
azurerm_storage_blob.baz: Import prepared!
  Prepared azurerm_storage_blob for import
azurerm_storage_blob.baz: Refreshing state... [id=https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

As can be seen from the above logs, the data blocks are read on every terraform import. This is fine for small states, but the cost is greatly exacerbated in large states with loads of data blocks.

As a (redacted) real-world example, see the following log, but imagine 10× as many modules with nested data blocks, multiplied by hundreds of resources to be imported.

data.azurerm_key_vault.kvredacted: Reading...
data.azurerm_key_vault.kvredacted: Read complete after 1s [id=/subscriptions/***/resourceGroups/rgredacted/providers/Microsoft.KeyVault/vaults/kvredacted]
data.azurerm_key_vault_secret.bringcloudplatform_github_token: Reading...
data.azurerm_key_vault_secret.bringcloudplatform_github_token: Read complete after 2s [id=https://kvredacted.vault.azure.net/secrets/secretredacted/0123456789abcdef0123456789abcdef]
module.repo["redacted01"].data.github_team.this["admins"]: Reading...
module.repo["redacted02"].data.github_team.this["contributors"]: Reading...
module.repo["redacted03"].data.github_team.this["contributors"]: Reading...
module.repo["redacted04"].data.github_team.this["admins"]: Reading...
module.repo["redacted05"].data.github_team.this["contributors"]: Reading...
module.repo["redacted06"].data.github_team.this["admins"]: Reading...
module.repo["redacted07"].data.github_team.this["admins"]: Reading...
module.repo["redacted08"].data.github_team.this["contributors"]: Reading...
module.repo["redacted02"].data.github_team.this["admins"]: Reading...
module.repo["redacted07"].data.github_team.this["contributors"]: Reading...
module.repo["redacted01"].data.github_team.this["admins"]: Still reading... [10s elapsed]
module.repo["redacted07"].data.github_team.this["admins"]: Still reading... [10s elapsed]
module.repo["redacted03"].data.github_team.this["contributors"]: Still reading... [10s elapsed]
module.repo["redacted04"].data.github_team.this["admins"]: Still reading... [10s elapsed]
module.repo["redacted05"].data.github_team.this["contributors"]: Still reading... [10s elapsed]
module.repo["redacted06"].data.github_team.this["admins"]: Still reading... [10s elapsed]
module.repo["redacted02"].data.github_team.this["contributors"]: Still reading... [10s elapsed]
module.repo["redacted08"].data.github_team.this["contributors"]: Still reading... [10s elapsed]
module.repo["redacted02"].data.github_team.this["admins"]: Still reading... [10s elapsed]
module.repo["redacted07"].data.github_team.this["contributors"]: Still reading... [10s elapsed]
module.repo["redacted01"].data.github_team.this["admins"]: Still reading... [20s elapsed]
module.repo["redacted06"].data.github_team.this["admins"]: Still reading... [20s elapsed]
module.repo["redacted03"].data.github_team.this["contributors"]: Still reading... [20s elapsed]
module.repo["redacted02"].data.github_team.this["contributors"]: Still reading... [20s elapsed]
module.repo["redacted07"].data.github_team.this["admins"]: Still reading... [20s elapsed]
module.repo["redacted05"].data.github_team.this["contributors"]: Still reading... [20s elapsed]
module.repo["redacted04"].data.github_team.this["admins"]: Still reading... [20s elapsed]
module.repo["redacted08"].data.github_team.this["contributors"]: Still reading... [20s elapsed]
module.repo["redacted02"].data.github_team.this["admins"]: Still reading... [20s elapsed]
module.repo["redacted07"].data.github_team.this["contributors"]: Still reading... [20s elapsed]
module.repo["redacted03"].data.github_team.this["contributors"]: Read complete after 20s [id=4240487]
module.repo["redacted09"].data.github_team.this["contributors"]: Reading...
module.repo["redacted01"].data.github_team.this["admins"]: Still reading... [30s elapsed]
module.repo["redacted04"].data.github_team.this["admins"]: Still reading... [30s elapsed]
module.repo["redacted05"].data.github_team.this["contributors"]: Still reading... [30s elapsed]
module.repo["redacted06"].data.github_team.this["admins"]: Still reading... [30s elapsed]
module.repo["redacted02"].data.github_team.this["contributors"]: Still reading... [30s elapsed]
module.repo["redacted07"].data.github_team.this["admins"]: Still reading... [30s elapsed]
module.repo["redacted08"].data.github_team.this["contributors"]: Still reading... [30s elapsed]
module.repo["redacted02"].data.github_team.this["admins"]: Still reading... [30s elapsed]
module.repo["redacted07"].data.github_team.this["contributors"]: Still reading... [30s elapsed]
module.repo["redacted09"].data.github_team.this["contributors"]: Still reading... [10s elapsed]
module.repo["redacted01"].data.github_team.this["admins"]: Read complete after 35s [id=4519846]
module.repo["redacted10"].data.github_team.this["admins"]: Reading...
module.repo["redacted07"].data.github_team.this["admins"]: Read complete after 37s [id=4519846]
module.repo["redacted05"].data.github_team.this["admins"]: Reading...
module.repo["redacted02"].data.github_team.this["contributors"]: Read complete after 40s [id=4240487]
module.repo["redacted11"].data.github_team.this["contributors"]: Reading...
module.repo["redacted06"].data.github_team.this["admins"]: Still reading... [40s elapsed]
module.repo["redacted05"].data.github_team.this["contributors"]: Still reading... [40s elapsed]
module.repo["redacted04"].data.github_team.this["admins"]: Still reading... [40s elapsed]
module.repo["redacted08"].data.github_team.this["contributors"]: Still reading... [40s elapsed]
module.repo["redacted02"].data.github_team.this["admins"]: Still reading... [40s elapsed]
module.repo["redacted07"].data.github_team.this["contributors"]: Still reading... [40s elapsed]
module.repo["redacted09"].data.github_team.this["contributors"]: Still reading... [20s elapsed]
module.repo["redacted04"].data.github_team.this["admins"]: Read complete after 42s [id=4519846]
module.repo["redacted12"].data.github_team.this["contributors"]: Reading...
module.repo["redacted06"].data.github_team.this["admins"]: Read complete after 44s [id=4519846]
module.repo["redacted13"].data.github_team.this["contributors"]: Reading...
module.repo["redacted10"].data.github_team.this["admins"]: Still reading... [10s elapsed]
module.repo["redacted05"].data.github_team.this["contributors"]: Read complete after 46s [id=4240487]
module.repo["redacted14"].data.github_team.this["admins"]: Reading...
module.repo["redacted05"].data.github_team.this["admins"]: Still reading... [10s elapsed]
module.repo["redacted08"].data.github_team.this["contributors"]: Read complete after 48s [id=4240487]
module.repo["redacted08"].data.github_team.this["admins"]: Reading...
module.repo["redacted11"].data.github_team.this["contributors"]: Still reading... [10s elapsed]
module.repo["redacted02"].data.github_team.this["admins"]: Still reading... [50s elapsed]
module.repo["redacted07"].data.github_team.this["contributors"]: Still reading... [50s elapsed]
module.repo["redacted02"].data.github_team.this["admins"]: Read complete after 50s [id=4519846]
module.repo["redacted15"].data.github_team.this["admins"]: Reading...

As mentioned in the issue description, the import script from which this comparatively short redacted log snippet was taken ran for 6 hours before we cancelled it. Yes, restructuring the use of nested data blocks in child modules may solve some of this, but even taking that into account, the amount of wait time per resource to import (which for module refactoring will be a great number) is untenable.

@andersthorbeck andersthorbeck changed the title Every terraform import command triggers a full state refresh – leads to module refactoring taking an inordinately long time Every terraform import command triggers reading all data sources – leads to module refactoring taking an inordinately long time Dec 15, 2022
@kmoe kmoe removed their assignment Dec 16, 2022
@kmoe (Member) commented Dec 16, 2022

Thanks for the example - that makes sense. Leaving this issue open as it is well described.

@Tbohunek

Hi @kmoe, any updates on when the terraform import -refresh=false might become available? Thanks :)

@mishabruml

> Hi @kmoe, any updates on when the terraform import -refresh=false might become available? Thanks :)

Yeah this would be very useful!

@crw (Contributor) commented Feb 21, 2023

Thanks for your interest in this issue! This is just a reminder to please avoid "+1" comments, and to use the upvote mechanism (click or add the 👍 emoji to the original post) to indicate your support for this issue. Thanks again for the feedback!

As an addendum, I am not aware of a timeline for this issue to be resolved. Thanks again for your continued interest in this issue.

@openmonk commented May 26, 2023

Hitting the same problem in the following scenario:

Our Terraform Cloud markup project creates a GCP project for other Terraform projects to import into an empty state, but that cannot be done, because Terraform tries to read secrets during import and we get errors because those secrets are not there yet.

@kieran-lowe (Contributor)

Hi @kmoe, @crw - any updates on when we could see this functionality added?

@crw (Contributor) commented Jun 22, 2023

Please see the new feature released in 1.5, import blocks: https://developer.hashicorp.com/terraform/language/import. This will allow you to import many resources at the same time and only run the plan once, which should solve the performance problem as stated in the issue description. As such, the new import methodology will obviate this issue. Thanks very much for your feedback on this issue!

@crw crw closed this as completed Jun 22, 2023
@andersthorbeck (Author)

Fantastic and elegant solution! Tried it now, it works perfectly.
@crw Thanks for meeting this use case with a better solution than any of the ones I proposed!

@thatsk commented Jun 27, 2023

@andersthorbeck is there any blog post that shows your example above with Terraform 1.5? I am also seeing this lag in 1.4, but this thread seems to say that 1.5 solves the issue, so I want to try it too. Let me know the exact steps to follow to speed things up.

@thatsk commented Jun 27, 2023

I want to see a complex import block for multiple resources to be imported.

@andersthorbeck (Author) commented Jun 28, 2023

@thatsk The documentation was linked in the comment that closed this issue: #32385 (comment).

Imports can now be declared in the Terraform configuration itself, via the new import block, as opposed to having to run a CLI command for every resource to be imported.

Importing multiple resources means multiple import blocks, not a single import block.
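
As a sketch, batch-importing the two baz resources from the dummy example earlier in this thread (Terraform 1.5+; the addresses and IDs are taken from the CLI commands above) would look like:

import {
  to = random_integer.baz
  id = "123,1,1000"
}

import {
  to = azurerm_storage_blob.baz
  id = "https://thorbeckasandbox.blob.core.windows.net/dummydata/baz.txt"
}

A single terraform plan or terraform apply then processes all import blocks in one run, so each data source is read only once for the whole batch.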

@judithpatudith (Contributor)

In case it helps we have a tutorial showing how to import a resource, which covers some edge cases and exceptions: https://developer.hashicorp.com/terraform/tutorials/state/state-import

@onetwopunch

Can -refresh=false still be a thing, though? There are several resources I've seen where the import block just doesn't work. Try it on any GCP IAM resource: the ID expects a condition_title, even if one doesn't exist, and it won't let you provide a blank one. The ability to prevent refresh across multiple import commands is useful even with the import block solution.

@apparentlymart (Contributor)

-refresh=false disables the refreshing of managed resources. Data resources don't get "refreshed" in the same sense: they must be read for every run, because otherwise Terraform wouldn't have any data for them. Data resources are for objects managed outside of Terraform which are therefore expected to change between runs.

(You may have noticed that Terraform does tend to leave behind results from reading data resources in the state after apply is complete, but those are there only to support unusual situations like debugging in terraform console. It isn't possible in general for Terraform to re-use data source results between runs because the object in the state might have been created by a different version of the provider and thus be unintelligible to the current version. Terraform is designed under the assumption that only managed resource data survives from one run to the next, whereas data resources are re-read fresh every time. Changing that assumption would be a much more significant undertaking than just changing the treatment of the -refresh=false option to also disable reading data sources.)

@kmoe (Member) commented Sep 19, 2023

@onetwopunch, could you open a new issue (or two) for the following?

> There are several resources I've seen where the import block just doesn't work

Which resources? It should work for all resources where terraform import works.

> Try it on any GCP IAM resource and the ID expects a condition_title, even if one doesn't exist and won't let you provide a blank one

Does this only happen with the import block, or the terraform import command as well? Could you provide the config you tried?

@BarnabyShearer

Whilst I understand that data objects need to be read or they have no data, I am not clear why their data is needed for an unrelated resource to be successfully imported. The common case of a CLI import should just check that the destination resource does not already have state; it doesn't use anything else from the existing state, plan, or data resources, does it?

@UntiIted commented Nov 8, 2023

@kmoe this is not the same case as onetwopunch is referring to, but import can fail entirely because it tries to read data sources regardless of dependencies. Consider a scenario where a resource is created and later read as a data source (required, for example, to get the IP from an azurerm_public_ip). Terraform will try to read the data source immediately on import and fail, because the resource and data source obviously don't exist yet.
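
A minimal sketch of that scenario, with illustrative names and values (not taken from the original report):

# The resource is created here...
resource "azurerm_public_ip" "lb" {
  name                = "pip-lb-example"
  location            = "westeurope"
  resource_group_name = "rg-example"
  allocation_method   = "Static"
}

# ...and read back as a data source, e.g. to obtain the allocated IP address.
data "azurerm_public_ip" "lb" {
  name                = azurerm_public_ip.lb.name
  resource_group_name = "rg-example"
}

With an empty state, importing any unrelated resource triggers a read of data.azurerm_public_ip.lb, which fails because the public IP does not exist yet.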

@catalinmer

With terraform import, I noticed that newer versions (1.4.7 in my case) condition the import operation on a successful plan. However, this is not entirely the case in older versions of Terraform, like v1.0.8. I was importing some VPC resources with lots of dependencies on subnets and routes, and it just works in v1.0.8, while the exact same import fails in v1.4.7. My error was caused by "Invalid for_each argument".

I haven't tried to import all resources at once with v1.5+, because it is a complex single-shot operation and I prefer to see how the plan changes after one or a few imports. And this is possible in v1.0.8.

Might be a bit off-topic, but if you're struggling with imports that fail when refreshing state, give it a try with an older Terraform version.

@beatcracker

As a workaround, you can comment out or remove all resources except the ones you are trying to import. Worked for me in a pinch.
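
Applied to the dummy configuration earlier in this thread, for instance, the workaround amounts to temporarily commenting out the data sources and unrelated resources before running the imports:

# Temporarily disabled while importing random_integer.baz (which doesn't reference them):
# data "azurerm_storage_account" "sandbox" {
#   name                = "thorbeckasandbox"
#   resource_group_name = data.azurerm_resource_group.sandbox.name
# }

and restoring the configuration once the imports are done.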

@github-actions (bot)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 28, 2023