Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent data sources from being aplied early #10670

Merged
merged 1 commit into from
Dec 12, 2016
Merged

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Dec 12, 2016

If a data source has explicit dependencies in depends_on, we can
assume the user has added those because of a dependency not tracked
directly in the config. If there are any entries in depends_on, don't
apply the data source early during Refresh.

Fixes #10603

If a data source has explicit dependencies in `depends_on`, we can
assume the user has added those because of a dependency not tracked
directly in the config. If there are any entries in `depends_on`, don't
apply the data source early during Refresh.
@apparentlymart
Copy link
Contributor

apparentlymart commented Dec 12, 2016

Having the depends_on mechanism just disable the early refresh mechanism entirely is stronger than I was expecting here, but I suppose it does make some sense.

This doesn't feel 100% right but I'm honestly having some trouble deciding if the problems I'm thinking about are truly issues or just hypotheticals. Curious as to what you think:

  • Consider the case where both the managed and data resource have already been created on a previous run and are now present in the state. The managed resource will get refreshed during walkRefresh, but with this change the data resource will not, even though the data it needs should, in principle, be available.
  • There are a few sub-cases of that case:
    • After refreshing, the managed resource has a "forces new resource" diff. In this case it seems pretty cut-and-dry that the data resource should be deferred until apply, so that the replacement happens first.
    • After refreshing, the managed resource has no diff at all. In this case we could in principle deal with the data resource during walkRefresh, except that during that walk we don't actually know whether the managed resource will have a diff.
    • After refreshing, the managed resource has a simple update diff that doesn't affect any of the attributes that get interpolated into the data resource. This one is the trickiest, since in principle the data source could be interacting with the managed resource "behind the scenes" and thus have a dependency that Terraform can't see.

The main reason this feels a little "off", I guess, is that were we to have a managed resource depending on another managed resource we would still refresh both of them during planRefresh, rather than waiting until apply time to read one of them. This, on the other hand, treats refreshing the data source as if it were an "apply" sort of thing rather than a "read" sort of thing.

With all of that said (mainly just thinking aloud; sorry!) it seems like what you did here should be fine but might be too conservative? But maybe we just start here and work backwards to being less conservative as real use-cases emerge...

@jbardin
Copy link
Member Author

jbardin commented Dec 12, 2016

Thanks @apparentlymart!

I see where you're going, and I think those are valid points in a world where we want to resolve all data resources as early as possible. However, I think when a field is added to depends_on in the config, it throws out all that nuance because by definition we can't really know what the actual dependency is that the user is trying to establish. All we can determine is that the state we want is guaranteed to be set after that dependency is applied.

I guess this does sort of turn the data resource back into a managed resource that is "applied", but I don't think there's a way around that if we don't know why the depends_on was added to the config in the first place. The more I think about this, the more I think there really is no middle ground, and depends_on can only be a binary switch between "early" and "late" binding of the data.

@mitchellh
Copy link
Contributor

I think this hammer is probably okay for now, but I can think of some heuristics we can introduce in the future to make this better. Namely: we can add helpers around DependsOn to give us rich type information around WHAT is being depended (whether it exists or not not our problem since that'll crop up in validation). Then, we can do this behavior only if we depend on a managed resource. Yeah?

Regardless, I think this issue makes something work that didn't work before and doesn't break anything that was already working [probably, or maybe in a very specific edge case above]. So, it is probably worthwhile.

@jbardin
Copy link
Member Author

jbardin commented Dec 12, 2016

Yeah, if we could introspect into the depends_on and determine if it won't have a diff during apply, then we could still refresh early. I this point though, I'd ask if that really matters? Since the dependency could require the data refresh be delayed until apply, it means the config has to work with that ordering anyway, so is there any benefit to sometimes short-circuiting that?

@apparentlymart
Copy link
Contributor

FWIW, fine by me to push ahead with this for now and await real-world use-cases for making it more clever. As I say, I was mainly just thinking aloud trying to figure out why this felt "weird".

@ghost
Copy link

ghost commented Apr 18, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

depends_on for data_source
3 participants