
Private zonal example #310

Merged

Conversation

ideasculptor
Contributor

Creates an example that demonstrates setting up a network and a private, zonal GKE cluster.

Fixes #308

Exposed #309

@ideasculptor
Contributor Author

When running the new test manually, it became clear that the destroy process does not correctly reverse the dependency chain between the network module and the gke module, so the subnet deletes happen before the cluster is fully torn down. This causes destroy to fail. I don't have handy example code for setting up a manual delay via a provisioner or some other mechanism to correctly reverse those dependencies during destroy, so this will probably fail when the tests run, but it is otherwise correct when I run it locally.
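
(For context, one possible shape of such a manual delay, not code from this PR: a null_resource that threads the subnet name through its triggers and runs a destroy-time sleep, so the network teardown waits for the cluster's leftovers to clear. The module reference and the sleep duration are illustrative assumptions.)

resource "null_resource" "wait_for_cluster_teardown" {
  # Illustrative: the cluster module would consume
  # null_resource.wait_for_cluster_teardown.triggers.subnetwork, placing this
  # resource between the network module and the cluster in the graph.
  triggers = {
    subnetwork = module.gcp-network.subnets_names[0]
  }

  # On destroy this runs after the cluster is gone and before the network
  # module is torn down, giving GKE-created dependents time to disappear.
  provisioner "local-exec" {
    when    = destroy
    command = "sleep 120"
  }
}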

// depends_on link. Tests use terraform 0.12.6, which does not have regex or regexall
network = reverse(split("/", data.google_compute_subnetwork.subnetwork.network))[0]

subnetwork = data.google_compute_subnetwork.subnetwork.name
Contributor

Can we make this a reference to the gcp-network outputs? That might help with the dependency graph issue.
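
(Roughly what that suggestion would look like, assuming the gcp-network module exposes the usual network_name and subnets_names outputs; treat the exact names as illustrative:)

  network    = module.gcp-network.network_name
  subnetwork = module.gcp-network.subnets_names[0]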

Contributor Author
@ideasculptor Nov 6, 2019

I don't think so. When I used a direct dependency, it would always fail the first time, because the cluster would come up before the network was ready. Those lines were written that way specifically to eliminate that problem. I had to try a number of ways to string those dependencies together to get it to stop erroring out when creating both the network and the cluster in the same terraform operation - the problem doesn't exist when they are run separately.

I can't remember which variations my 3am brain attempted - every failed attempt is a really long test cycle, so it's entirely possible that I was addled enough from lack of sleep to have missed something obvious. If memory serves, if I used the constant cidr range as a string and just injected it into ipv4_master_networks_config, that caused dependencies in the gke module to fire before the network was ready. I'm reasonably certain my first solution would have been to grab the first value of subnets_ranges from the gcp-network module outputs, but maybe I went straight from constant value to complex hack without first trying all the obvious middle steps, though I certainly tried a few.

Because the network name is passed to the gcp-network outputs directly from an input value, dependencies in the gke module were allowed to fire with the network name before the network existed, even when using the module output, since terraform's dependency graph wouldn't wait for the first module to complete before starting work on resources in the second. You can't use depends_on directly with modules, so I had no way to make the dependency work correctly without modifying one or both modules to manually provide a depends_on link between resources in each module, and I've just spent far too much time and energy on this already.

My quick solution was to use the data source as an intermediary, but since the gke module can't accept the network name as a fully qualified link, I had to parse the network name out of that data source's attributes. That corrected the dependency ordering on creation, but it breaks the destroy ordering, since the data source doesn't act as a barrier to the network module being torn down, so the network gets torn down before the GKE cluster has gone away.
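
(A rough reconstruction of that intermediary, to make the snippet under review concrete; the arguments here are assumptions, not necessarily the code in this PR:)

data "google_compute_subnetwork" "subnetwork" {
  # Look the subnet up so that its attributes, rather than raw input
  # variables, feed the gke module.
  name    = module.gcp-network.subnets_names[0]
  project = var.project_id
  region  = var.region
}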

Contributor
@morgante Nov 6, 2019

Network name should come from the network outputs though, not directly passing through the input: https://github.com/terraform-google-modules/terraform-google-network/blob/master/main.tf#L19

It definitely seems like using the output from the network module should order things correctly. BTW, in case you weren't aware, you can print the Terraform DAG: https://www.terraform.io/docs/commands/graph.html

This should be much faster to debug than actually spinning up the infra.
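
(For example, standard CLI usage; rendering the graph requires Graphviz:)

  terraform graph | dot -Tsvg > graph.svg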

Contributor Author
@ideasculptor Nov 6, 2019

This speaks to the lack of real-world integration examples for the modules. Real-world production usage will almost never bring up the network and the GKE cluster in the same terraform plan, but integration examples always will. If those examples existed, the modules would already have had to handle outputs that are linked directly to inputs and therefore fail to produce a valid dependency graph, and someone other than a user would have spotted the need for some form of computed pass-through, or for a null_resource (and a corresponding consumer in other modules) to serve as a block.

The pattern I've used in the past is for every module to include a null_resource.completion resource with an output that depends on it and a set of triggers that includes all computed resources in the dependency chain. Additionally, every module can receive an input which is used only as a trigger for a different null_resource.block resource, with every other resource in the module including a depends_on = [null_resource.block]. Feeding the null_resource.completion-based output of the first module to the trigger of the null_resource.block resource in the dependent module guarantees that nothing in the dependent module can execute until everything in the dependency has completed. If no value is passed through, no blockage happens.

First module:

resource "null_resource" "continuation" {
  depends_on = [google_compute_resource1.name, google_compute_resource2.name, ...]
}

output "block_until_ready" {
  value = "${null_resource.continuation.id}"
}

Dependent Module:

variable "block_until_ready" {
  type = string
  default = null
}

resource "null_resource" "block_until_ready" {
  triggers {
    # Explicitly wait on the passed in block_until_ready variable
    block_until_ready = "${var.block_until_ready}"
  }
}

resource "google_some_resource" "my_dependent_resource" {
  some_val = var.some_val
...
  depends_on = ["null_resource.block_until_ready"]
}
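
Wired together at the root, the hand-off would look something like this (module names and sources are placeholders, just to illustrate the pattern):

module "network" {
  source = "./modules/network"
  # ... network inputs ...
}

module "cluster" {
  source = "./modules/cluster"

  # Nothing in the cluster module can proceed until null_resource.continuation
  # in the network module (and everything it depends on) has completed.
  block_until_ready = module.network.block_until_ready

  # ... cluster inputs ...
}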

Contributor
@morgante Nov 6, 2019

I'm still not following the issue though. The linked code shows that the network output isn't directly linked to the input.

The pattern I've used in the past is for every module to include a null_resource.completion resource with an output that depends on it and a set of triggers that includes all computed resources in the dependency chain. Additionally, every module can receive an input which is used only as a trigger for a different null_resource.block resource, with every other resource in the module including a depends_on = [null_resource.block]. Feeding the null_resource.completion-based output of the first module to the trigger of the null_resource.block resource in the dependent module guarantees that nothing in the dependent module can execute until everything in the dependency has completed. If no value is passed through, no blockage happens.

This is definitely an interesting pattern. It's quite artificial but probably a good hack to include.

Contributor Author
@ideasculptor Nov 6, 2019

Your constant assumption that I'm an idiot who doesn't know what he is doing, rather than someone with a great deal of experience with both terraform and infrastructure automation, encountering actual real-world problems that your simple examples and minimal documentation just don't account for, has burned through pretty much all of the goodwill that was allowing me to spend so much of my spare time improving your product for free. Honestly, is there ANYTHING in any of my significant contributions so far that makes you think I don't read documentation before asking for assistance? Maybe don't assume that to be the case every time I raise an issue that is more significant than a typo.

I don't have the big picture of where you are going with those modules, but that's only because it hasn't been laid out anywhere public where I can see it, so far as I am aware. If you put modules out there, publicly acknowledging that they are built in-house by Google, you have to assume people are going to try to make use of them, as they surely represent something akin to common best practices. If you don't provide meaningful guidance about how to use them as such, and also refuse to respond helpfully when people encounter problems, you are going to burn down whatever community your open source efforts are trying to build.

Honestly, in the immediate wake of a nearly 72 hour unscheduled service outage, maybe consider the kind of frustrations I might be experiencing when I still can't get things to work, and take a few minutes from your obviously very busy schedule to give an assist to someone who has shown no hesitation to devote significant time and energy into giving your team a helping hand?...

Contributor Author
@ideasculptor Nov 6, 2019

An hour or less of a single engineer's time doesn't seem like a huge sacrifice to make in exchange for the 100 or so hours that I have devoted just to correcting problems in these modules - the vast majority of which had nothing whatsoever to do with any kind of non-standard or mistaken use.

And maybe consider that if someone who is clearly pretty darn fluent in this stack is having difficulty intuiting your intent, someone with minimal terraform or GCP experience, looking to get a leg up on their own automation efforts by exploring the official module library, is going to have a much more difficult time of it, and that maybe some documentation or assistance is in order, rather than public shaming.

Contributor

Your constant assumption that I'm an idiot who doesn't know what he is doing, rather than someone with a great deal of experience with both terraform and infrastructure automation, encountering actual real-world problems that your simple examples and minimal documentation just don't account for, has burned through pretty much all of the goodwill that was allowing me to spend so much of my spare time improving your product for free. Honestly, is there ANYTHING in any of my significant contributions so far that makes you think I don't read documentation before asking for assistance? Maybe don't assume that to be the case every time I raise an issue that is more significant than a typo.

I'm sorry you feel that way — I certainly think you're very skilled and appreciate all the work you've been doing. It's hugely helpful!

I don't have the big picture of where you are going with those modules, but that's only because it hasn't been laid out anywhere public where I can see it, so far as I am aware. If you put modules out there, publicly acknowledging that they are built in-house by Google, you have to assume people are going to try to make use of them, as they surely represent something akin to common best practices. If you don't provide meaningful guidance about how to use them as such, and also refuse to respond helpfully when people encounter problems, you are going to burn down whatever community your open source efforts are trying to build.

Agreed, I know we're not doing a good enough job here. In terms of our goals, this page is probably the best we have.

We absolutely do need to provide more and better end-to-end examples, and it's on our radar for what to build next. Sorry if my responses have been too brusque thus far; I definitely recognize that need.

Honestly, in the immediate wake of a nearly 72 hour unscheduled service outage, maybe consider the kind of frustrations I might be experiencing when I still can't get things to work, and take a few minutes from your obviously very busy schedule to give an assist to someone who has shown no hesitation to devote significant time and energy into giving your team a helping hand?...

You're right: as you can imagine, there have been a lot of fires lately but we should still appreciate our contributors.

Contributor Author
@ideasculptor Nov 6, 2019

I can promise you, when I used a direct link between the one module and the other, it errored out on the first run (and succeeded on the 2nd, if no destroy happened first) every single time. Feel free to try it. If I'm wrong, and it was certainly hour 17 or 18 of my work day at that point, it's a simple enough change to make. I've contributed my last PR for a good long while.

Contributor
@morgante Nov 6, 2019

Ok, I'll give it a shot! It's just surprising that this doesn't work considering how it's coded but I certainly believe you've tried it.

@morgante merged commit 2124137 into terraform-google-modules:master Nov 20, 2019
CPL-markus pushed a commit to WALTER-GROUP/terraform-google-kubernetes-engine that referenced this pull request Jul 15, 2024
…vate_zonal_example

Private zonal example
Successfully merging this pull request may close these issues.

Provide example of configuring Private Google Access