Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Missing dependency between DNS entries being created and running bootkube? #109

Closed
rata opened this issue Mar 5, 2020 · 6 comments
Closed
Labels
bug Something isn't working technical-debt Technical debt-related issues

Comments

@rata
Copy link
Member

rata commented Mar 5, 2020

On Packet (it might apply to other platforms) bootkube depends on this to start:

depends_on = [
module.bootkube,
null_resource.copy-controller-secrets,
null_resource.copy-assets-dir,

The DNS entries are created outside of the terraform module here:

{{- if .Config.DNS.Provider.Route53 }}
module "dns" {
source = "../lokomotive-kubernetes/dns/route53"
providers = {
aws = aws.default
}
entries = module.packet-{{.Config.ClusterName}}.dns_entries
aws_zone_id = "{{.Config.DNS.Provider.Route53.ZoneID}}"
}

While it is likely that the DNS entries are created before bootkube starts, as there is no dependency between them, I think it might happen that terraform starts bootkube before DNS entries are created. This might happen, for example, as terraform batches resource creation and DNS entries are delayed until other resources (let's say starting bootkube) is created.

I'm unsure of the effect of bootkube being started before DNS entries are created OR updated (if you are running a test cluster with the same name, DNS entries might just need to be updated). In the best case, this is not an issue (although I doubt bootkube start will finish correctly without being able to form the etcd cluster and, thus, starting the API server). But this might as well be an issue.

I'd like to check if this is an issue at all, first. But if it is, here are some ideas:

  • One hacky way to solve this might be for the controller code to take one "dummy" input and uses it in the "bootkube start", just to force a dependency there. Then, this input can be the output of the dns entries created
  • Another hacky way to fix this might be to run terraform apply -target for the dns entries first? This will create the controllers, but hopefully nothing else (i.e. not the "bootkube-start" null resource)
  • Not sure of other options? Rethink DNS creation?
@rata rata added bug Something isn't working needs/more-information Not enough information labels Mar 5, 2020
@rata
Copy link
Member Author

rata commented Mar 5, 2020

I think option number two (terrafrom apply -target to first create dns record before running bootkube) is used here ONLY if the provider is manual. If the provider is not manual, everything is applied in one go:

// If the provider isn't manual, apply everything in a single step.
if dnsProvider != dns.DNSManual {
return ex.Apply()
}

Therefore, if the provider is not manual, this race can exist.

If the provider is manual, the dns resources are created first:

// Get DNS entries (it forces the creation of the controller nodes).
arguments = append(arguments, fmt.Sprintf("-target=module.packet-%s.null_resource.dns_entries", c.ClusterName))
// Add worker nodes to speed things up.
for index := range c.WorkerPools {
arguments = append(arguments, fmt.Sprintf("-target=module.worker-pool-%d.packet_device.nodes", index))
}
// Create controller and workers nodes.
if err := ex.Execute(arguments...); err != nil {

Then, if all went fine, applies the rest:

// Finish deployment.
return ex.Apply()

I'd like to confirm my understanding is correct with someone. @invidian ? @mauriciovasquezbernal ?

rata added a commit that referenced this issue Mar 5, 2020
@rata
Copy link
Member Author

rata commented Mar 5, 2020

Actually, no, it affects both cases as the null resource is only created first. Sorry for the confusion (it's late, so maybe this message is wrong too :-P)

@invidian
Copy link
Member

invidian commented Mar 6, 2020

@rata I cannot find the GitHub reference to the discussion, but we discussed that with @mauriciovasquezbernal and we came to the conclusion, that bootkube start is resilient to the DNS propagation, as it will wait until the DNS records are available. And bootkube times out after 30 minutes, so if DNS records are not created by Terraform within 30 minutes, that's means something is wrong anyway. So I would say we are aware of that, but from extensive testing, that never caused some trouble.

I think with lokomotive-kubernetes merged into lokomotive repository, we could re-think the solution and make better use of apply -target to ensure proper sequence, even if we use Route53 DNS, yes.

BTW, did you hit some issue because of this setup?

Other thing is, I'd say it's rather low priority at the moment.

@mauriciovasquezbernal
Copy link
Member

Hi @rata, thanks for pointing this out. As confirmed by @invidian, we were aware of this at the moment of the implementation: kinvolk-archives/lokomotive-kubernetes#137 (comment).

We did some tests and found that it should not be a problem for bootkube as it waits some time for the entries to be ready (I did a manual experiment delaying the creation of DNS entries for 5 minutes and it was fine). I think it could be one of the things to improve when implementing the custom DNS support for all platforms.

@rata
Copy link
Member Author

rata commented Mar 6, 2020

@invidian

@rata I cannot find the GitHub reference to the discussion, but we discussed that with @mauriciovasquezbernal and we came to the conclusion, that bootkube start is resilient to the DNS propagation, as it will wait until the DNS records are available. And bootkube times out after 30 minutes, so if DNS records are not created by Terraform within 30 minutes, that's means something is wrong anyway. So I would say we are aware of that, but from extensive testing, that never caused some trouble.

Ohh, thanks for the context!

BTW, did you hit some issue because of this setup?

No, just need to change code that was doing the DNS manual implementation and thought: "hmm, this is broken or working by chance..." and opened the issue :-)

Other thing is, I'd say it's rather low priority at the moment.

Yeah, sure. Just want to keep track of the bugs we found. Won't solve it now unless is easier for what I need to do to solve :)

@mauriciovasquezbernal

We did some tests and found that it should not be a problem for bootkube as it waits some time for the entries to be ready (I did a manual experiment delaying the creation of DNS entries for 5 minutes and it was fine). I think it could be one of the things to improve when implementing the custom DNS support for all platforms.

Ohh, I see. Thanks a lot for the context and the links! Makes sense

@rata rata added technical-debt Technical debt-related issues and removed needs/more-information Not enough information labels Mar 6, 2020
@invidian invidian changed the title Packet: Missing dependency between DNS entries being created and running bootkube? Missing dependency between DNS entries being created and running bootkube? Nov 19, 2020
@iaguis
Copy link
Contributor

iaguis commented Jun 23, 2021

Closing since it didn't cause issues.

@iaguis iaguis closed this as completed Jun 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working technical-debt Technical debt-related issues
Projects
None yet
Development

No branches or pull requests

4 participants