
ASM Module Fails to Apply #626

Closed
PsychoSid opened this issue Aug 13, 2020 · 10 comments
Labels: triaged (Scoped and ready for work), waiting-response (Waiting for issue author to respond.)


@PsychoSid

Every night I tear down my deployment and bring it back up the following day (the names stay the same). I mention this because the problem might be due to stale credentials from the previous run.

Every day since updating to the v0.11 modules, the ASM module has failed to complete correctly.

The initial run fails with:

module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): kubeconfig entry generated for anthos-gke.
module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): Waiting for membership to be created...
module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): .....done.
module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): Created a new membership [projects/<myproject>/locations/global/memberships/gke-asm-membership] for the cluster [gke-asm-membership]
module.asm.module.gke_hub_registration.null_resource.run_command[0]: Still creating... [10s elapsed]
module.asm.module.gke_hub_registration.null_resource.run_command[0]: Still creating... [20s elapsed]
module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): Error in installing the Connect Agent: Failed to apply Membership CR to cluster: error: error when retrieving current configuration of:
module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): Resource: "hub.gke.io/v1, Resource=memberships", GroupVersionKind: "hub.gke.io/v1, Kind=Membership"
module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): Name: "membership", Namespace: ""
module.asm.module.gke_hub_registration.null_resource.run_command[0] (local-exec): from server for: "STDIN": Get "https://34.89.229.66/apis/hub.gke.io/v1/memberships/membership?timeout=20s": dial tcp 34.89.229.66:443: connect: connection refused

An immediate attempt to re-apply also fails:

Error: Error running command 'PATH=/google-cloud-sdk/bin:$PATH
.terraform/modules/asm/terraform-google-kubernetes-engine-11.0.0/modules/asm/scripts/gke_hub_registration.sh gke-asm-membership europe-west3-b anthos-gke <base64-encoded service account key redacted>
': exit status 1. Output: kubeconfig entry generated for anthos-gke.
ERROR: (gcloud.container.hub.memberships.register) Failed to check if the user is a cluster-admin: The connection to the server 34.89.229.66 was refused - did you specify the right host or port?

If I then run gcloud ... get-credentials and re-apply, everything works.
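For reference, the manual workaround is roughly the following (a sketch; the full get-credentials command is my assumption, with the cluster name, zone, and project placeholder taken from the logs above):

# Refresh the kubeconfig entry once the cluster endpoint is reachable again,
# then re-run the apply; the hub registration then succeeds.
gcloud container clusters get-credentials anthos-gke --zone europe-west3-b --project <myproject>
terraform apply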

I'm pretty sure I followed the update doc correctly. Any ideas please? Thanks.

@morgante
Contributor

Interesting. Looks like we might need to add a timeout between cluster creation and attempting to add the cluster to the hub. /cc @bharathkkb

bharathkkb self-assigned this on Aug 13, 2020
@bharathkkb
Member

Hi @PsychoSid
What gcloud version are you on?

@PsychoSid
Author

v305, which I believe is the latest.

@PsychoSid
Author

I looked at this again this morning when bringing up my cluster. It seemingly does need a wait: the cluster is in "RECONCILING", and if I wait until it is in "RUNNING" before rerunning the apply, it goes through just fine.

Thanks.

@bharathkkb
Member

@PsychoSid I think it makes sense to wait for the cluster to finish reconciling before we proceed. We can probably target this once we have #611. I have also noticed that with smaller cluster sizes the ASM install tends to force a master reconciliation, which might be why the cluster enters RECONCILING before the hub membership is created.
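A minimal sketch of such a wait, polling the cluster status until it is RUNNING before registering the membership (cluster name and zone assumed from the logs above; not the module's actual implementation):

#!/usr/bin/env bash
# Poll the cluster status until it leaves RECONCILING and reports RUNNING.
while true; do
  status=$(gcloud container clusters describe anthos-gke \
    --zone europe-west3-b --format='value(status)')
  [ "$status" = "RUNNING" ] && break
  echo "Cluster status is ${status}; waiting..."
  sleep 10
done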

I tried an apply / destroy / apply cycle with this example, which seemed to work, but I'm happy to debug further if you can provide your config.

@PsychoSid
Author

Thanks. It's 100% reproducible for me with my config/setup (it didn't happen with v0.10, although v0.11 fixed my destroy issue!). I haven't included the .tfvars or the backend configuration here.
issue626.txt

Thanks

@bharathkkb
Member

@PsychoSid We encountered something similar with ACM today, where the master was unavailable for around a minute after the CRDs were applied, producing a very similar dial tcp endpoint:443: connect: connection refused error.

I think the best approach might be a precondition check to make sure the endpoint is available, combined with a retry mechanism with backoff when it is not. Happy to hear any thoughts or other ideas.
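Something along those lines might look like the following (a sketch only; the endpoint IP is taken from the error above, and the attempt count and delays are arbitrary):

#!/usr/bin/env bash
# Check that the cluster endpoint accepts connections, retrying with
# exponential backoff. Any HTTP response (even a 401) means it is reachable;
# only a connection failure makes curl exit non-zero here.
endpoint="34.89.229.66"
delay=5
for attempt in 1 2 3 4 5; do
  if curl -ks --max-time 10 "https://${endpoint}/" >/dev/null; then
    echo "Endpoint reachable after ${attempt} attempt(s)"
    break
  fi
  echo "Attempt ${attempt} failed; retrying in ${delay}s..."
  sleep "${delay}"
  delay=$((delay * 2))
done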

@bharathkkb
Member

Hi @PsychoSid
I wanted to follow up on this. We had a regression, fixed by #669, where we were not waiting for the cluster to be ready, so I wanted to confirm whether you are still seeing this with the latest on main.

bharathkkb added the waiting-response (Waiting for issue author to respond.) label on Oct 14, 2020
@PsychoSid
Author

Hi @bharathkkb, I haven't yet, as I tend to use the module registry paths for sources, but I will. Many thanks.

@bharathkkb
Member

Closing this out, as it should be fixed by #669. Feel free to reopen if needed.
