pkg/asset/machines/aws: Only return available zones #1210
Conversation
Zones can also be impaired, unavailable, or "information" [1,2]. I dunno what that last one is about. But we only want to put machines in available zones ;).

This is a bit racy, because we could get different responses at both call-sites. Moving forward, we'll want to shift this request into a single available-zones asset.

Fixes #1209.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeAvailabilityZones.html
[2]: https://www.terraform.io/docs/providers/aws/d/availability_zones.html#state
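As a rough illustration of the Go-side filtering the PR title describes, here is a minimal, self-contained sketch using the aws-sdk-go EC2 client. The helper name and surrounding wiring are assumptions for illustration, not the installer's actual code:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// availableZones returns only the zones whose state is "available",
// skipping impaired, unavailable, and "information" zones.
// Hypothetical helper; the real function in pkg/asset/machines/aws is
// wired differently.
func availableZones(region string) ([]string, error) {
	sess, err := session.NewSession(aws.NewConfig().WithRegion(region))
	if err != nil {
		return nil, err
	}
	client := ec2.New(sess)
	resp, err := client.DescribeAvailabilityZones(&ec2.DescribeAvailabilityZonesInput{
		Filters: []*ec2.Filter{{
			Name:   aws.String("state"),
			Values: []*string{aws.String("available")},
		}},
	})
	if err != nil {
		return nil, err
	}
	zones := make([]string, 0, len(resp.AvailabilityZones))
	for _, zone := range resp.AvailabilityZones {
		zones = append(zones, aws.StringValue(zone.ZoneName))
	}
	return zones, nil
}

func main() {
	zones, err := availableZones("us-east-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(zones)
}
```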
/retest
All green. When you have a moment, @abhinavdahiya, @staebler.
Do we have a PR open or tracking issue...
#1121 will get us closer. I can file a ticket for what will still be left after that later tonight. |
But "zone goes offline mid-asset-graph" is probably going to break automatic zone selection even once we're down to a single zone-listing call (we'll just be internally consistent when we die ;). |
Ok, I think #1045, #1121, and #1212 together will address the consolidation to a single available-zones request in Go. I still think we want this PR to get the filtering in Go (which we'll want forever) and Terraform (which we'll need until #1045 and #1121 together let us drop that Terraform zone request). |
I don't think we are quite there yet for dropping the Terraform zone request even after #1045 and #1121. I would like to see us send from Go to Terraform the union of the zones used in all of the machine pools. Then, we can get rid of the Terraform zone request. |
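To make the union-of-zones idea concrete, here is a small sketch under an assumed, simplified machine-pool type (`pool` and its `Zones` field are stand-ins, not the installer's real types):

```go
package main

import (
	"fmt"
	"sort"
)

// pool is a stand-in for a machine pool's platform config; the real
// installer types carry much more than a zone list.
type pool struct {
	Zones []string
}

// unionZones returns the sorted, de-duplicated union of the availability
// zones requested across all machine pools, which is what Go would hand to
// Terraform instead of having Terraform query zones itself.
func unionZones(pools []pool) []string {
	seen := map[string]bool{}
	var union []string
	for _, p := range pools {
		for _, z := range p.Zones {
			if !seen[z] {
				seen[z] = true
				union = append(union, z)
			}
		}
	}
	sort.Strings(union)
	return union
}

func main() {
	master := pool{Zones: []string{"us-east-1a", "us-east-1b"}}
	worker := pool{Zones: []string{"us-east-1b", "us-east-1c"}}
	fmt.Println(unionZones([]pool{master, worker})) // [us-east-1a us-east-1b us-east-1c]
}
```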
Do we need to modify the Go fetch of availability zones as well to only return the zones that have a state of "available"? |
Like this?
Yeah, just like that. It boggles my mind that I made that comment. I can't understand how I could have missed that, particularly given that it is the majority of the changes in the PR. |
@@ -3,7 +3,9 @@
 data "aws_region" "current" {}

 // Fetch a list of available AZs
-data "aws_availability_zones" "azs" {}
+data "aws_availability_zones" "azs" {
Maybe I'm wrong, but with #1121 the available check in Go will make sure that the master AZs chosen were available, and the `az_to_<public/private>subnet` maps will make sure we only create machines in AZs chosen by the installer?
> Maybe I'm wrong, but with #1121 the available check in go...

Yup, the Terraform part of this is code that is hopefully going away soon. But we want the Go part of this long-term, and the Terraform code is still there now, so I think we might as well fix it too.
Well, technically Terraform should use the AZs given by the installer, i.e. if the AZ goes out of service between asset generation and the Terraform call, it would be better to fail with an error that the AZ is out of service rather than with "AZ not found" in the `az_to_<public/private>subnet` maps.
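A possible shape for that friendlier failure, sketched with hypothetical names (this is not the installer's actual validation code):

```go
package main

import "fmt"

// validateZones reports a descriptive error when any requested zone is not
// in the currently-available set, so the failure names the out-of-service
// zone instead of surfacing as a missing key in the az_to_*subnet maps.
func validateZones(requested, available []string) error {
	availSet := map[string]bool{}
	for _, z := range available {
		availSet[z] = true
	}
	var bad []string
	for _, z := range requested {
		if !availSet[z] {
			bad = append(bad, z)
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("availability zones %v are not in state %q", bad, "available")
	}
	return nil
}

func main() {
	err := validateZones(
		[]string{"us-east-1a", "us-east-1e"},
		[]string{"us-east-1a", "us-east-1b"},
	)
	fmt.Println(err) // availability zones [us-east-1e] are not in state "available"
}
```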
> Well technically terraform should use the AZs given by the installer...

Agreed.

> ... ie if the AZ goes out of service between asset generation and terraform call...

I think using Go-provided zones means we can drop this `data "aws_availability_zones"` section altogether. Guarding against a zone going down is always going to be racy up through actually creating resources in the zone, so I'd rather save a few lines and the mental overhead and just have Terraform assume Go has already done any required validation and blindly consume what it was given.
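A rough sketch of what that Go-to-Terraform hand-off could look like, assuming a JSON tfvars file and a hypothetical variable name (the installer's real tfvars schema differs):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tfVars is a hypothetical slice of the variables the Go installer would
// hand to Terraform so Terraform never has to query zones itself; the real
// tfvars schema uses different names and carries many more fields.
type tfVars struct {
	MasterAvailabilityZones []string `json:"master_availability_zones"`
}

func main() {
	vars := tfVars{
		MasterAvailabilityZones: []string{"us-east-1a", "us-east-1b", "us-east-1c"},
	}
	out, err := json.MarshalIndent(vars, "", "  ")
	if err != nil {
		panic(err)
	}
	// The installer would write this out for the Terraform stage to consume
	// instead of Terraform calling the aws_availability_zones data source.
	fmt.Println(string(out))
}
```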
> I think using Go-provided zones means we can drop this `data "aws_availability_zones"` section altogether. Guarding against a zone going down is always going to be racy up through actually creating resources in the zone, so I'd rather save a few lines and the mental overhead and just have Terraform assume Go has already done any required validation and blindly consume what it was given.

I can get behind that change. :)
> I can get behind that change. :)

Did you want me to pick that up in this PR? Or submit it as a follow-up once this lands?
> I think using Go-provided zones means we can drop this `data "aws_availability_zones"` section altogether. Guarding against a zone going down is always going to be racy up through actually creating resources in the zone, so I'd rather save a few lines and the mental overhead and just have Terraform assume Go has already done any required validation and blindly consume what it was given.

> I can get behind that change. :)

But that would need some changes, esp. https://jira.coreos.com/browse/CORS-902, which will push this PR to 4.1.
> I can get behind that change. :)

> Did you want me to pick that up in this PR? Or submit it as a follow-up once this lands?

Follow-up is fine.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya, wking. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
This was originally part of avoiding broken zones, see e8921c3 (ci-operator/templates/openshift: Get e2e-aws out of us-east-1b, 2019-03-22, openshift#3204) and b717933 (ci-operator/templates/openshift/installer/cluster-launch-installer-*: Random AWS regions for IPI, 2020-01-23, openshift#6833). But the installer has had broken-zone avoidance since way back in openshift/installer@71aef620b6 (pkg/asset/machines/aws: Only return available zones, 2019-02-07, openshift/installer#1210). I dunno how reliably AWS sets 'state: impaired' and similar; it didn't seem to protect us from e8921c3. But we're getting ready to pivot to using multiple AWS accounts, which creates two issues with hard-coding region names in the step:

1. References by name are not stable between accounts. From the AWS docs [1]:

   > To ensure that resources are distributed across the Availability Zones for a Region, we independently map Availability Zones to names for each AWS account. For example, the Availability Zone us-east-1a for your AWS account might not be the same location as us-east-1a for another AWS account.

   So "aah, us-east-1a is broken, let's use b and c instead" might apply to one account but not the other. And the installer does not currently accept zone IDs.

2. References by name may not exist in other accounts. From the AWS docs [1]:

   > As Availability Zones grow over time, our ability to expand them can become constrained. If this happens, we might restrict you from launching an instance in a constrained Availability Zone unless you already have an instance in that Availability Zone. Eventually, we might also remove the constrained Availability Zone from the list of Availability Zones for new accounts. Therefore, your account might have a different number of available Availability Zones in a Region than another account.

   And it turns out that for some reason they sometimes don't name sequentially, e.g. our new account lacks us-west-1a:

       $ AWS_PROFILE=ci aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
       us-west-1a usw1-az3 available
       us-west-1b usw1-az1 available
       $ AWS_PROFILE=ci-2 aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
       us-west-1b usw1-az3 available
       us-west-1c usw1-az1 available

I have no idea why they decided to do that, but we have to work with the world as it is ;). Removing the us-east-1 overrides helps reduce our exposure, although we are still vulnerable to (2) with the a/b default line. We'll do something about that in follow-up work.

Leaving the "which zones?" decision up to the installer would cause it to try to set up each available zone, and that causes more API contention and resource consumption than we want. Background on that is in 51c4a37 (ci-operator/templates/openshift: Explicitly set AWS availability zones, 2019-03-28, openshift#3285) and d87fffb (ci-operator/templates/openshift: Drop us-east-1c, 2019-04-26, openshift#3615), as well as the rejected/rotted-out [2].

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
[2]: openshift/installer#1487