data/aws: use azs for master set in manifests #1121

staebler · 2019-01-23T19:37:49Z

These changes pass the availability zones to use for the masters set in 99_openshift-cluster-api_master-machines.yaml through to terraform. Prior to these changes the masters were always placed in the first 3 availability zones.

There is no validation done on the availability zones to verify that they are valid for the region.

Fix for https://bugzilla.redhat.com/show_bug.cgi?id=1662119.
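
As a rough illustration of the change described above, the following Go sketch collects the availability zone of each master machine, in order, so that the list can be handed to Terraform. The `masterConfig` type and the `masterZones` function are hypothetical stand-ins for illustration and are not the installer's actual types. Like the PR, the sketch does not check that a zone is valid for the region.

```go
package main

import "fmt"

// masterConfig is a hypothetical stand-in for the per-master settings
// found in 99_openshift-cluster-api_master-machines.yaml.
type masterConfig struct {
	Name string
	Zone string
}

// masterZones returns the availability zone of each master, in order.
// It does not verify that a zone actually exists in the region.
func masterZones(masters []masterConfig) []string {
	zones := make([]string, 0, len(masters))
	for _, m := range masters {
		zones = append(zones, m.Zone)
	}
	return zones
}

func main() {
	masters := []masterConfig{
		{Name: "master-0", Zone: "us-east-1b"},
		{Name: "master-1", Zone: "us-east-1d"},
		{Name: "master-2", Zone: "us-east-1f"},
	}
	fmt.Println(masterZones(masters))
}
```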

staebler · 2019-01-23T19:39:21Z

/hold

Hold on #792 and #890.

staebler · 2019-02-07T17:29:33Z

/hold cancel

Dependent PRs (#792 and #890) have merged.

wking · 2019-02-08T07:29:10Z

data/data/aws/master/outputs.tf

-output "subnet_ids" {
-  value = "${var.subnet_ids}"
-}
-
 output "cluster_id" {
  value = "${var.cluster_id}"


This is, like subnet_ids above, also a useless "tell them what they told you" output. Looks like we've been dragging them around since e5c8b41 (platform/aws: add bootstrap node and step for joining it, 2018-02-13, coreos/tectonic-installer#2924).

data/data/aws/vpc/common.tf

staebler · 2019-02-08T20:01:43Z

/retest

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

staebler · 2019-02-13T22:20:59Z

Rebased on top of #1045 in preparation for that merging.

pkg/tfvars/aws/aws.go

pkg/asset/cluster/tfvars.go

staebler · 2019-02-15T23:21:48Z

Marked this PR as WIP while adding support for multiple masters in an availability zone.

staebler · 2019-02-20T15:13:58Z

I have removed the code that enforces the restriction that an availability zone cannot have multiple masters. There were no changes that needed to be made to the installer to support multiple masters in a zone.

The changes were made in cf3ff39 to 0a9ef3b.
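
To make "multiple masters in a zone" concrete, here is a small Go sketch that spreads a number of masters across a list of zones. The round-robin strategy and the `assignZones` name are assumptions made for illustration; the thread does not say how masters are distributed when there are fewer zones than masters.

```go
package main

import "fmt"

// assignZones spreads count masters across zones in round-robin order.
// When count exceeds len(zones), some zones receive more than one master.
// Round-robin is an assumption of this sketch, not confirmed by the PR.
func assignZones(zones []string, count int) []string {
	if len(zones) == 0 || count <= 0 {
		return nil
	}
	assigned := make([]string, count)
	for i := range assigned {
		assigned[i] = zones[i%len(zones)]
	}
	return assigned
}

func main() {
	// Three masters over two zones: the first zone is used twice.
	fmt.Println(assignZones([]string{"us-east-1b", "us-east-1d"}, 3))
}
```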

staebler · 2019-02-21T14:07:56Z

/retest

abhinavdahiya · 2019-02-22T22:03:06Z

testing locally
[install-config.yaml]

...
controlPlane:
  name: master
  platform:
    aws:
      zones:
      - us-east-1b
      - us-east-1d
      - us-east-1f
...

looks like the bootstrap instance is still in the first availability zone...

$ AWS_PROFILE=openshift-dev aws ec2 describe-instances --filters Name=tag-key,Values=kubernetes.io/cluster/adahiya-0-88bqx | jq '.Reservations[].Instances[] | (.Tags[] | select(.Key == "Name") | .Value) + " " + .Placement.AvailabilityZone'
"adahiya-0-88bqx-master-1 us-east-1d"
"adahiya-0-88bqx-bootstrap us-east-1a"
"adahiya-0-88bqx-master-0 us-east-1b"
"adahiya-0-88bqx-master-2 us-east-1f"

@staebler is it possible to create the bootstrap machine in the 0th AZ given for controlPlane?
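
A minimal Go sketch of the requested behavior, assuming the control plane zones are available as an ordered list. The `bootstrapZone` helper is hypothetical and only illustrates the selection rule, not the installer's Terraform wiring.

```go
package main

import (
	"errors"
	"fmt"
)

// bootstrapZone returns the zone for the bootstrap machine: the first
// (0th) zone configured for the control plane. Hypothetical helper.
func bootstrapZone(controlPlaneZones []string) (string, error) {
	if len(controlPlaneZones) == 0 {
		return "", errors.New("no control plane zones configured")
	}
	return controlPlaneZones[0], nil
}

func main() {
	zone, err := bootstrapZone([]string{"us-east-1b", "us-east-1d", "us-east-1f"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bootstrap zone:", zone)
}
```

With the zones from the install-config above, this would put the bootstrap machine in us-east-1b rather than us-east-1a.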

data/data/aws/master/main.tf

abhinavdahiya · 2019-02-22T22:07:13Z

data/data/aws/vpc/common.tf

+  public_subnet_ids       = "${aws_subnet.public_subnet.*.id}"
+  private_subnet_count    = "${local.new_az_count}"
+  public_subnet_count     = "${local.new_az_count}"
+  az_to_private_subnet_id = "${zipmap(local.new_subnet_azs, local.private_subnet_ids)}"


nit: This seems to be used only in the output, so why not do the calculation there itself ...

^^ totally ignorable 😇
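
For readers less familiar with Terraform's `zipmap`, the Go sketch below shows the same idea: pairing a list of availability zones with a list of subnet IDs by position to build a lookup map. The `zipMap` function and the subnet IDs are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// zipMap pairs keys with values by position, similar in spirit to
// Terraform's zipmap(keys, values). Both lists must have equal length.
func zipMap(keys, values []string) (map[string]string, error) {
	if len(keys) != len(values) {
		return nil, errors.New("keys and values must have the same length")
	}
	m := make(map[string]string, len(keys))
	for i, k := range keys {
		m[k] = values[i]
	}
	return m, nil
}

func main() {
	azs := []string{"us-east-1b", "us-east-1d"}
	subnets := []string{"subnet-aaa", "subnet-bbb"} // made-up IDs
	azToSubnet, err := zipMap(azs, subnets)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(azToSubnet["us-east-1d"])
}
```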

staebler · 2019-02-26T13:15:15Z

level=error msg="1 error occurred:"
level=error msg="\t* module.vpc.aws_route_table_association.route_net[1]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.1: timeout while waiting for state to become 'success' (timeout: 5m0s)"

rate limiting

/retest

abhinavdahiya · 2019-02-26T17:47:08Z

/lgtm

wking · 2019-02-26T22:54:23Z

e2e-aws:


Flaky tests:

[sig-storage] ConfigMap should be consumable in multiple volumes in the same pod [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] Downward API volume should provide container's memory limit [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]

Failing tests:

[sig-storage] Dynamic Provisioning DynamicProvisioner should provision storage with different parameters [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: hostPath] [Testpattern: Dynamic PV (default fs)] subPath should support file as subpath [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

wking · 2019-02-27T06:27:51Z

Conflict with #1296.

openshift-bot · 2019-02-27T09:18:04Z

/retest

Please review the full test history for this PR and help us cut down flakes.

These changes pass the availability zones to use for the masters set in 99_openshift-cluster-api_master-machines.yaml through to terraform. Prior to these changes the masters were always placed in the first 3 availability zones. There is no validation done on the availability zones to verify that they are valid for the region. Fix for https://bugzilla.redhat.com/show_bug.cgi?id=1662119.

The bootstrap node was being placed in the first availability zone in the region. Now, place the bootstrap node in the same availability zone as the first master.

Remove the local az_to_private_subnet_id variable from the vpc module as it is only used as an output from the module. The output value is now calculated at the place where the output value is defined.

Remove the cluster_id output value from the vpc module as it is unused.

staebler · 2019-02-27T16:47:55Z

/test e2e-aws

abhinavdahiya · 2019-02-27T17:34:07Z

Conflict with #1296.

Sorry @staebler thanks for keeping up with rebase. 💯

/lgtm

openshift-ci-robot · 2019-02-27T17:34:18Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [abhinavdahiya,staebler]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

staebler · 2019-02-27T17:46:29Z

level=error msg="2 errors occurred:"
level=error msg="\t* module.vpc.aws_route_table_association.private_routing[3]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.private_routing.3: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route_table_association.private_routing[0]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.private_routing.0: timeout while waiting for state to become 'success' (timeout: 5m0s)"

Rate limiting

/retest

abhinavdahiya · 2019-02-27T23:08:52Z

/retest

wking · 2019-02-28T06:34:25Z

e2e-aws:

Flaky tests:

[sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Dynamic Provisioning DynamicProvisioner should provision storage with different parameters [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (default fs)] subPath should fail for new directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (default fs)] subPath should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

openshift@afa0b59 had moved the bootstrap node to a private subnet based on openshift#1121 (comment), but we need the bootstrap node in a public subnet to be able to ssh.

The bootstrap node is accessible on ssh again.

```console
$ ush core@18.215.154.240
Warning: Permanently added '18.215.154.240' (ECDSA) to the list of known hosts.
Red Hat CoreOS 4.0 Beta
WARNING: Direct SSH access to machines is not recommended.
This node has been annotated with machineconfiguration.openshift.io/ssh=accessed

---
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
[core@ip-10-0-8-165 ~]$
```

openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 23, 2019

openshift-ci-robot requested review from smarterclayton and tomassedovic January 23, 2019 19:38

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2019

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2019

wking mentioned this pull request Jan 30, 2019

Remove public IPs from masters #1045

Merged

openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2019

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2019

wking mentioned this pull request Feb 8, 2019

pkg/asset/machines/aws: Only return available zones #1210

Merged

wking reviewed Feb 8, 2019

data/data/aws/vpc/common.tf (outdated, resolved)

openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 8, 2019

openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 13, 2019

abhinavdahiya reviewed Feb 15, 2019

pkg/tfvars/aws/aws.go (outdated, resolved)

wking reviewed Feb 15, 2019

pkg/asset/cluster/tfvars.go (outdated, resolved)

openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 15, 2019

staebler changed the title ~~data/aws: use azs for master set in manifests~~ [WIP] data/aws: use azs for master set in manifests Feb 15, 2019

openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 15, 2019

staebler changed the title ~~[WIP] data/aws: use azs for master set in manifests~~ data/aws: use azs for master set in manifests Feb 20, 2019

openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 20, 2019

staebler mentioned this pull request Feb 21, 2019

ControlPlane master pool ignores configured zones #1290

Closed

abhinavdahiya reviewed Feb 22, 2019

data/data/aws/master/main.tf (resolved)

abhinavdahiya reviewed Feb 22, 2019

openshift-ci-robot assigned abhinavdahiya Feb 26, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 26, 2019

staebler added 2 commits February 27, 2019 09:54

openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2019

wking mentioned this pull request Feb 27, 2019

data/aws/master: drop 2 unused outputs #1327

Merged

openshift-merge-robot merged commit 13d1158 into openshift:master Feb 28, 2019

abhinavdahiya mentioned this pull request Mar 1, 2019

data/aws: create bootstrap machine in first public subnet #1348

Merged
