Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data/aws: use azs for master set in manifests #1121

Merged
merged 2 commits into from
Feb 28, 2019
Merged

data/aws: use azs for master set in manifests #1121

merged 2 commits into from
Feb 28, 2019

Conversation

staebler
Copy link
Contributor

@staebler staebler commented Jan 23, 2019

These changes pass the availability zones to use for the masters set in 99_openshift-cluster-api_master-machines.yaml through to terraform. Prior to these changes the masters were always placed in the first 3 availability zones.

There is no validation done on the availability zones to verify that the are valid for the region.

Fix for https://bugzilla.redhat.com/show_bug.cgi?id=1662119.

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 23, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2019
@staebler
Copy link
Contributor Author

/hold

Hold on #792 and #890.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2019
@staebler
Copy link
Contributor Author

staebler commented Feb 7, 2019

/hold cancel

Dependent PRs (#792 and #890) have merged.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2019
output "subnet_ids" {
value = "${var.subnet_ids}"
}

output "cluster_id" {
value = "${var.cluster_id}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is, like subnet_ids above, also a useless "tell them what they told you" output. Looks like we've been dragging them around since e5c8b41 (platform/aws: add bootstrap node and step for joining it, 2018-02-13, coreos/tectonic-installer#2924).

data/data/aws/vpc/common.tf Outdated Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 8, 2019
@staebler
Copy link
Contributor Author

staebler commented Feb 8, 2019

/retest

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 13, 2019
@staebler
Copy link
Contributor Author

Rebased on top of #1045 in preparation for that merging.

pkg/asset/cluster/tfvars.go Outdated Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 15, 2019
@staebler staebler changed the title data/aws: use azs for master set in manifests [WIP] data/aws: use azs for master set in manifests Feb 15, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 15, 2019
@staebler
Copy link
Contributor Author

Made a WIP to add support for multiple masters in an availability zone.

@staebler staebler changed the title [WIP] data/aws: use azs for master set in manifests data/aws: use azs for master set in manifests Feb 20, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 20, 2019
@staebler
Copy link
Contributor Author

I have removed the code that enforces the restriction that an availability zone cannot have multiple masters. There were no changes that needed to be made to the installer to support multiple masters in a zone.

The changes were made in cf3ff39 to 0a9ef3b.

@staebler
Copy link
Contributor Author

/retest

@abhinavdahiya
Copy link
Contributor

testing locally
[install-config.yaml]

...
controlPlane:
  name: master
  platform:
    aws:
      zones:
      - us-east-1b
      - us-east-1d
      - us-east-1f
...

looks like the bootstrap instance is still in the first availability zone...

$ AWS_PROFILE=openshift-dev aws ec2 describe-instances --filters Name=tag-key,Values=kubernetes.io/cluster/adahiya-0-88bqx | jq '.Reservations[].Instances[] | (.Tags[] | select(.Key == "Name") | .Value) + " " + .Placement.AvailabilityZone'
"adahiya-0-88bqx-master-1 us-east-1d"
"adahiya-0-88bqx-bootstrap us-east-1a"
"adahiya-0-88bqx-master-0 us-east-1b"
"adahiya-0-88bqx-master-2 us-east-1f"

@staebler is it possible to create bootstrap machine in the 0th AZ given for controlPlane

public_subnet_ids = "${aws_subnet.public_subnet.*.id}"
private_subnet_count = "${local.new_az_count}"
public_subnet_count = "${local.new_az_count}"
az_to_private_subnet_id = "${zipmap(local.new_subnet_azs, local.private_subnet_ids)}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This seems to be used only in output, why not do the calculation there itself ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ totaly ignorable 😇

@staebler
Copy link
Contributor Author

level=error msg="1 error occurred:"
level=error msg="\t* module.vpc.aws_route_table_association.route_net[1]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.1: timeout while waiting for state to become 'success' (timeout: 5m0s)"

rate limiting

/retest

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 26, 2019
@wking
Copy link
Member

wking commented Feb 26, 2019

e2e-aws:


Flaky tests:

[sig-storage] ConfigMap should be consumable in multiple volumes in the same pod [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] Downward API volume should provide container's memory limit [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]

Failing tests:

[sig-storage] Dynamic Provisioning DynamicProvisioner should provision storage with different parameters [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: hostPath] [Testpattern: Dynamic PV (default fs)] subPath should support file as subpath [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@wking
Copy link
Member

wking commented Feb 27, 2019

Conflict with #1296.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

These changes pass the availability zones to use for the masters
set in 99_openshift-cluster-api_master-machines.yaml through to
terraform. Prior to these changes the masters were always placed in
the first 3 availability zones.

There is no validation done on the availability zones to verify that
they are valid for the region.

Fix for https://bugzilla.redhat.com/show_bug.cgi?id=1662119.
The bootstrap node was being placed in the first availability zone in the region.
Now, place the bootstrap node in the same availability zone as the first master.

Remove the local az_to_private_subnet_id variable from the vpc module
as it is only used as an output from the module. The output value is now calculated
at the place where the output value is defined.

Remove the cluster_id output value from the vpc module as it is unused.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2019
@staebler
Copy link
Contributor Author

/test e2e-aws

@abhinavdahiya
Copy link
Contributor

Conflict with #1296.

Sorry @staebler thanks for keeping up with rebase. 💯

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,staebler]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@staebler
Copy link
Contributor Author

level=error msg="2 errors occurred:"
level=error msg="\t* module.vpc.aws_route_table_association.private_routing[3]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.private_routing.3: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route_table_association.private_routing[0]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.private_routing.0: timeout while waiting for state to become 'success' (timeout: 5m0s)"

Rate limiting

/retest

@abhinavdahiya
Copy link
Contributor

/retest

@wking
Copy link
Member

wking commented Feb 28, 2019

e2e-aws:

Flaky tests:

[sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Dynamic Provisioning DynamicProvisioner should provision storage with different parameters [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (default fs)] subPath should fail for new directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (default fs)] subPath should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@openshift-merge-robot openshift-merge-robot merged commit 13d1158 into openshift:master Feb 28, 2019
abhinavdahiya added a commit to abhinavdahiya/installer that referenced this pull request Mar 1, 2019
openshift@afa0b59 had moved the bootstrap node to private
subnet based on openshift#1121 (comment), but we need the bootstrap node in public subnet to be able to ssh.

The bootstrap node is accesible on ssh again.
```console
$ ush core@18.215.154.240
Warning: Permanently added '18.215.154.240' (ECDSA) to the list of known hosts.
Red Hat CoreOS 4.0 Beta
WARNING: Direct SSH access to machines is not recommended.
This node has been annotated with machineconfiguration.openshift.io/ssh=accessed

---
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
[core@ip-10-0-8-165 ~]$
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants