-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding capability for HA control plane testing with kops #6113
Conversation
b7af578
to
cc29ef3
Compare
/hold Need to do e2e test with this |
28d98c2
to
442df92
Compare
b722f2f
to
37f743c
Compare
kubetest/kops.go
Outdated
var zones []string | ||
|
||
// if zones is set to zero and gcp project is not set then pick random aws zone | ||
if *kopsZones == "" && gcpProject == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why check for gcpProject
? gce vs aws? maybe check for provider here
kubetest/kops.go
Outdated
} | ||
|
||
if len(zones) == 0 { | ||
return nil, fmt.Errorf("no zones found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errors.New
, same as below
kubetest/kops.go
Outdated
"--zones", strings.Join(k.zones, ","), | ||
"--override", "'cluster.spec.etcdClusters[*].version=3.1.10'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we pinning etcd version for all the kops job now? (does etcd 3.1 work with older releases?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, what versions are we testing?? Do I have access to the version number for the test run?
kubetest/kops.go
Outdated
} | ||
} | ||
|
||
// TODO get the number of ec2 instances in the region and ensure that there are not too many running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: TODO(chrislovecnm): blah blah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh - that would be cool. Having something like boskos that acts as a mutex. I wonder if we could "fake it" with tags or some cheap resource somehow...
/lint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krzyzacy: 0 warnings.
In response to this:
/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
2b1fc28
to
20e7e6b
Compare
kubetest/kops.go
Outdated
kopsDiskSize = flag.Int("kops-disk-size", 48, "Disk size to use for nodes and masters") | ||
kopsHA = flag.Bool("kops-ha", false, "(kops only) Run HA control plane") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make thiskops-master-count
and have it pass through to --master-count
?
Or kops-multizone
- as that's closer to what it is doing - picking distinct zones in the same region?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is kops-master-count, not multizone :) Yes I can do that
kubetest/kops.go
Outdated
kopsDiskSize = flag.Int("kops-disk-size", 48, "Disk size to use for nodes and masters") | ||
kopsHA = flag.Bool("kops-ha", false, "(kops only) Run HA control plane") | ||
kopsEtcd = flag.String("kops-etcd", "", "(kops only) Etcd Version") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And perhaps rename to kops-etcd-version
or should we just expose the override flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move this to another PR if you like, but we need it for HA testing in general
kubetest/kops.go
Outdated
ha bool | ||
|
||
// kopsEtcd is the etcd version to run | ||
kopsEtcd string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for kops prefix here (in struct kops
). And I suggest etcdVersion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
kubetest/kops.go
Outdated
zones = *z | ||
} else { | ||
zones = strings.Split(*kopsZones, ",") | ||
if err := os.Setenv("ZONE", zones[0]); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it this is no longer needed, as it isn't done in the other branch of the if statement. Remove (or add to the other if statement branch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I frankly have no bloody idea wth this is doing, I refactored it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then the problem is that it has to be done in both branches. Pull it out of the if statement?
kubetest/kops.go
Outdated
|
||
if len(zones) == 0 { | ||
return nil, errors.New("no zones found") | ||
} else if zones[0] == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate all zones in slice, not just the first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is refactoring of old code. I can do that in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it in the diff? Can you point me to it?
kubetest/kops.go
Outdated
} | ||
zones = *z | ||
} else { | ||
zones = strings.Split(*kopsZones, ",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the validation error message should be "kops-zones must be specified when not running on AWS"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again old code, we can follow-up with another PR if you like.
kubetest/kops.go
Outdated
} | ||
} | ||
|
||
// TODO get the number of ec2 instances in the region and ensure that there are not too many running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh - that would be cool. Having something like boskos that acts as a mutex. I wonder if we could "fake it" with tags or some cheap resource somehow...
kubetest/kops.go
Outdated
// getRandomAWSZones looks up all regions, and the availability zones for those regions. A random | ||
// region is then chosen and the AZ's for that region is returned. If HA is required a regions with 3 | ||
// or more zones is returned. | ||
func getRandomAWSZones(isHA bool) (*[]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return []string, error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? Would love to understand more.
kubetest/kops.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
zones = *z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have getRandomAWSZones
return []string, and just check that if len(zones) == 0) { return nil, errors.New("could not pick zones") }
or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrmmm I get what you are saying, but I would rather check for an error. That is why I made it a pointer. Can you live with it?
kubetest/kops_test.go
Outdated
) | ||
|
||
func Test_GetHARandomRegion(t *testing.T) { | ||
t.Skip("this test requires a valid aws account to run") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted to write a test, you could split getRandomAWSZones into two functions - one that builds the map, the other that returns a list of zones with at least N members
kubetest/kops.go
Outdated
// TODO get the number of ec2 instances in the region and ensure that there are not too many running | ||
log.Printf("Launching cluster in region: %q", selectedRegion) | ||
|
||
return &selectedZones, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if selectedZones > 3 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we use all of them.
kubetest/kops.go
Outdated
} | ||
if len(overrides) != 0 { | ||
featureFlags = append(featureFlags, "SpecOverrideFlag") | ||
createArgs = append(createArgs, "--override", strings.Join(overrides, ",")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't join I don't think - not sure it's valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need context
20e7e6b
to
51116ab
Compare
I'm fine with that. There will be a lag when they announce a new region, but that's not a big deal. |
For the --overrides: I'm not sure we accept |
kubetest/kops.go
Outdated
|
||
if len(zones) == 0 { | ||
return nil, errors.New("no zones found") | ||
} else if zones[0] == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it in the diff? Can you point me to it?
kubetest/kops.go
Outdated
zones = *z | ||
} else { | ||
zones = strings.Split(*kopsZones, ",") | ||
if err := os.Setenv("ZONE", zones[0]); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then the problem is that it has to be done in both branches. Pull it out of the if statement?
kubetest/kops.go
Outdated
@@ -273,17 +309,19 @@ func (k kops) Up() error { | |||
"--node-count", strconv.Itoa(k.nodes), | |||
"--node-volume-size", strconv.Itoa(k.diskSize), | |||
"--master-volume-size", strconv.Itoa(k.diskSize), | |||
// TODO chrislovecnm make this configurable | |||
"--master-size", "c4.large", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We override this in some tests (the ENA test IIRC), so this should be a flag if you want to default it to c4.large
kubetest/kops.go
Outdated
// az for each of the regions. AWS Go API does not allow us to make a single call | ||
zonesResults, err := ec2Session.DescribeAvailabilityZones(&ec2.DescribeAvailabilityZonesInput{}) | ||
if err != nil { | ||
return nil, fmt.Errorf("unable to call aws api DescribeAvailabilityZones: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think building the session is going to fail, it's going to be on your first "real" API call - i.e. here
kubetest/kops.go
Outdated
} | ||
// build a map of region and the regions az so that we can lookup AZ easily | ||
for _, z := range zonesResults.AvailabilityZones { | ||
existing, ok := regionMap[*z.RegionName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can replace the whole block of the for loop
kubetest/kops.go
Outdated
} | ||
|
||
// initialize global pseudo random generator | ||
rand.Seed(time.Now().Unix()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You normally only call rand.Seed once, at the very start of your program (i.e. in main). Many languages don't even require you to call seed at all. It's unusual to seed repeatedly.
So - why are you seeding repeatedly is more the question :-)
kubetest/kops_test.go
Outdated
/* | ||
Per @justinsb | ||
TODO @chrislovecnm | ||
If you wanted to write a test, you could split getRandomAWSZones into two functions - one that builds the map, the other that returns a list of zones with at least N members |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't really see the point of these skipped test functions. We're going to find out PDQ if it doesn't work...
kubetest/kops.go
Outdated
kopsDiskSize = flag.Int("kops-disk-size", 48, "Disk size to use for nodes and masters") | ||
kopsMasterCount = flag.Int("kops-master-count", 1, "(kops only) Number of masters to run") | ||
kopsEtcdVersion = flag.String("kops-etcd", "", "(kops only) Etcd Version") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change flag to kops-etcd-version
?
kubetest/kops.go
Outdated
// getRandomAWSZones looks up all regions, and the availability zones for those regions. A random | ||
// region is then chosen and the AZ's for that region is returned. If HA is required a regions with 3 | ||
// or more zones is returned. | ||
func getRandomAWSZones(masterCount int) (*[]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unclear before - I suggest this should return ([]string, error)
, because []string is already a ref type, so it's already nullable.
kubetest/kops.go
Outdated
// initialize global pseudo random generator | ||
rand.Seed(time.Now().Unix()) | ||
|
||
// select a random region and zones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You asked for more context on using rand.Perm.
Instead of looping through and removing the regions one by one, instead just use rand.Perm
to get a list of indexes. Those are the random order in which you will consider the regions in the slice. Then you don't need to do the removal, and you only need one rand call.
for _, i := rand.Perm(n) {
region := regions[i]
...
}
return nil, fmt.Errorf("unable to find region with %d zones", masterCount)
5d444a7
to
66d96e9
Compare
I will test, but I think it is ok either way Otherwise @justinsb PTAL |
We cannot merge this until we get the AWS quotas addressed. |
kubetest/kops.go
Outdated
kopsDiskSize = flag.Int("kops-disk-size", 48, "Disk size to use for nodes and masters") | ||
kopsPublish = flag.String("kops-publish", "", "Publish kops version to the specified gs:// path on success") | ||
kopsMasterType = flag.String("kops-master-type", "c4.large", "(kops only) master instance type") | ||
kopsMasterCount = flag.Int("kops-master-count", 1, "(kops only) Number of masters to run") | ||
kopsEtcdVersion = flag.String("kops-etcd", "", "(kops only) Etcd Version") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flag name should be kops-etcd-version I think
kubetest/kops.go
Outdated
if *kopsZones == "" && provider == "aws" { | ||
// Initialize global pseudo random generator. | ||
// We do not need to intialize it if kopsZones is passed in or we are running the other tests. | ||
// That is why I did not put it in main func. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should go into the main func, it's something that should be done once in every program and then we forget about it
if len(zones) == 0 { | ||
return nil, errors.New("no zones found") | ||
} else if zones[0] == "" { | ||
return nil, errors.New("zone cannot be a empty string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we validate all the zones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A really good idea, but do you mind if we do it in another PR? I am wondering if that work is outside the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, my plan is to not pass in zones long term. Always pick a random zone or zones with the new func. Because of this zones would be chosen programmatically.
kubetest/kops.go
Outdated
kopsDiskSize = flag.Int("kops-disk-size", 48, "Disk size to use for nodes and masters") | ||
kopsPublish = flag.String("kops-publish", "", "Publish kops version to the specified gs:// path on success") | ||
kopsMasterType = flag.String("kops-master-type", "c4.large", "(kops only) master instance type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kops-master-size, for consistency with the master-size kops flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think default should be empty, because otherwise you're changing the size that all tests run with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah I understand that we are changing all tests, and it is a big change. My overall plan is to always select a random region and at least one random AZ. Because of that plan, we are going to hit far more regions that do not support m3 instance types. If we default it to c4.large it may help us with lack of m3 instances. What are your thoughts?
kubetest/kops.go
Outdated
etcdVersion string | ||
|
||
// masterType is the EC2 instance type for the master | ||
masterType string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
masterSize, for consistency with the kops flag
kubetest/kops.go
Outdated
|
||
// getRandomAWSZones looks up all regions, and the availability zones for those regions. A random | ||
// region is then chosen and the AZ's for that region is returned. If HA is required a regions with 3 | ||
// or more zones is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe replace " If HA is required a regions with 3 or more zones is returned." with "At least masterCount zones will be returned, all in the same region" or something like that?
kubetest/kops.go
Outdated
for _, i := range rand.Perm(len(awsRegions)) { | ||
ec2Session, err := getAWSEC2Session(awsRegions[i]) | ||
if err != nil { | ||
return selectedZones, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, err
would be less confusing
kubetest/kops.go
Outdated
// az for a region. AWS Go API does not allow us to make a single call | ||
zoneResults, err := ec2Session.DescribeAvailabilityZones(&ec2.DescribeAvailabilityZonesInput{}) | ||
if err != nil { | ||
return nil, fmt.Errorf("unable to call aws api DescribeAvailabilityZones: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include region in message
kubetest/kops.go
Outdated
} | ||
} | ||
|
||
return selectedZones, fmt.Errorf("unable to find region with %d zones", masterCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nil, fmt.Error...
I think
kubetest/kops_test.go
Outdated
import "testing" | ||
|
||
/// | ||
// TODO:(@chrislovecnm) we should mock out the aws interface so we do not make a aws API call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do one of (1) just remove the skipped tests, (2) implement the mock, (3) make the function testable by splitting it up. I'm not sure there's much to test in option 3, TBH.
/retest |
90dfa88
to
bfe00e8
Compare
@justinsb PTAL I think there are two points that we may disagree on.
I should have done all of your other updates, please let me know if I missed anything. |
LGTM |
bfe00e8
to
94c4698
Compare
@krzyzacy do you have time to deploy this on Tuesday? |
@chrislovecnm can you squash? |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chrislovecnm, krzyzacy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these OWNERS Files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krzyzacy: 0 warnings.
In response to this:
@chrislovecnm can you squash?
/lgtm
/hold
/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Added a new flag for kops-ha. Also, since we now may run in regions that may not have m3 instances, updated the master type to c4 instances. In order to test HA control plain a minimum of 3 AZs must exist. Implemented a func that queries all regions, and then queries the list of that regions AZ. A random region is then selected. If running in ha mode, a region with a minimum of 3 AZ is selected. The new zone capability can be run in both ha and non-ha modes. If --kops-zones is not set, and the test run is on AWS, the new code is used. You can override the new code by setting the zone as done previously. Added a unit test, that requires a live AWS account for testing the new random region selection. These tests are skipped since they require a live AWS account.
94c4698
to
75e5c46
Compare
/hold cancel |
Added a new flag for kops-ha. Also, since we now may run in regions that
may not have m3 instances, updated the master type to c4 instances.
In order to test HA control plain a minimum of 3 AZs must exist.
Implemented a
func
that queries all regions, and then queries the listof that regions AZ. A random region is then selected. If running in ha
mode, a region with a minimum of 3 AZ is selected. The new zone
capability can be run in both ha and non-ha modes. If --kops-zones is
not set, and the test run is on AWS, the new code is used. You can
override the new code by setting the zone as done previously.
Added a unit test, that requires a live AWS account for testing the new
random region selection. These tests are skipped since they require a
live AWS account.