Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix master regex when running multiple clusters #58561

Merged
merged 1 commit into from
Jan 24, 2018

Conversation

jesseshieh
Copy link
Contributor

@jesseshieh jesseshieh commented Jan 20, 2018

What this PR does / why we need it:
I'm running two Kubernetes clusters on GCE. One for production and one for staging. The instance prefix I use for production is kubernetes and for staging it's staging-kubernetes. This caused a problem when running kube-up.sh for production because when it tries to find all instances which match kubernetes(-...)? it finds both the production and staging instances. This probably results in multiple problems, but the most noticeable one for me was that INITIAL_ETCD_CLUSTER was incorrect and so etcd wouldn't start up correctly so the api server doesn't start up correctly so nothing else starts up. I tested this manually and it seems to work for me, but I didn't write an automated test.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Fixes bug finding master replicas in GCE when running multiple Kubernetes clusters

I'm running two Kubernetes clusters on GCE. One for production and one for staging. The instance prefix I use for production is `kubernetes` and for staging it's `staging-kubernetes`. This caused a problem when running `kube-up.sh` for production because when it tries to find all instances which match `kubernetes(-...)?` it finds both the production and staging instances. This probably results in multiple problems, but the most noticeable one for me was that I`NITIAL_ETCD_CLUSTER` was incorrect and so etcd wouldn't start up correctly so the api server doesn't start up correctly so nothing else starts up. I tested this manually and it seems to work for me, but I didn't write an automated test.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 20, 2018
@resouer
Copy link
Contributor

resouer commented Jan 20, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 20, 2018
@@ -1868,7 +1868,7 @@ function get-master-replicas-count() {
# Prints regexp for full master machine name. In a cluster with replicated master,
# VM names may either be MASTER_NAME or MASTER_NAME with a suffix for a replica.
function get-replica-name-regexp() {
echo "${MASTER_NAME}(-...)?"
echo "^${MASTER_NAME}(-...)?"
Copy link
Contributor

@resouer resouer Jan 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use kubernetes-staging in your own env? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point :) Had I known I was gonna run into this issue I would have totally done that! I'm just hoping this will help someone else down the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this change. What's the motivation of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a little bit about it in the PR description above, but basically, I'm running a production cluster and staging cluster and I have GCE instances named kubernetes-master-001, kubernetes-master-002, etc and more GCE instances named staging-kubernetes-master-001, staging-kubernetes-master-002, etc.

When running kube-up.sh to create another production master, the INITIAL_ETCD_CLUSTER contains the production and staging instances when it should only be production instances. As a result, etcd fails to startup with an errors like member count unequal -> the api server fails to startup -> nothing else works and the newly created master is broken.

This change makes sure that the INITIAL_ETCD_CLUSTER only contains the production master instances.

@jesseshieh
Copy link
Contributor Author

/assign @jszczepkowski

@jesseshieh
Copy link
Contributor Author

@wojtek-t any thoughts on this? I think it should be pretty harmless and could help some people down the line in similar situations as mine.

@wojtek-t
Copy link
Member

OK - that makes sense.

/lgtm
/approve no-issue

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 24, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jesseshieh, wojtek-t

Associated issue requirement bypassed by: wojtek-t

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 24, 2018
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

3 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jan 24, 2018

@jesseshieh: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gke-gci f9e43f3 link /test pull-kubernetes-e2e-gke-gci

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 6e65c23 into kubernetes:master Jan 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants