Take maintenance mode into account while choosing coordinators and read maintenance mode information from special key space #1652

Merged: 8 commits merged into FoundationDB:main on Jun 2, 2023

Conversation

Contributor

@sbodagala sbodagala commented May 30, 2023

Description

Extend the logic that selects/verifies the validity of current coordinators to take maintenance mode information into account.

Read maintenance mode information from the special key space.
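
To illustrate the first point of the description, here is a minimal sketch of what "taking maintenance mode into account" could look like when picking coordinator candidates. It assumes the intent is to skip processes whose zone is currently under maintenance; the `process` type, its fields, and `filterCandidates` are hypothetical names invented for this example, not the operator's actual API, and the real logic in this PR may differ.

```go
package main

import "fmt"

// process is a hypothetical, simplified view of a coordinator candidate.
// It is not the operator's real type.
type process struct {
	ID     string
	ZoneID string
}

// filterCandidates returns the candidates whose zone is not under
// maintenance. An empty maintenanceZone is treated as "maintenance mode is
// off", in which case every candidate stays eligible.
func filterCandidates(candidates []process, maintenanceZone string) []process {
	if maintenanceZone == "" {
		return candidates
	}
	eligible := make([]process, 0, len(candidates))
	for _, p := range candidates {
		if p.ZoneID != maintenanceZone {
			eligible = append(eligible, p)
		}
	}
	return eligible
}

func main() {
	candidates := []process{
		{ID: "storage-1", ZoneID: "zone-a"},
		{ID: "storage-2", ZoneID: "zone-b"},
		{ID: "storage-3", ZoneID: "zone-a"},
	}
	// Assume zone-a is the zone currently under maintenance.
	for _, p := range filterCandidates(candidates, "zone-a") {
		fmt.Println(p.ID, p.ZoneID)
	}
}
```

Treating the empty string as "no maintenance" keeps the default path unchanged when maintenance mode is not in use.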

Type of change

  • Other (extends the existing functionality)

Discussion

Are there any design details that you would like to discuss further?

No.

Testing

Please describe the tests that you ran to verify your changes. Unit tests? Manual testing?

Locally ran the test that was added in this PR.

Do we need to perform additional testing once this is merged, or perform in a larger testing environment?

Yes (nightly regression tests will need to be run).

Documentation

Did you update relevant documentation within this repository?

No.

If this change is adding new functionality, do we need to describe it in our user manual?

Maybe (at a later point).

If this change is adding or removing subreconcilers, have we updated the core technical design doc to reflect that?

No (maybe at a later point).

If this change is adding new safety checks or new potential failure modes, have we documented them and how to debug potential issues?

N/A (I think).

Follow-up

Are there any follow-up issues that we should pursue in the future?

No.

Does this introduce new defaults that we should re-evaluate in the future?

No.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: 1386f3d
  • Duration 0:05:42
  • Result: ❌ FAILED
  • Error: Error while executing command: make -C e2e kind-setup. Reason: exit status 2
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 1386f3d
  • Duration 3:04:32
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 4b180bc
  • Duration 3:14:45
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: 4b180bc
  • Duration 4:10:07
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Member

@johscheuer johscheuer left a comment

Changes LGTM 👍 Can you please add a comment on the new variable? Do you think it makes sense to add an e2e test case for this to the operator test suite to make sure the system behaves correctly?

Review thread on api/v1beta2/foundationdb_status.go (resolved)
@sbodagala
Contributor Author

sbodagala commented May 31, 2023

Do you think it makes sense to add an e2e test case for this to the operator test suite to make sure the system behaves correctly?

Yes. How about we do that in a follow-up radar/PR? That would allow this PR to be merged and would also avoid the need to build an image with these changes.

Member

@johscheuer johscheuer left a comment

That's fine with me, but you don't have to build any images: the e2e pipeline builds the operator image from the current branch, i.e. it would include your changes.

@sbodagala
Contributor Author

No, I meant: if we want to run a real-cluster test locally (against a PR that includes changes to the operator code), won't we need to build an image?

@sbodagala
Contributor Author

OK, let's merge this PR and write the test in a follow-up PR. Thanks!

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 7f9afc4
  • Duration 3:37:29
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: 7f9afc4
  • Duration 4:09:56
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer
Member

The e2e tests use real clusters, so are you talking about doing manual testing?

@sbodagala
Contributor Author

sbodagala commented Jun 1, 2023

Yes (running the test locally). It may take a couple of iterations to get a test working correctly, so running it locally could be helpful, rather than relying on the CI pipeline for those iterations.

@sbodagala
Contributor Author

Anyway, added a test to "test_operator/operator.go".

@sbodagala sbodagala changed the title from "Take maintenance mode into account while choosing coordinators" to "Take maintenance mode into account while choosing coordinators and read maintenance mode information from special key space" on Jun 1, 2023
@sbodagala
Contributor Author

sbodagala commented Jun 1, 2023

The integration checks below failed, but I'm not sure what the problem is. For example, https://github.com/FoundationDB/fdb-kubernetes-operator/actions/runs/5147753153/jobs/9268556581?pr=1652 says "Process exited with error code", and I'm not sure what that really means.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: c319531
  • Duration 3:00:13
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@sbodagala
Contributor Author

This is because of this failure:

TESTS FAILED SEE THESE LOGS:

/codebuild/output/src147849312/src/github.com/FoundationDB/fdb-kubernetes-operator/logs/test_operator.log

In the test "maintenance mode is on", the cluster status returns an empty maintenance mode string, causing the test to fail.
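
For anyone debugging this, a minimal sketch of how an empty maintenance zone can show up when reading the cluster status. It assumes the zone is exposed as `cluster.maintenance_zone` in the machine-readable status JSON; `statusSketch` and `maintenanceZone` are hypothetical names made up for this example and are not the operator's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusSketch models only the part of the status JSON used here.
// The field path cluster.maintenance_zone is an assumption.
type statusSketch struct {
	Cluster struct {
		MaintenanceZone string `json:"maintenance_zone"`
	} `json:"cluster"`
}

// maintenanceZone extracts the zone under maintenance from a status JSON
// document. A missing field decodes to the empty string, which a caller
// cannot tell apart from "maintenance mode is off".
func maintenanceZone(statusJSON []byte) (string, error) {
	var s statusSketch
	if err := json.Unmarshal(statusJSON, &s); err != nil {
		return "", err
	}
	return s.Cluster.MaintenanceZone, nil
}

func main() {
	samples := [][]byte{
		[]byte(`{"cluster":{"maintenance_zone":"zone-a"}}`),
		[]byte(`{"cluster":{}}`),
	}
	for _, raw := range samples {
		zone, err := maintenanceZone(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("maintenance zone: %q\n", zone)
	}
}
```

If the status is read before the maintenance setting is visible, a check like this would see the empty string, so a test may need to retry rather than assert on the first read. That is a guess at the failure mode, not a confirmed diagnosis.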

@sbodagala
Contributor Author

Disabled the maintenance-related test so this PR can be merged. I will investigate the test failure and re-enable the test in a follow-up PR.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: a35e71c
  • Duration 2:32:47
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: c319531
  • Duration 4:10:03
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 73ffb1c
  • Duration 2:59:20
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: 73ffb1c
  • Duration 4:07:01
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: a35e71c
  • Duration 4:07:39
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: da01126
  • Duration 3:01:03
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 45990ab
  • Duration 3:04:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: da01126
  • Duration 4:06:27
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: 45990ab
  • Duration 4:06:52
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this Jun 2, 2023
@johscheuer johscheuer reopened this Jun 2, 2023
@foundationdb-ci

Result of fdb-kubernetes-operator-pr-kind on Linux CentOS 7

  • Commit ID: 45990ab
  • Duration 0:05:50
  • Result: ❌ FAILED
  • Error: Error while executing command: make -C e2e kind-setup. Reason: exit status 2
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer merged commit 336cf41 into FoundationDB:main Jun 2, 2023
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 45990ab
  • Duration 2:58:56
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

ammolitor pushed a commit that referenced this pull request Jun 19, 2023
…ad maintenance mode information from special key space (#1652)

* - Take maintenance mode into account while choosing coordinators.