
Protect controller from becoming unschedulable #214

Merged

gthao313 merged 3 commits into bottlerocket-os:develop from controller-update on Jul 20, 2022

Conversation

gthao313 (Member) commented Jun 21, 2022

Issue number:
#14

Description of changes:

Author: Tianhao Geng <tianhg@amazon.com>
Date:   Tue Jun 21 03:03:50 2022 +0000

    Make the controller high-priority

    As part of protecting the controller from becoming unschedulable, this
    approach gives the controller a high priority so that it can be scheduled
    sooner than lower-priority Pods when its scheduling requirements are met.

commit c7716c2109894f41c2b2ab960c5cfcd8c1f5b489
Author: Tianhao Geng <tianhg@amazon.com>
Date:   Tue Jun 21 03:00:36 2022 +0000

    Update the node which hosts the controller last

    As part of protecting the controller from becoming unschedulable, this
    change makes brupop update the node which hosts the controller last,
    reducing the impact of the controller pod going down as much as possible.

commit 7eff6e627247d58c574bf1c105a065d43a2bebe3
Author: Tianhao Geng <tianhg@amazon.com>
Date:   Mon Jun 20 21:48:47 2022 +0000

    Redesign the responsibilities of controller and agent

    As part of preventing the controller from becoming unschedulable, we
    redesign the responsibilities of the controller and the agent. The new
    design enables the agent to complete a node update even without the
    controller's signal.

Testing done:

  • Tested the controller state machine
  • Verified that the node hosting the controller pod is updated last
  • Verified that the controller pod has the desired priority value

The node hosting the controller pod is ip-192-168-131-7.us-west-2.compute.internal. The priority class value is `1000000`.

Name:                 brupop-controller-deployment-8484df4fb4-vcjxb
Namespace:            brupop-bottlerocket-aws
Priority:             100000
Priority Class Name:  brupop-controller-high-priority
Node:                  ip-192-168-131-7.us-west-2.compute.internal/192.168.129.76
kubectl -n brupop-bottlerocket-aws get brs
NAME                                                STATE   VERSION   TARGET STATE               TARGET VERSION   CRASH COUNT
brs-ip-192-168-131-7.us-west-2.compute.internal     Idle    1.6.1     Idle                       <no value>       0
brs-ip-192-168-134-90.us-west-2.compute.internal    Idle    1.6.1     Idle                       <no value>       0
brs-ip-192-168-137-157.us-west-2.compute.internal   Idle    1.6.1     StagedAndPerformedUpdate   1.8.0            0
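(Note: the node hosting the controller, ip-192-168-131-7, remains Idle while another node has already moved to StagedAndPerformedUpdate, consistent with the controller's node being updated last.)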

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@gthao313 changed the title from "Controller update" to "Protect controller from becoming unschedulable" on Jun 21, 2022
@gthao313 force-pushed the controller-update branch 4 times, most recently from ce6500d to 1310320 on June 24, 2022
@gthao313 marked this pull request as ready for review on June 24, 2022
@gthao313 force-pushed the controller-update branch 3 times, most recently from a116aea to f601ceb on June 27, 2022
agent/src/agentclient.rs (review thread resolved, outdated)
agent/src/agentclient.rs (review thread resolved)
@@ -47,3 +47,6 @@ pub const CONTROLLER_DEPLOYMENT_NAME: &str = "brupop-controller-deployment";
pub const CONTROLLER_SERVICE_NAME: &str = "brupop-controller-server"; // The name for the `svc` fronting the controller.
pub const CONTROLLER_INTERNAL_PORT: i32 = 8080; // The internal port on which the controller service is hosted.
pub const CONTROLLER_SERVICE_PORT: i32 = 80; // The k8s service port hosting the controller service.
pub const BRUPOP_CONTROLLER_PRIORITY_CLASS: &str = "brupop-controller-high-priority";
pub const BRUPOP_CONTROLLER_PREEMPTION_POLICY: &str = "Never";
pub const BRUPOP_CONTROLLER_PRIORITY_VALUE: i32 = 1000000; // The controller priority class value
Contributor:

How is this priority value determined?

Member Author:

The value can be set between 0 and 1 billion. The reasons are:

  1. I assume the controller pod needs to come back sooner than other pods, except critical ones such as system-related pods. I went through the PriorityClass documentation, and 1 million is commonly used to represent a high, but not critical, priority value.
  2. We don't know what priority strategies customers are using for their k8s resources, and I don't want the controller to take precedence over customers' critical resources. After looking at many PriorityClass examples on GitHub, high-priority and critical values are usually set above 10 million. So, to be safer, I lowered the value here to 1 million.

Contributor:

It might be good to elaborate a bit on the constant in a comment.
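For reference, here is a minimal sketch of how these constants might be wired into a Kubernetes PriorityClass object via the k8s-openapi crate; the function name and description string are illustrative, not taken from this PR:

```rust
use k8s_openapi::api::scheduling::v1::PriorityClass;
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;

// Constants mirroring models/src/constants.rs as shown in the diff above.
const BRUPOP_CONTROLLER_PRIORITY_CLASS: &str = "brupop-controller-high-priority";
const BRUPOP_CONTROLLER_PREEMPTION_POLICY: &str = "Never";
const BRUPOP_CONTROLLER_PRIORITY_VALUE: i32 = 1_000_000;

fn controller_priority_class() -> PriorityClass {
    PriorityClass {
        metadata: ObjectMeta {
            name: Some(BRUPOP_CONTROLLER_PRIORITY_CLASS.to_string()),
            ..Default::default()
        },
        // 1 million: high, but deliberately below the >10 million values
        // commonly used for cluster-critical workloads.
        value: BRUPOP_CONTROLLER_PRIORITY_VALUE,
        // "Never": the controller waits for free capacity rather than
        // evicting other pods when it is scheduled.
        preemption_policy: Some(BRUPOP_CONTROLLER_PREEMPTION_POLICY.to_string()),
        description: Some(
            "High (non-critical) priority for the brupop controller".to_string(),
        ),
        ..Default::default()
    }
}
```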


shadows.sort_by(|a, b| a.compare_crash_count(b));
for brs in shadows {
sort_shadows(shadows.clone(), &get_associated_bottlerocketshadow_name()?)?;
Contributor:

This won't actually sort the vector, since we are cloning it as input rather than using a mutable reference.
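A standalone illustration of the pitfall; sorting a clone leaves the caller's vector untouched:

```rust
fn sort_copy(mut v: Vec<u32>) {
    // `v` is an owned copy here; sorting it has no effect on the caller's vector.
    v.sort();
}

fn main() {
    let shadows = vec![3, 1, 2];
    sort_copy(shadows.clone()); // the clone is sorted, then dropped
    assert_eq!(shadows, vec![3, 1, 2]); // original order unchanged
    println!("still unsorted: {:?}", shadows);
}
```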

Member Author:

Yeah, I remember changing it to `let sorted_shadows = sort_shadows(shadows.clone(), &get_associated_bottlerocketshadow_name()?)?;`. Maybe I made a mistake during the rebase. Thanks, I'll fix it.

Contributor:

I think it would be more idiomatic here to use a mutable reference as input to the function. You can just return an empty Result<()> to indicate whether or not something went wrong.

shadows.sort_by(|a, b| a.compare_crash_count(b));

// move the shadow whose associated bottlerocketshadow node hosts the controller pod to the last position
for (position, brs) in shadows.iter().enumerate() {
Contributor:

nit: Consider using shadows.iter().position(). This would also allow you to use an Option rather than having to use usize::MAX as a sentinel value.

Member Author:

Oh, I just remembered why I didn't use shadows.iter().position(). When I tried it, position() couldn't handle a Result value inside its closure, namely the Result produced by brs.metadata.name.as_ref().context(error::MissingBottlerocketShadowName)?:

let associated_brs_node_position = shadows.iter().position(|brs| brs.metadata.name.as_ref().context(error::MissingBottlerocketShadowName)? == associated_brs_name).context("")?

cbgbt (Contributor) commented Jul 11, 2022:

Since you have kube::ResourceExt imported, you could just do brs.name() which technically panics if the name is missing. Not great, but there are likely bigger problems if the name is missing from objects.

Alternatively, if the name is missing you can probably skip the item rather than returning early:

let associated_brs_node_position = shadows
    .iter()
    .position(|brs| brs.metadata.name.as_ref() == Some(associated_brs_name))

I think it's best to lean on Option which the compiler can check rather than using a sentinel value, even though we can logically deduce that your sentinel would work.
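A runnable toy version of that pattern, with plain strings standing in for BottlerocketShadow objects:

```rust
fn main() {
    // Stand-in for the shadow list; the strings model brs.metadata.name.
    let mut shadows = vec![
        "brs-node-a".to_string(),
        "brs-node-b".to_string(),
        "brs-node-c".to_string(),
    ];
    let associated_brs_name = "brs-node-b";

    // position() yields Option<usize>: None when nothing matches,
    // so no usize::MAX sentinel is needed.
    if let Some(position) = shadows.iter().position(|name| name == associated_brs_name) {
        let associated = shadows.remove(position);
        shadows.push(associated); // the controller's node now sorts last
    }

    assert_eq!(shadows, vec!["brs-node-a", "brs-node-c", "brs-node-b"]);
}
```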

controller/src/controller.rs (review thread resolved, outdated)
@gthao313 requested review from cbgbt and rpkelly on July 6, 2022
@gthao313 force-pushed the controller-update branch 2 times, most recently from f26ec8d to ba26dfd on July 12, 2022
shadows.sort_by(|a, b| a.compare_crash_count(b));
for brs in shadows {
sort_shadows(&mut shadows, &get_associated_bottlerocketshadow_name()?)?;
for brs in shadows.drain(..) {
gthao313 (Member Author) commented Jul 12, 2022:

Updated to use drain.

/// logic1: sort shadows by crash count
/// logic2: move the shadow whose associated bottlerocketshadow node hosts the controller pod to the last position
#[instrument(skip())]
fn sort_shadows(shadows: &mut Vec<BottlerocketShadow>, associated_brs_name: &str) -> Result<()> {
Contributor:

nit: I think this function cannot return an error value. You can probably remove the return type.
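Combining this with the earlier position() suggestion, the function might look roughly like the sketch below; this is only a sketch, since the merged body isn't shown here (BottlerocketShadow and compare_crash_count come from the surrounding brupop code):

```rust
/// logic1: sort shadows by crash count
/// logic2: move the shadow whose node hosts the controller pod to the end,
/// so that node is updated last
fn sort_shadows(shadows: &mut Vec<BottlerocketShadow>, associated_brs_name: &str) {
    shadows.sort_by(|a, b| a.compare_crash_count(b));
    // position() returns Option<usize>; if the name is missing or not found,
    // the order is simply left unchanged.
    if let Some(position) = shadows
        .iter()
        .position(|brs| brs.metadata.name.as_deref() == Some(associated_brs_name))
    {
        let associated_brs = shadows.remove(position);
        shadows.push(associated_brs);
    }
}
```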

@@ -37,4 +37,7 @@ pub enum Error {

#[snafu(display("Controller failed due to internal assertion issue: '{}'", source))]
Assertion { source: serde_plain::Error },

#[snafu(display("Unable to get host controller pod node name: {}", source))]
GetNodeName { source: std::env::VarError },
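For context, the new GetNodeName variant implies the controller reads its host node's name from an environment variable. A hedged sketch of how it might be used follows; the variable name and the Result alias are assumptions, and the brs-<node-name> prefix mirrors the shadow names in the testing output above:

```rust
use snafu::ResultExt;
use std::env;

// Hypothetical: `MY_NODE_NAME` is an assumed variable name; it would be
// injected into the pod spec via the Kubernetes downward API (fieldRef: spec.nodeName).
fn get_associated_bottlerocketshadow_name() -> Result<String> {
    // With snafu 0.7+, the context selector would be error::GetNodeNameSnafu instead.
    let node_name = env::var("MY_NODE_NAME").context(error::GetNodeName)?;
    // Shadow names in the testing output follow the brs-<node-name> pattern.
    Ok(format!("brs-{}", node_name))
}
```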
Contributor:

Just a note here: GetNodeName doesn't show up in the adjusted error module, so you might want to reimplement this error when you merge the two PRs.

Member Author:

Yeah! I would like to merge this PR first and then solve the conflict on the PR you mentioned above. Thanks!

@gthao313 merged commit c6f1838 into bottlerocket-os:develop on Jul 20, 2022