
Protect controller from becoming unschedulable #214

Merged

gthao313 merged 3 commits into bottlerocket-os:develop from controller-update on Jul 20, 2022

Conversation

gthao313 (Member) commented Jun 21, 2022

Issue number:
#14

Description of changes:

Author: Tianhao Geng <tianhg@amazon.com>
Date:   Tue Jun 21 03:03:50 2022 +0000

    Make the controller high-priority

    As part of protecting the controller from becoming unschedulable, this
    approach gives the controller a high priority so that it can be scheduled
    sooner than lower-priority Pods when its scheduling requirements are met.

commit c7716c2109894f41c2b2ab960c5cfcd8c1f5b489
Author: Tianhao Geng <tianhg@amazon.com>
Date:   Tue Jun 21 03:00:36 2022 +0000

    Update the node which hosts the controller last

    As part of protecting the controller from becoming unschedulable, this
    change makes brupop update the node which hosts the controller last,
    reducing the impact of the controller pod going down as much as possible.

commit 7eff6e627247d58c574bf1c105a065d43a2bebe3
Author: Tianhao Geng <tianhg@amazon.com>
Date:   Mon Jun 20 21:48:47 2022 +0000

    Redesign the responsibilities of controller and agent

    As part of preventing the controller from becoming unschedulable, we
    redesign the responsibilities of the controller and the agent. The new
    design enables the agent to complete a node update even without the
    controller's signal.

Testing done:

  • Tested the controller state machine
  • Verified that the node hosting the controller pod is updated last
  • Verified that the controller pod has the desired priority value

The node hosting the controller pod is ip-192-168-131-7.us-west-2.compute.internal. The priority class value is `1000000`.

Name:                 brupop-controller-deployment-8484df4fb4-vcjxb
Namespace:            brupop-bottlerocket-aws
Priority:             100000
Priority Class Name:  brupop-controller-high-priority
Node:                  ip-192-168-131-7.us-west-2.compute.internal/192.168.129.76
kubectl -n brupop-bottlerocket-aws get brs
NAME                                                STATE   VERSION   TARGET STATE               TARGET VERSION   CRASH COUNT
brs-ip-192-168-131-7.us-west-2.compute.internal     Idle    1.6.1     Idle                       <no value>       0
brs-ip-192-168-134-90.us-west-2.compute.internal    Idle    1.6.1     Idle                       <no value>       0
brs-ip-192-168-137-157.us-west-2.compute.internal   Idle    1.6.1     StagedAndPerformedUpdate   1.8.0            0
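(Note: the node hosting the controller, ip-192-168-131-7, remains Idle while another node has already moved to StagedAndPerformedUpdate, consistent with the controller's node being updated last.)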

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@gthao313 changed the title from "Controller update" to "Protect controller from becoming unschedulable" on Jun 21, 2022
@gthao313 force-pushed the controller-update branch 4 times, most recently from ce6500d to 1310320 on June 24, 2022
@gthao313 marked this pull request as ready for review on June 24, 2022
@gthao313 force-pushed the controller-update branch 3 times, most recently from a116aea to f601ceb on June 27, 2022
agent/src/agentclient.rs (review thread resolved, outdated)
agent/src/agentclient.rs (review thread resolved)
@@ -47,3 +47,6 @@ pub const CONTROLLER_DEPLOYMENT_NAME: &str = "brupop-controller-deployment";
pub const CONTROLLER_SERVICE_NAME: &str = "brupop-controller-server"; // The name for the `svc` fronting the controller.
pub const CONTROLLER_INTERNAL_PORT: i32 = 8080; // The internal port on which the controller service is hosted.
pub const CONTROLLER_SERVICE_PORT: i32 = 80; // The k8s service port hosting the controller service.
pub const BRUPOP_CONTROLLER_PRIORITY_CLASS: &str = "brupop-controller-high-priority";
pub const BRUPOP_CONTROLLER_PREEMPTION_POLICY: &str = "Never";
pub const BRUPOP_CONTROLLER_PRIORITY_VALUE: i32 = 1000000; // The controller priority class value
Contributor:

How is this priority value determined?

Member Author:

The value can be set between 0 and 1 billion. The reasons are:

  1. I assume the controller pod needs to come back sooner than other pods, except critical ones such as system-related pods. I went through the PriorityClass documentation, and 1 million is commonly used to represent a high, but not critical, priority value.
  2. We don't know what priority strategies customers are using for their k8s resources, and I don't want the controller to take precedence over customers' critical resources. After looking at many PriorityClass examples on GitHub, high-priority and critical values are usually set above 10 million. So, to be safer, I lowered the value here to 1 million.

Contributor:

It might be good to elaborate a bit on the constant in a comment.
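For reference, here is a minimal sketch of how these constants might be wired into a Kubernetes PriorityClass object via the k8s-openapi crate; the function name and description string are illustrative, not taken from this PR:

```rust
use k8s_openapi::api::scheduling::v1::PriorityClass;
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;

// Constants mirroring models/src/constants.rs as shown in the diff above.
const BRUPOP_CONTROLLER_PRIORITY_CLASS: &str = "brupop-controller-high-priority";
const BRUPOP_CONTROLLER_PREEMPTION_POLICY: &str = "Never";
const BRUPOP_CONTROLLER_PRIORITY_VALUE: i32 = 1_000_000;

fn controller_priority_class() -> PriorityClass {
    PriorityClass {
        metadata: ObjectMeta {
            name: Some(BRUPOP_CONTROLLER_PRIORITY_CLASS.to_string()),
            ..Default::default()
        },
        // 1 million: high, but deliberately below the >10 million values
        // commonly used for cluster-critical workloads.
        value: BRUPOP_CONTROLLER_PRIORITY_VALUE,
        // "Never": the controller waits for free capacity rather than
        // evicting other pods when it is scheduled.
        preemption_policy: Some(BRUPOP_CONTROLLER_PREEMPTION_POLICY.to_string()),
        description: Some(
            "High (non-critical) priority for the brupop controller".to_string(),
        ),
        ..Default::default()
    }
}
```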


shadows.sort_by(|a, b| a.compare_crash_count(b));
for brs in shadows {
sort_shadows(shadows.clone(), &get_associated_bottlerocketshadow_name()?)?;
Contributor:

This won't actually sort the vector, since we are cloning it as input rather than using a mutable reference.
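A standalone illustration of the pitfall; sorting a clone leaves the caller's vector untouched:

```rust
fn sort_copy(mut v: Vec<u32>) {
    // `v` is an owned copy here; sorting it has no effect on the caller's vector.
    v.sort();
}

fn main() {
    let shadows = vec![3, 1, 2];
    sort_copy(shadows.clone()); // the clone is sorted, then dropped
    assert_eq!(shadows, vec![3, 1, 2]); // original order unchanged
    println!("still unsorted: {:?}", shadows);
}
```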

Member Author:

Yeah, I remember changing it to `let sorted_shadows = sort_shadows(shadows.clone(), &get_associated_bottlerocketshadow_name()?)?;`. Maybe I made a mistake during the rebase. Thanks, I'll fix it.

Contributor:

I think it would be more idiomatic here to use a mutable reference as input to the function. You can just return an empty Result<()> to indicate whether or not something went wrong.

shadows.sort_by(|a, b| a.compare_crash_count(b));

// move the shadow whose associated bottlerocketshadow node hosts the controller pod to the last position
for (position, brs) in shadows.iter().enumerate() {
Contributor:

nit: Consider using shadows.iter().position(). This would also allow you to use an Option rather than having to use usize::MAX as a sentinel value.

Member Author:

Oh, I just remembered why I didn't use shadows.iter().position(). When I tried it, position() couldn't handle a Result value inside its closure, namely the Result produced by brs.metadata.name.as_ref().context(error::MissingBottlerocketShadowName)?:

let associated_brs_node_position = shadows.iter().position(|brs| brs.metadata.name.as_ref().context(error::MissingBottlerocketShadowName)? == associated_brs_name).context("")?

cbgbt (Contributor) commented Jul 11, 2022:

Since you have kube::ResourceExt imported, you could just do brs.name() which technically panics if the name is missing. Not great, but there are likely bigger problems if the name is missing from objects.

Alternatively, if the name is missing you can probably skip the item rather than returning early:

let associated_brs_node_position = shadows
    .iter()
    .position(|brs| brs.metadata.name.as_ref() == Some(associated_brs_name))

I think it's best to lean on Option which the compiler can check rather than using a sentinel value, even though we can logically deduce that your sentinel would work.
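A runnable toy version of that pattern, with plain strings standing in for BottlerocketShadow objects:

```rust
fn main() {
    // Stand-in for the shadow list; the strings model brs.metadata.name.
    let mut shadows = vec![
        "brs-node-a".to_string(),
        "brs-node-b".to_string(),
        "brs-node-c".to_string(),
    ];
    let associated_brs_name = "brs-node-b";

    // position() yields Option<usize>: None when nothing matches,
    // so no usize::MAX sentinel is needed.
    if let Some(position) = shadows.iter().position(|name| name == associated_brs_name) {
        let associated = shadows.remove(position);
        shadows.push(associated); // the controller's node now sorts last
    }

    assert_eq!(shadows, vec!["brs-node-a", "brs-node-c", "brs-node-b"]);
}
```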

controller/src/controller.rs (review thread resolved, outdated)
@gthao313 requested review from cbgbt and rpkelly on July 6, 2022
@gthao313 force-pushed the controller-update branch 2 times, most recently from f26ec8d to ba26dfd on July 12, 2022
shadows.sort_by(|a, b| a.compare_crash_count(b));
for brs in shadows {
sort_shadows(&mut shadows, &get_associated_bottlerocketshadow_name()?)?;
for brs in shadows.drain(..) {
gthao313 (Member Author) commented Jul 12, 2022:

Updated to use drain.

/// logic1: sort shadows by crash count
/// logic2: move the shadow whose associated bottlerocketshadow node hosts the controller pod to the last position
#[instrument(skip())]
fn sort_shadows(shadows: &mut Vec<BottlerocketShadow>, associated_brs_name: &str) -> Result<()> {
Contributor:

nit: I think this function cannot return an error value. You can probably remove the return type.
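Combining this with the earlier position() suggestion, the function might look roughly like the sketch below; this is only a sketch, since the merged body isn't shown here (BottlerocketShadow and compare_crash_count come from the surrounding brupop code):

```rust
/// logic1: sort shadows by crash count
/// logic2: move the shadow whose node hosts the controller pod to the end,
/// so that node is updated last
fn sort_shadows(shadows: &mut Vec<BottlerocketShadow>, associated_brs_name: &str) {
    shadows.sort_by(|a, b| a.compare_crash_count(b));
    // position() returns Option<usize>; if the name is missing or not found,
    // the order is simply left unchanged.
    if let Some(position) = shadows
        .iter()
        .position(|brs| brs.metadata.name.as_deref() == Some(associated_brs_name))
    {
        let associated_brs = shadows.remove(position);
        shadows.push(associated_brs);
    }
}
```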

@@ -37,4 +37,7 @@ pub enum Error {

#[snafu(display("Controller failed due to internal assertion issue: '{}'", source))]
Assertion { source: serde_plain::Error },

#[snafu(display("Unable to get host controller pod node name: {}", source))]
GetNodeName { source: std::env::VarError },
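For context, the new GetNodeName variant implies the controller reads its host node's name from an environment variable. A hedged sketch of how it might be used follows; the variable name and the Result alias are assumptions, and the brs-<node-name> prefix mirrors the shadow names in the testing output above:

```rust
use snafu::ResultExt;
use std::env;

// Hypothetical: `MY_NODE_NAME` is an assumed variable name; it would be
// injected into the pod spec via the Kubernetes downward API (fieldRef: spec.nodeName).
fn get_associated_bottlerocketshadow_name() -> Result<String> {
    // With snafu 0.7+, the context selector would be error::GetNodeNameSnafu instead.
    let node_name = env::var("MY_NODE_NAME").context(error::GetNodeName)?;
    // Shadow names in the testing output follow the brs-<node-name> pattern.
    Ok(format!("brs-{}", node_name))
}
```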
Contributor:

Just a note here: GetNodeName doesn't show up in the adjusted error module, so you might want to reimplement this error when you merge the two PRs.

Member Author:

Yeah! I would like to merge this PR first and then solve the conflict on the PR you mentioned above. Thanks!

@gthao313 merged commit c6f1838 into bottlerocket-os:develop on Jul 20, 2022