This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Add CreateNodePool and ScaleOut actions for Cassandra #256

Merged
merged 2 commits into from
Mar 27, 2018

Conversation

wallrj
Member

@wallrj wallrj commented Feb 22, 2018

These become the only changes supported by the Cassandra controller until the ScaleIn and CassandraUpgrade actions are implemented in follow-up branches.

Fixes: #253

Release note:

NONE

@wallrj wallrj changed the title WIP: Cassandra Actions Add CreateNodePool and ScaleOut actions for Cassandra Feb 26, 2018
@jetstack-ci-bot
Contributor

@wallrj PR needs rebase

wallrj added a commit to wallrj/navigator that referenced this pull request Feb 28, 2018
* In jetstack#256 I want to refactor nodepool.Sync so that its only responsibility is to update the NodePool status.
* And the seed labelling seems like it is a separate concern that can live in its own module.
* With its own unit tests.
jetstack-ci-bot added a commit that referenced this pull request Mar 1, 2018
Automatic merge from submit-queue.

Move seed pod labelling to a separate controller

* In #256 I want to refactor nodepool.Sync so that its only responsibility is to update the NodePool status.
* And the seed labelling seems like it is a separate concern that can live in its own module.
* With its own unit tests.

**Release note**:
```release-note
NONE
```
@wallrj
Member Author

wallrj commented Mar 2, 2018

rebased

@wallrj
Member Author

wallrj commented Mar 2, 2018

/retest

@wallrj wallrj mentioned this pull request Mar 6, 2018
Contributor

@kragniz kragniz left a comment


This generally looks good, I'm happy to merge

@@ -44,7 +44,7 @@ type CassandraClusterSpec struct {
// CassandraClusterNodePool describes a node pool within a CassandraCluster.
type CassandraClusterNodePool struct {
Name string `json:"name"`
-Replicas int64 `json:"replicas"`
+Replicas int32 `json:"replicas"`
Contributor

Why are these changed to int32s?

Member Author

I changed it to match the type of StatefulSet.Status.ReadyReplicas.

Contributor

Missing corresponding change in the internal api version

@munnerz munnerz added this to the v0.1 milestone Mar 13, 2018
@munnerz munnerz requested review from munnerz and kragniz and removed request for kragniz March 13, 2018 10:34
@munnerz munnerz removed their request for review March 13, 2018 10:34
if actual.Namespace != expected.Namespace {
t.Errorf("Namespace %q != %q", expected.Namespace, actual.Namespace)
}
if expected.Replicas != nil {
Contributor

What if expected.Replicas == nil but actual.Spec.Replicas != nil?

@@ -118,3 +120,46 @@ func StatefulSet(c StatefulSetConfig) *apps.StatefulSet {
},
}
}

func AssertStatefulSetMatches(t *testing.T, expected StatefulSetConfig, actual *apps.StatefulSet) {
Contributor

Not sure if we should put assertions into this package. generate.AssertStatefulSetMatches doesn't really 'flow' when read out loud.

Perhaps create internal/test/util/assert and call assert.StatefulSetMatches?

Also, given this is used for unit tests, can we just use reflect.DeepEqual instead? This method could easily return true for two very different statefulsets with its current implementation, which if reused could cause confusion (and also, if this function is extended in future to check more fields, it may cause tests to fail, thus making this not very reusable)

Member Author

Removed that assertion and simplified the tests.

@@ -44,7 +44,7 @@ type CassandraClusterSpec struct {
// CassandraClusterNodePool describes a node pool within a CassandraCluster.
type CassandraClusterNodePool struct {
Name string `json:"name"`
-Replicas int64 `json:"replicas"`
+Replicas int32 `json:"replicas"`
Contributor

Missing corresponding change in the internal api version

ss := nodepool.StatefulSetForCluster(a.Cluster, a.NodePool)
_, err := s.Clientset.AppsV1beta1().StatefulSets(ss.Namespace).Create(ss)
if k8sErrors.IsAlreadyExists(err) {
return nil
Contributor

Should return an error here always if err != nil (these actions shouldn't be idempotent, in favour of nextAction choosing what to execute next sensibly)

Member Author

@wallrj wallrj Mar 24, 2018

There are a few good reasons for them to be idempotent:

  1. The CassandraCluster.Status.NodePools map (upon which this action is calculated) may not have been updated.
  2. Future actions (e.g. ScaleIn) may perform a more complex sequence of steps which need to be "ratcheted" through by repeated calls to ScaleIn.Execute, e.g.
  • Decommission the node.
  • Decrease the statefulset replica count.
    In which case each of those intermediate steps must be idempotent, and the final step may as well be idempotent too.
  3. The lister used to calculate the action may not have the most up-to-date resources, so it appears there isn't a StatefulSet until you attempt to create it, which fails with AlreadyExists (although I might be making that up, because the current implementation doesn't check the lister first).

On the whole, it makes sense to me that these actions should all be idempotent, but happy to discuss further.

if ss.Spec.Replicas == nil || *ss.Spec.Replicas < a.NodePool.Replicas {
ss.Spec.Replicas = &a.NodePool.Replicas
_, err = s.Clientset.AppsV1beta1().StatefulSets(ss.Namespace).Update(ss)
if err == nil {
Contributor

For consistency with everywhere else, let's return err if it is not nil; otherwise log the event and return nil.

i.e.

if err != nil {
    return err
}

<log event>
return nil

@@ -34,77 +36,65 @@ func NewControl(
}
}

-func (e *defaultCassandraClusterNodepoolControl) removeUnusedStatefulSets(
+func (e *defaultCassandraClusterNodepoolControl) clusterStatefulSets(
Contributor

Can you collapse the method signature to a single line?

}
} else {
delete(cluster.Status.NodePools, np.Name)
}
Contributor

Can you refactor out this else? (turn it into an if with continue after it). Reduces the amount of nesting going on here.

Member Author

The whole function has been simplified and merged in a separate PR #299

cluster.Status.NodePools[np.Name] = nps
}
if nps.ReadyReplicas != ss.Status.ReadyReplicas {
nps.ReadyReplicas = ss.Status.ReadyReplicas
Contributor

It took a little while to spot that this is where we actually set the fields in the node pool status - perhaps create a function statusForNodePool to handle this? (which returns a NodePoolStatus structure)

// (the statefulset has been removed by a DeleteNodePool action - not yet implemented)
func (e *defaultCassandraClusterNodepoolControl) Sync(cluster *v1alpha1.CassandraCluster) error {
if cluster.Status.NodePools == nil {
cluster.Status.NodePools = map[string]v1alpha1.CassandraClusterNodePoolStatus{}
Contributor

Not 100% sure how maps are ordered when serialised - perhaps we shouldn't be using a map here at all in favour of a slice that has a name field?

Member Author

This version of the function was just wrong.
Instead, in #299 I re-populate the CassandraCluster.Status.NodePools each time, based on the statefulsets that are owned by the cluster, and which have the expected name.

}
// 20% chance of ScaleOut
if rand.Intn(4) == 0 {
np.Replicas++
Contributor

Will increasing it by 1 ever result in it being particularly large? Should we not just set it to some random value output by the RNG?

Contributor

(maybe you are following some best practice I am not aware of though!)

Member Author

The initial value is between 0 and 5 (see the code above), so it should never be more than 6.
This is a way to simulate an occasional difference between Spec.NodePool.Replicas and Status.NodePools.ReadyReplicas.

@jetstack-ci-bot
Contributor

@wallrj PR needs rebase

wallrj added a commit to wallrj/navigator that referenced this pull request Mar 22, 2018
…ned by the cluster

* Update E2E tests to check the CassandraCluster status rather than checking statefulset status.
* Issue an API UpdateStatus with the CassandraCluster.Status after running all control.Sync methods.
* Modify the RBAC role to allow the controller to update CassandraCluster.Status
* Add a statefulset informer callback to trigger a controller sync when our statefulsets change.

Extracted from: jetstack#256
Part of: jetstack#253
@wallrj
Member Author

wallrj commented Mar 24, 2018

Noticed a weird E2E test error:

W0324 14:27:19.643] + kubectl --namespace=test-cassandra-1521900884-833 exec cass-test-ringnodes-0 -- bash -c 'kill -SIGSTOP -- $(ps -u cassandra -o pid=) && ps faux'
I0324 14:27:20.340] rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:262: starting container process caused "process_linux.go:86: adding pid 17816 to cgroups caused \"failed to write 17816 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/poddc1388be-2f6e-11e8-8210-5254001464f2/f148ee0881cb5644368505fc324414543c01c6046f6e9850c52cc4f48ce72801/cgroup.procs: invalid argument\""

Maybe caused by opencontainers/runc#1326 or moby/moby#31230

@jetstack-bot
Collaborator

jetstack-bot commented Mar 27, 2018

@wallrj: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
navigator-e2e-v1-10 414be89 link /test e2e v1.10

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

* These become the only changes supported by the Cassandra controller.
* ScaleIn and CassandraUpgrade actions will be implemented in follow-up branches.
* Some initial documentation on supported configuration changes.

Fixes: jetstack#253
@munnerz
Contributor

munnerz commented Mar 27, 2018

/lgtm
/approve

@jetstack-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: munnerz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot merged commit d0e60bd into jetstack:master Mar 27, 2018
@wallrj wallrj deleted the 286-cassandra-actions branch March 27, 2018 18:11