This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Allow configuring number of seed nodes per nodepool #264

Open · wants to merge 12 commits into master from configurable-seed-number

Conversation

kragniz
Contributor

@kragniz kragniz commented Feb 27, 2018

This adds a new field, seeds, to Cassandra nodepools, which controls the number of seed nodes in that nodepool. It defaults to 1 and cannot be greater than the number of replicas.
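
For illustration, a nodepool using the new field might look like this (a sketch; the apiVersion and the fields around seeds are assumed from the rest of this PR, not quoted from it):

    apiVersion: navigator.jetstack.io/v1alpha1
    kind: CassandraCluster
    metadata:
      name: test
    spec:
      nodePools:
      - name: ringnodes
        replicas: 3
        seeds: 2   # defaults to 1 if omitted; must not exceed replicas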

Allow configuring number of seed nodes per nodepool

@kragniz
Contributor Author

kragniz commented Feb 27, 2018

I'll add an e2e test for this once #258 is merged

@kragniz
Contributor Author

kragniz commented Feb 28, 2018

/retest

@jetstack-ci-bot
Contributor

@kragniz PR needs rebase

hack/e2e.sh Outdated
<(envsubst \
'$NAVIGATOR_IMAGE_REPOSITORY:$NAVIGATOR_IMAGE_TAG:$NAVIGATOR_IMAGE_PULLPOLICY:$CASS_NAME:$CASS_REPLICAS:$CASS_CQL_PORT' \
< "${SCRIPT_DIR}/testdata/cass-cluster-test.template.yaml")
apply_cassandracluster
Contributor

Should we fail the test if this fails? (like we do above)

Contributor Author

👍

hack/e2e.sh Outdated
@@ -325,6 +320,13 @@ function test_cassandracluster() {
fail_test "Second cassandra node did not become ready"
fi

seed_label=$(kubectl get pods --namespace "${namespace}" \
cass-${CASS_NAME}-ringnodes-1 \
-o jsonpath='{.metadata.labels.seed}')
Contributor

(may not be something introduced by this PR, but we should always be using namespaced labels, e.g. navigator.jetstack.io/cassandra-seed=true)

Contributor Author

👍 this needs rebasing when #270 lands, which makes the label navigator.jetstack.io/cassandra-seed
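
After that rebase, the jsonpath lookup would need the dots in the label key escaped; a sketch of what the updated check might look like (same pod and variables as the hunk above):

    seed_label=$(kubectl get pods --namespace "${namespace}" \
        "cass-${CASS_NAME}-ringnodes-1" \
        -o jsonpath='{.metadata.labels.navigator\.jetstack\.io/cassandra-seed}')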

@@ -7,3 +7,13 @@ import (
func addDefaultingFuncs(scheme *runtime.Scheme) error {
    return RegisterDefaults(scheme)
}

func SetDefaults_CassandraClusterSpec(spec *CassandraClusterSpec) {
Contributor

Why default the cassandraclusterspec if we're only going to iterate over each node pool in it anyway?

Surely this would be better as a SetDefaults_CassandraClusterNodePool function?

This leads on to my next question: what if a user wants to ensure there are 0 seeds in a given node pool (because they have seeds in other node pools)? Is this something we want to disallow?

Contributor Author

Yep, it'd be cleaner with SetDefaults_CassandraClusterNodePool.

We need to ensure there's at least one seed across all nodepools. We could reduce the minimum to 0 in a particular nodepool if we find a use case for it (I can't think of a reason to do so).
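
A minimal sketch of the per-nodepool defaulting function discussed here, assuming Seeds is still a plain int64 at this point (see the later switch to a pointer):

    func SetDefaults_CassandraClusterNodePool(np *CassandraClusterNodePool) {
        // Treat an unset seeds field as a request for a single seed node.
        if np.Seeds == 0 {
            np.Seeds = 1
        }
    }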

Contributor

Cool - I've no issue requiring a minimum of one seed per node pool for now. We can revisit it in future if needed, like you say.

Contributor Author

@kragniz kragniz Mar 1, 2018

I've opened #272 so we remember


if np.Seeds < 0 {
    allErrs = append(allErrs,
        field.Invalid(fldPath.Child("seeds"), np.Seeds, "number of seeds must be 1 or greater"),
Contributor

... must be greater than or equal to 1

}

// only label if the current label is incorrect
if pod.Labels["seed"] != desiredLabel {
Contributor

yep, not something created by this PR, but seed isn't an acceptable label imo. difficult for a user to understand why it is there, and could be removed by accident.
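
For illustration, the key could live in one constant so it is set and checked consistently; a hedged sketch (the constant and helper names are hypothetical, the key matches the one #270 introduces, and pod is a *v1.Pod from k8s.io/api/core/v1):

    // SeedLabelKey is the namespaced label marking a pod as a Cassandra seed node.
    const SeedLabelKey = "navigator.jetstack.io/cassandra-seed"

    // setSeedLabel updates the pod's seed label and reports whether it changed.
    func setSeedLabel(pod *v1.Pod, desired string) bool {
        if pod.Labels == nil {
            pod.Labels = map[string]string{}
        }
        if pod.Labels[SeedLabelKey] == desired {
            return false // already correct, nothing to do
        }
        pod.Labels[SeedLabelKey] = desired
        return true
    }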

@kragniz kragniz force-pushed the configurable-seed-number branch 2 times, most recently from 2302e30 to f3210eb Compare March 5, 2018 16:12
@kragniz kragniz force-pushed the configurable-seed-number branch 3 times, most recently from d2e1867 to 10b6cc6 Compare March 5, 2018 17:42
Member

@wallrj wallrj left a comment

Thanks @kragniz

Looks good.

A few questions:

  • I wonder if administrators will know how many seeds they need.
  • Did you consider just hard coding it to 3 seeds per nodepool?
  • Or perhaps making it a seedsPerNodes attribute, which might allow the administrator to ask for e.g. 1 seed per 10 C* nodes?
  • And in that case, perhaps it could be a cluster wide setting rather than a per-nodepool setting?
  • Is there any problem with having different numbers of seeds per rack / DC?

In addition I left a few comments and questions below. Please answer or address those before merging.

@@ -64,6 +64,11 @@ type CassandraClusterNodePool struct {
    // in this nodepool. If this is not set, a default will be selected.
    // +optional
    Datacenter string `json:"datacenter"`

    // Seeds specifies the number of seed nodes to alocate in this nodepool. By
Member

typo "alocate"

Contributor Author

Done

@@ -187,6 +187,15 @@ if [[ "test_elasticsearchcluster" = "${TEST_PREFIX}"* ]]; then
kube_delete_namespace_and_wait "${ES_TEST_NS}"
fi

function apply_cassandracluster() {
Member

❔ The other bash functions have parameters (of sorts) so it might be nice to be consistent. On the other hand, it looks like we're moving away from bash-based E2E tests, so I'm happy for you to leave this for now.

Member

And it'd be nice to have a check that all the environment variables that are about to be substituted are actually set.
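
A sketch of such a check using bash's ${VAR:?} expansion (variable names taken from the envsubst call above):

    # Fail fast with a clear message if any template variable is unset.
    for var in NAVIGATOR_IMAGE_REPOSITORY NAVIGATOR_IMAGE_TAG \
               NAVIGATOR_IMAGE_PULLPOLICY CASS_NAME CASS_REPLICAS CASS_CQL_PORT; do
        : "${!var:?${var} must be set}"
    done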

Contributor Author

Let's fix these e2e things when we move to ginkgo

if ! apply_cassandracluster
then
fail_test "Failed to apply cassandracluster"
fi
Member

❔ Personally, I wouldn't bother with the fail_test part here. I don't regard "applying the desired configuration" as an E2E test, just an implementation detail of the test, so I wouldn't bother adding a specific test error message.
Also, if this command fails, then all bets are off and we should probably just exit the test so that it's easy to debug the problem.
But if you prefer to leave it as-is until we land the ginkgo test branch, I'm happy with that.

// default to 1 seed if not specified
if np.Seeds == 0 {
    np.Seeds = 1
}
Member

Doesn't the validation prevent seeds: 0?

    allErrs = append(allErrs,
        field.Invalid(fldPath.Child("seeds"), np.Seeds, "number of seeds must be greater than or equal to 1"),
    )
}
Member

The message says "must be greater than or equal to 1" but the test is np.Seeds < 0
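
For the message and the check to agree, the comparison would need to be against 1; a sketch (names as in the hunk above):

    if np.Seeds < 1 {
        allErrs = append(allErrs,
            field.Invalid(fldPath.Child("seeds"), np.Seeds,
                "number of seeds must be greater than or equal to 1"))
    }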

pod, err := c.pods.Pods(cluster.Namespace).Get(fmt.Sprintf("%s-%d", set.Name, i))
if err != nil {
    glog.Warningf("Couldn't get stateful set pod: %v", err)
    return nil
Member

❔ I'm not sure about this. In other controllers we would return this error, but I can see why we'd want to continue and attempt to label other pods if one of the Gets fails.

Could we collect the errors and return an aggregate error at the end?

See k8s.io/apimachinery/pkg/util/errors NewAggregate
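
A hedged sketch of that pattern (the loop bounds and the labelling step are illustrative, not quoted from the PR; utilerrors is k8s.io/apimachinery/pkg/util/errors):

    var errs []error
    for i := int32(0); i < seeds; i++ {
        pod, err := c.pods.Pods(cluster.Namespace).Get(fmt.Sprintf("%s-%d", set.Name, i))
        if err != nil {
            // Remember the failure but keep going for the remaining pods.
            errs = append(errs, err)
            continue
        }
        // ... label pod as before ...
    }
    // NewAggregate returns nil for an empty list, otherwise one combined error.
    return utilerrors.NewAggregate(errs)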

set, err := c.statefulSetLister.StatefulSets(cluster.Namespace).Get(setName)
if err != nil {
    glog.Warningf("Couldn't get stateful set: %v", err)
    return nil
Member

Maybe an aggregate error here too.

navObjects: []runtime.Object{cluster},
cluster:    cluster,
assertions: func(t *testing.T, state *controllers.State) {
    CheckSeedLabel(pod2.Name, "", pod2.Namespace, t, state)
Member

❓ Add a test for deleting label when the seeds value is reduced.

Contributor Author

Done
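
A sketch of what that case could look like in the same fixture style (the kubeObjects field and the clusterWithSeeds/seedPod helpers are hypothetical):

    "removes label when seeds is reduced": {
        kubeObjects: []runtime.Object{seedPod(pod2)},       // pod2 carries the seed label
        navObjects:  []runtime.Object{clusterWithSeeds(1)}, // nodepool now wants only 1 seed
        cluster:     clusterWithSeeds(1),
        assertions: func(t *testing.T, state *controllers.State) {
            // pod index 1 is beyond the new seed count, so the label should be cleared
            CheckSeedLabel(pod2.Name, "", pod2.Namespace, t, state)
        },
    },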

@kragniz
Contributor Author

kragniz commented Mar 9, 2018

I've changed Seeds to be an *int64, avoiding the awkward validation/defaulting for a value of 0
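
With a pointer, defaulting can distinguish an unset field from an explicit value; a minimal sketch of what the defaulting then looks like (building on the earlier sketch):

    func SetDefaults_CassandraClusterNodePool(np *CassandraClusterNodePool) {
        // nil means seeds was omitted entirely; any explicit value is left alone.
        if np.Seeds == nil {
            one := int64(1)
            np.Seeds = &one
        }
    }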

@wallrj
Member

wallrj commented Mar 13, 2018

That 1.7 E2E test failure is a bit weird.

W0309 11:02:52.659] ++ kubectl run in-cluster-cmd-855 --namespace=test-cassandra-1520593007-14636 --image=cassandra:latest --restart=Never --rm --stdin=true --attach=true --quiet -- /usr/bin/cqlsh cass-test-cql 9042 --debug '--execute=SELECT * FROM space1.testtable1'
W0309 11:02:59.367] Connection error: ('Unable to connect to any servers', {'10.0.0.127': error(None, "Tried connecting to [('10.0.0.127', 9042)]. Last error: timed out")})
W0309 11:02:59.907] + actual=
W0309 11:02:59.907] + grep --quiet testvalue1
W0309 11:02:59.909] + local exit_code=1
W0309 11:02:59.909] ++ date +%s
W0309 11:02:59.910] + local current_time=1520593379
W0309 11:02:59.910] + local remaining_time=205
W0309 11:02:59.911] + [[ 205 -le 0 ]]
W0309 11:02:59.911] + local sleep_time=10
W0309 11:02:59.911] + [[ 205 -lt 10 ]]
W0309 11:02:59.911] + sleep 10
W0309 11:03:09.912] + stdout_matches testvalue1 cql_connect test-cassandra-1520593007-14636 cass-test-cql 9042 --debug '--execute=SELECT * FROM space1.testtable1'
W0309 11:03:09.912] + local expected=testvalue1
W0309 11:03:09.912] + shift
W0309 11:03:09.912] + local actual
W0309 11:03:09.912] ++ cql_connect test-cassandra-1520593007-14636 cass-test-cql 9042 --debug '--execute=SELECT * FROM space1.testtable1'
W0309 11:03:09.912] ++ local namespace=test-cassandra-1520593007-14636
W0309 11:03:09.912] ++ shift
W0309 11:03:09.913] ++ in_cluster_command test-cassandra-1520593007-14636 cassandra:latest /usr/bin/cqlsh cass-test-cql 9042 --debug '--execute=SELECT * FROM space1.testtable1'
W0309 11:03:09.913] ++ local namespace=test-cassandra-1520593007-14636
W0309 11:03:09.913] ++ shift
W0309 11:03:09.913] ++ local image=cassandra:latest
W0309 11:03:09.913] ++ shift
W0309 11:03:09.913] ++ kubectl run in-cluster-cmd-18383 --namespace=test-cassandra-1520593007-14636 --image=cassandra:latest --restart=Never --rm --stdin=true --attach=true --quiet -- /usr/bin/cqlsh cass-test-cql 9042 --debug '--execute=SELECT * FROM space1.testtable1'
W0309 15:03:13.029] + actual=
W0309 15:03:13.030] + grep --quiet testvalue1
W0309 15:03:13.033] + local exit_code=1
W0309 15:03:13.034] ++ date +%s
W0309 15:03:13.035] + local current_time=1520607793
W0309 15:03:13.035] + local remaining_time=-14209
W0309 15:03:13.035] + [[ -14209 -le 0 ]]
W0309 15:03:13.035] + return 1
W0309 15:03:13.035] + fail_test 'Cassandra data was lost'
W0309 15:03:13.035] + FAILURE_COUNT=1
W0309 15:03:13.035] + echo 'TEST FAILURE: Cassandra data was lost'

The command kubectl run in-cluster-cmd-18383 --namespace=test-cassandra-1520593007-14636 --image=cassandra:latest --restart=Never --rm --stdin=true --attach=true --quiet -- /usr/bin/cqlsh cass-test-cql 9042 --debug '--execute=SELECT * FROM space1.testtable1' appears to have hung from 11:03:09 until 15:03:13

@jetstack-ci-bot
Contributor

@kragniz PR needs rebase

pkg/util/util.go Outdated
@@ -11,3 +11,7 @@ func CalculateQuorum(num int32) int32 {
    }
    return (num / 2) + 1
}

func Int64Ptr(i int64) *int64 {
Contributor

Can be removed since #296

@jetstack-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: wallrj

Assign the PR to them by writing /assign @wallrj in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot
Collaborator

@kragniz: The following tests failed, say /retest to rerun them all:

Test name              | Commit  | Details | Rerun command
navigator-quick-verify | 3541a6d | link    | /test verify
navigator-e2e-v1-8     | 3541a6d | link    | /test e2e v1.8
navigator-e2e-v1-7     | 3541a6d | link    | /test e2e v1.7
navigator-e2e-v1-9     | 3541a6d | link    | /test e2e v1.9

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@munnerz munnerz added this to the v0.1 milestone Mar 27, 2018
@munnerz
Contributor

munnerz commented Mar 27, 2018

Should we merge this now, or wait until the actions stuff has merged and redesign this for that new structure? (targeting 0.2)

@jetstack-bot
Collaborator

@kragniz: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@munnerz munnerz modified the milestones: v0.1, v0.2 Apr 3, 2018
@wallrj wallrj removed this from the v0.2 milestone May 15, 2018