Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Pilot executes cassandra directly, bypassing the container image entry point #347

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,4 @@
.generate_exes
.get_deps
bin/
**/.test/
90 changes: 87 additions & 3 deletions Gopkg.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 14 additions & 3 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ CMDS := controller apiserver pilot-elasticsearch pilot-cassandra
GOPATH ?= /tmp/go
GOFLAGS ?= "-a"

# Path of the Navigator Cassandra Pilot for integration tests
TEST_ASSET_NAVIGATOR_PILOT_CASSANDRA ?= "${CURDIR}/navigator-pilot-cassandra_linux_amd64"

help:
# all - runs verify, build and docker_build targets
# test - runs go_test target
Expand All @@ -32,7 +35,7 @@ help:

all: verify build docker_build

test: go_test
test: go_test test_integration

.run_e2e:
${HACK_DIR}/prepare-e2e.sh
Expand All @@ -44,7 +47,7 @@ build: $(CMDS)

generate: .generate_files

verify: .hack_verify dep_verify go_verify helm_verify
verify: .hack_verify dep_verify go_verify helm_verify test_integration

.hack_verify:
@echo Running repo-infra verify scripts
Expand Down Expand Up @@ -90,7 +93,7 @@ $(CMDS):
go_build: $(CMDS)

go_test:
go test -v $$(go list ./... | grep -v '/vendor/')
go test -v $$(go list ./... | grep -v -e '/vendor/' -e 'github.com/jetstack/navigator/test/')

go_fmt:
./hack/verify-lint.sh
Expand All @@ -105,3 +108,11 @@ go_fmt:
# Helm targets
helm_verify:
helm lint contrib/charts/*

.download_integration_test_binaries:
$(HACK_DIR)/download-integration-test-binaries.sh
touch .download_integration_test_binaries

test_integration: .download_integration_test_binaries apiserver pilot-cassandra
TEST_ASSET_NAVIGATOR_PILOT_CASSANDRA=$(TEST_ASSET_NAVIGATOR_PILOT_CASSANDRA) \
go test $(GO_TEST_ARGS) -v ./test/integration/...
33 changes: 33 additions & 0 deletions docs/cassandra.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@ All the C* nodes (pods) in a ``nodepool`` have the same configuration and the fo

.. include:: configure-scheduler.rst

.. _availability-zones-cassandra:

Cassandra Across Multiple Availability Zones
--------------------------------------------

Expand Down Expand Up @@ -240,6 +242,37 @@ Navigator will add C* nodes, one at a time, until the desired number of nodes is

You can look at ``CassandraCluster.Status.NodePools[<nodepoolname>].ReadyReplicas`` to see the current number of healthy C* nodes in each ``nodepool``.

Pilots and Cassandra Docker Images
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Navigator will use the `Cassandra Docker images from DockerHub <https://hub.docker.com/_/cassandra/>`_.
It will use an image with a tag matching the supplied ``CassandraCluster.Spec.Version`` field.
If you prefer to use your own container image you should configure the ``CassandraCluster.Spec.Image`` fields.

Navigator installs a ``navigator-pilot-cassandra`` executable into each Pod at the path ``/pilot``.
This ``pilot`` process connects to the API server to:
get extra configuration settings,
report the status of this C* node, and to
perform leader election of a single pilot in the cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Bulleted list 😄


The ``pilot`` overrides the following keys in the default ``/etc/cassandra/cassandra.yaml`` file:

* ``cluster_name``: This will be set to match the name of the corresponding ``CassandraCluster`` resource in the API server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: is this safe to do, given two clusters may have the same name (in two different namespaces)

* ``listen_address`` / ``listen_interface`` / ``broadcast_address`` / ``rpc_address`` / ``broadcast_rpc_address``: These keys will be set to ``null``.
This ensures that Cassandra process listens and communicates using the IP address currently associated with the fully qualified domain name of the Pod.
This is important if the Pod is moved to another node and is assigned a different IP address.
Removing these settings from the Configuration file ensures that Cassandra uses the most recent IP address that Kubernetes has assigned to the Pod and that other C* nodes in the cluster are notified of the change of IP address.
* ``seed_provider``: This is set to ``io.jetstack.cassandra.KubernetesSeedProvider`` which allows Cassandra to look up the seed IP addresses from a Kubernetes service.
The ``seed_provider.*.seeds`` sub key will be emptied.
This is to avoid the risk of nodes mistakenly joining the cluster as seeds if the seed provider service is temporarily unavailable.

The ``pilot`` also overwrites ``cassandra-rackdc.properties`` with values derived from the ``CassandraCluster.Spec.Nodepools`` (see :ref:`availability-zones-cassandra`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodepools -> NodePools


Finally the ``pilot`` executes ``/usr/sbin/cassandra`` directly.

.. note::
The default entry point (e.g. `/docker-entrypoint.sh <https://github.com/docker-library/cassandra/blob/master/3.11/docker-entrypoint.sh>`_ is ignored.

Supported Versions
------------------

Expand Down
2 changes: 2 additions & 0 deletions docs/quick-start/cassandra-cluster.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,5 @@ spec:
pilotImage:
repository: "quay.io/jetstack/navigator-pilot-cassandra"
tag: "v0.1.0"
securityContext:
runAsUser: 999
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion earlier - what happens if this field is not specified on a Cassandra resource?

If we are not careful, this PR could become a breaking change as older versions of Navigator handled setting the uid automatically (i.e. via the entrypoint script) before forking to the cassandra subprocess.

36 changes: 36 additions & 0 deletions hack/download-integration-test-binaries.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
#!/bin/bash
#
# Download the binaries needed by the sigs.k8s.io/testing_frameworks/integration package.
# etcd, kube-apiserver, kubectl
# XXX: There is already a script to do this:
# * sigs.k8s.io/testing_frameworks/integration/scripts/download-binaries.sh
# But it currently downloads kube-apiserver v1.10.0-alpha.1 which doesn't support ``CustomResourceSubresources``.
# See https://github.com/kubernetes-sigs/testing_frameworks/issues/44

set -o errexit
set -o nounset
set -o pipefail
set -o xtrace

# Close stdin
exec 0<&-

ROOT_DIR="$(git rev-parse --show-toplevel)"

ETCD_VERSION=v3.2.10
ETCD_URL="https://storage.googleapis.com/etcd"

KUBE_VERSION_URL="https://storage.googleapis.com/kubernetes-release/release/stable-1.10.txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pin this to a specific patch release of 1.10? Or otherwise, clearly print out the exact version returned here in the test logs, so we can debug failures caused by patch version changes.

KUBE_VERSION=$(curl --fail --silent "${KUBE_VERSION_URL}")
KUBE_BIN_URL="https://storage.googleapis.com/kubernetes-release/release/${KUBE_VERSION}/bin/linux/amd64"

ASSETS_DIR="${ROOT_DIR}/vendor/sigs.k8s.io/testing_frameworks/integration/assets/bin"

mkdir -p "${ASSETS_DIR}"

curl --fail --silent ${ETCD_URL}/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz | \
tar --extract --gzip --directory="${ASSETS_DIR}" --strip-components=1 --wildcards '*/etcd'
curl --fail --silent --output "${ASSETS_DIR}/kube-apiserver" "${KUBE_BIN_URL}/kube-apiserver"
curl --fail --silent --output "${ASSETS_DIR}/kubectl" "${KUBE_BIN_URL}/kubectl"

chmod +x ${ASSETS_DIR}/*
5 changes: 3 additions & 2 deletions hack/libe2e.sh
Original file line number Diff line number Diff line change
Expand Up @@ -131,11 +131,12 @@ function signal_cassandra_process() {
local pod="${2}"
local signal="${3}"

# Send STOP signal to all the cassandra user's processes
# Send a signal to the Cassandra Daemon process.
kubectl \
--namespace="${namespace}" \
exec "${pod}" -- \
bash -c "kill -${signal}"' -- $(ps -u cassandra -o pid=) && ps faux'
bash -c \
"kill -${signal}"' -- $(ps -o pid,cmd= | awk "/org.apache.cassandra.service.CassandraDaemon$/{print \$1}")'
}

function simulate_unresponsive_cassandra_process() {
Expand Down
2 changes: 2 additions & 0 deletions hack/testdata/cass-cluster-test.template.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,5 @@ spec:
repository: "${NAVIGATOR_IMAGE_REPOSITORY}/navigator-pilot-cassandra"
tag: "${NAVIGATOR_IMAGE_TAG}"
pullPolicy: "${NAVIGATOR_IMAGE_PULLPOLICY}"
securityContext:
runAsUser: 999
53 changes: 53 additions & 0 deletions internal/test/util/testfs/testfs.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
package testfs

import (
"fmt"
"io/ioutil"
"os"
"path"
"path/filepath"
"testing"

"github.com/stretchr/testify/require"
)

type TestFs struct {
t *testing.T
d string
}

func New(t *testing.T) *TestFs {
d := fmt.Sprintf(".test/%s", t.Name())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for specifying a custom base, instead of relying on ioutil to pick the correct OS-specific tmp dir?

d, err := filepath.Abs(d)
require.NoError(t, err)
err = os.RemoveAll(d)
if err != nil && !os.IsNotExist(err) {
t.Fatalf("Error while removing old test directory: %s", err)
}

err = os.MkdirAll(d, os.ModePerm)
require.NoError(t, err)

return &TestFs{
t: t,
d: d,
}
}

func (tfs *TestFs) TempPath(name string) string {
outPath := path.Join(tfs.d, name)
tmpFile, err := ioutil.TempFile(tfs.d, name)
require.NoError(tfs.t, err)
err = tmpFile.Close()
require.NoError(tfs.t, err)
err = os.Rename(tmpFile.Name(), outPath)
require.NoError(tfs.t, err)
return outPath
}

func (tfs *TestFs) TempDir(name string) string {
outPath := path.Join(tfs.d, name)
err := os.MkdirAll(outPath, os.ModePerm)
require.NoError(tfs.t, err)
return outPath
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for this package? It seems to wrap ioutil, but with some extras in there?
It'd be great if we can just use ioutil for this, to save having additional code to maintain.

https://golang.org/pkg/io/ioutil/#TempDir

Loading