Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Pilot executes cassandra directly, bypassing the container image entry point #347

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented May 8, 2018

  • Cassandra Pilot now executes cassandra directly, bypassing the Docker image entrypoint
  • Added a configuration API with which to override configuration values in yaml and properties config files.
  • This replaces the experimental search / replace code for seed_provider and snitch.
  • This also the use of environment variables to configure the Cassandra node. Instead, those configuration options are now supplied as Pilot command line arguments and are written to the config files in a pre-start pilot hook.
  • Added an integration test so that i can see this mechanism working, without having to wait for e2e tests.

Fixes: #346, #251

Release note:

NONE

@jetstack-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: munnerz

Assign the PR to them by writing /assign @munnerz in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wallrj wallrj force-pushed the 346-exec-cassandra branch 4 times, most recently from 144c3d2 to 381c7ea Compare June 6, 2018 09:31
@wallrj wallrj changed the title WIP: Pilot executes cassandra directly, bypassing the container image entry point Pilot executes cassandra directly, bypassing the container image entry point Jun 6, 2018
* to allow integration testing of pilot process without having to set up a
  leader election config map.
@wallrj wallrj force-pushed the 346-exec-cassandra branch 2 times, most recently from 8cfae12 to add5b6c Compare June 7, 2018 10:33
* An integration test to launch the Cassandra Pilot and ensure that it writes
  configuration files before launching the cassandra sub-process.
* Exec the pilot with leader-election disabled.
* A script to download suitable integration test binaries.
* A helper to create a .kube/config for the pilot to use.
* Remove the environment variables that were previously used to modify the
  configuration.
@jetstack-bot
Copy link
Collaborator

@wallrj: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -179,5 +180,7 @@ func (o *Options) Pilot() (*GenericPilot, error) {
func (o *Options) AddFlags(flags *pflag.FlagSet) {
flags.StringVar(&o.PilotName, "pilot-name", "", "The name of this Pilot. If not specified, an auto-detected name will be used.")
flags.StringVar(&o.PilotNamespace, "pilot-namespace", "", "The namespace the corresponding Pilot resource for this Pilot exists within.")
flags.BoolVar(&o.LeaderElect, "leader-elect", true, ""+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty nervous exposing this option to users - it should be defined by the Pilot (e.g. elasticsearch-pilot etc) whether leader election needs to be performed.
Disabling this could cause fundamental breakages to how the Pilot operates, so I don't think it should be a user exposed option.

Is this because you don't use the leader election primitives in the cassandra pilot? If so, can you change this to only be configurable by the Pilot code (i.e. not a user facing option)?

err := os.MkdirAll(outPath, os.ModePerm)
require.NoError(tfs.t, err)
return outPath
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for this package? It seems to wrap ioutil, but with some extras in there?
It'd be great if we can just use ioutil for this, to save having additional code to maintain.

https://golang.org/pkg/io/ioutil/#TempDir

}

func New(t *testing.T) *TestFs {
d := fmt.Sprintf(".test/%s", t.Name())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for specifying a custom base, instead of relying on ioutil to pick the correct OS-specific tmp dir?

ETCD_VERSION=v3.2.10
ETCD_URL="https://storage.googleapis.com/etcd"

KUBE_VERSION_URL="https://storage.googleapis.com/kubernetes-release/release/stable-1.10.txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pin this to a specific patch release of 1.10? Or otherwise, clearly print out the exact version returned here in the test logs, so we can debug failures caused by patch version changes.

)
t.Log("stdout", stdout)
t.Log("stderr", stderr)
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here up to the start of the function should be in some shared function - this is code that is not specific to this test, or Cassandra.

Also, we should add some comments to this explaining why we are registering a CRD, and what is and isn't okay to test whilst using a CRD (because a lot of code for e.g. validation won't be run, we should clearly define bounds for what is and isn't okay to test). As we spoke about before, I think we should look to remove this code ASAP in favour of launching navigator-apiserver itself.

Also, given what we're trying to test here, would it be possible to implement this whole test as a unit test? I'm not sure why we need to bring up the entire process to verify that a config file is written ❓

This ``pilot`` process connects to the API server to:
get extra configuration settings,
report the status of this C* node, and to
perform leader election of a single pilot in the cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Bulleted list 😄


The ``pilot`` overrides the following keys in the default ``/etc/cassandra/cassandra.yaml`` file:

* ``cluster_name``: This will be set to match the name of the corresponding ``CassandraCluster`` resource in the API server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: is this safe to do, given two clusters may have the same name (in two different namespaces)

The ``seed_provider.*.seeds`` sub key will be emptied.
This is to avoid the risk of nodes mistakenly joining the cluster as seeds if the seed provider service is temporarily unavailable.

The ``pilot`` also overwrites ``cassandra-rackdc.properties`` with values derived from the ``CassandraCluster.Spec.Nodepools`` (see :ref:`availability-zones-cassandra`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodepools -> NodePools

name = "github.com/stretchr/testify"
packages = ["assert","require"]
revision = "12b6f73e6084dad08a7c6e575284b177ecafbc71"
version = "v1.2.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we need testify? We've not used it in any of our other tests so far

name = "github.com/fsnotify/fsnotify"
packages = ["."]
revision = "c2828203cd70a50dcccfb2761f8b1f8ceef9a8e9"
version = "v1.4.7"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea what introduces a need for fsnotify? 🤔 viper?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants