
K8SSAND-995 ⁃ Make the management-api heap size configurable #212

Closed · jsanda opened this issue Oct 20, 2021 · 6 comments · Fixed by #224

Labels: enhancement (New feature or request)


jsanda commented Oct 20, 2021

What is missing?
The heap size for the management-api was hard coded to either 128M or 256M. Changes were made to make it configurable via an env var, and the default was reduced to 16M, which helped with the resource challenges in e2e tests. That lower default, however, caused some OutOfMemoryErrors in non-test environments, so it has been reverted back to 128M.

I was doing some manual testing and having problems with my Cassandra pods becoming unstable. Then I remembered the heap size change. After increasing the memory limits, things became stable again in my cluster.

Why do we need it?
We need to make the management-api's heap size configurable so that we can reduce the memory requirements of the cassandra container in dev/test environments. This may also be generally useful in non-test environments as folks look to tune things.

Goals
The expectation is that properties are added to the K8ssandraCluster spec for configuring the heap size. The fact that the heap size is actually configured via env vars on the cassandra container should be considered an implementation detail with which the user need not be concerned.

To be consistent with the approach taken thus far, it probably makes sense to make the heap size configurable at both the cluster and dc levels. This means properties should be added in both CassandraClusterTemplate and CassandraDatacenterTemplate, with the latter taking precedence over the former.

We can implement this without any changes in cass-operator by configuring the heap size via the PodTemplateSpec property. My preference, however, would be to first implement the above changes in cass-operator and then have k8ssandra-operator simply set the new property in the CassandraDatacenterSpec.
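
For illustration, the cluster- and dc-level properties might look something like the following sketch (the field name mgmtApiHeapSize and its exact placement are assumptions, not a settled API):

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    # hypothetical cluster-level default applied to every datacenter
    mgmtApiHeapSize: 64Mi
    datacenters:
      - metadata:
          name: dc1
        # hypothetical dc-level override; takes precedence over the cluster-level value
        mgmtApiHeapSize: 128Mi
```

Either way, the operator would translate this into the MGMT_API_HEAP_SIZE env var on the cassandra container behind the scenes.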

  • K8ssandra Operator version: v1.0.0-alpha.1

┆Issue is synchronized with this Jira Task by Unito
┆epic: Cassandra Configuration
┆fixVersions: k8ssandra-operator-v1.0.0
┆friendlyId: K8SSAND-995
┆priority: Medium

jsanda added the enhancement label Oct 20, 2021

sync-by-unito bot commented Nov 1, 2021

➤ Jeff DiNoto commented:

Notes

  • Will need to make this configurable within cass-operator first
  • Then move to expose it within k8ssandra-operator
  • Question: Is this also an issue with 1.x? Do we need an additional issue to cover this within the k8ssandra charts?

sync-by-unito bot changed the title Make the management-api heap size configurable → K8SSAND-995 ⁃ Make the management-api heap size configurable Nov 3, 2021

Miles-Garnsey commented Nov 18, 2021

cass-operator

This was easier to solve than I expected. Preliminary thoughts on the cass-operator side of things:

  • I can see that the heap size is set via an environment variable in the Cassandra/management API container which is then picked up by docker-entrypoint.sh at bootstrap.
  • My personal feeling is that this can easily be set via the below fragment of CassandraDatacenter. This actually seems like the most rational way to do it, considering that this setting will normally be configured only as part of testing/advanced tuning. I'd definitely be hesitant to add this to the CRD.
  • The configuration might be better documented, although honestly, if folks are going hunting for the mgmt API heap size specifically, the container entrypoint is the logical place to find it, and everything is quite clear in there.

Is there any reason why this is an inappropriate solution?

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
  namespace: cass-operator
spec:
  podTemplateSpec:
    spec:
      containers:
        - name: "cassandra"
          imagePullPolicy: IfNotPresent
          env:
            - name: MGMT_API_HEAP_SIZE
              value: 128m
```


jsanda commented Nov 18, 2021

Configuring via env vars is what I had in mind :)


Miles-Garnsey commented Nov 19, 2021

K8ssandra operator

If we can configure the env var via the podTemplateSpec in cass-operator, we need to figure out how to pass that configuration down to the CassandraDatacenter resource from k8ssandra-operator.

I can see here that we have a CassandraDatacenterTemplate, but the problem is that it isn't actually a template for a CassandraDatacenter resource in the way that a podTemplateSpec is a template for a pod spec within a StatefulSet.

I'd do the following to clarify the schema and allow for better composition of the objects (a sketch of the resulting shape follows the list):

  1. Rename CassandraDatacenterTemplate to k8ssandraComponentConfig but otherwise leave it the same in the short term.
  2. To the new k8ssandraComponentConfig add a CassandraDatacenterTemplateSpec (or CassDCTemplateSpec for brevity) which will then have a nested podTemplateSpec to which we can add the required environment variables.
  3. Review the CassandraDatacenterTemplate schema to make sure we are using reasonable merge keys where required.
  4. Add logic to strategic merge patch our base configuration for the CassDC with that defined in the CassDCTemplateSpec provided by the user.
  5. In the webhook, use deep comparisons between the user provided CassDCTemplateSpec and our required values (e.g., if we need to control additionalSeeds or something) to ensure that the user's configuration can't break our functionality.
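
As a rough sketch, steps 1–2 might surface something like this to the user (all names here, including k8ssandraComponentConfig and cassandraDatacenterTemplateSpec, are placeholders from the proposal rather than an implemented API):

```yaml
# hypothetical shape only; field names are illustrative
spec:
  k8ssandraComponentConfig:           # renamed CassandraDatacenterTemplate (step 1)
    cassandraDatacenterTemplateSpec:  # new nested template (step 2)
      podTemplateSpec:
        spec:
          containers:
            - name: cassandra
              env:
                - name: MGMT_API_HEAP_SIZE
                  value: 128m
```

The operator would then strategic-merge-patch this user-supplied fragment over its own base CassandraDatacenter configuration (step 4).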

I appreciate that the above proposal risks hitting known issues with kubebuilder. I am hoping that by turning off descriptions for nested objects we can avoid these, or that we can find another workaround if that doesn't work (perhaps SSA can play a role).

If all else fails, I suggest that we do one of the following (option 1 is sketched after the list):

  1. Allow a user to define a CassDCTemplateSpec via a configMap, and add a reference to that configMap into the k8ssandraDatacenterTemplate.
  2. Define a new CR, CassDCTemplate, and allow the user to provide a reference to an existing CassDCTemplate in the k8ssandraDatacenterTemplate.

Option 2 is a more permanent solution, if we anticipate that issues around max CRD size will be persistent once SSA changes have settled.
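
A rough sketch of option 1, assuming the template is stored as YAML under a well-known ConfigMap key and referenced by a hypothetical field (both the key name and cassdcTemplateConfigMapRef are invented here for illustration):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dc1-cassdc-template
data:
  # user-provided CassDCTemplateSpec, stored as YAML
  template.yaml: |
    podTemplateSpec:
      spec:
        containers:
          - name: cassandra
            env:
              - name: MGMT_API_HEAP_SIZE
                value: 128m
```

The k8ssandraDatacenterTemplate would then carry something like `cassdcTemplateConfigMapRef: {name: dc1-cassdc-template}` instead of embedding the full template inline, keeping the CRD itself small.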


jsanda commented Nov 19, 2021

I went ahead and updated the description. After rereading it I realized it was lacking some detail.

As we discussed in k8ssandra/k8ssandra#1143, I am not in favor of exposing the PodTemplateSpec. We made the conscious decision not to expose it in k8ssandra 1.x, and we are following the same approach for k8ssandra-operator. With this in mind, the CassandraDatacenterTemplate type isn't necessarily intended to be a full template for a CassandraDatacenterSpec. It might be eventually, and we are definitely going to add more properties to it; however, the motivation for adding more properties won't be simply to fully expose the underlying CassandraDatacenterSpec. In fact, there are some properties I really don't want to expose at all, such as:

  • RollingRestartRequested - Doesn't belong in the spec
  • DseWorkloads - DSE is irrelevant
  • ForceUpgradeRacks - Doesn't belong in the spec

Take a look at how the heap is configured for Stargate. We should be able to do something very similar for the management-api.
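
For comparison, Stargate's heap is exposed as a simple property on its template in the K8ssandraCluster spec; a management-api equivalent could sit alongside it (heapSize under stargate reflects the existing API as I understand it, while mgmtApiHeapSize is a hypothetical analogue):

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  stargate:
    # existing Stargate-style heap setting
    heapSize: 256Mi
  cassandra:
    # hypothetical analogous property for the management-api
    mgmtApiHeapSize: 128Mi
```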

I think it would be ideal to first expose the property in cass-operator as it might be generally beneficial to users. If we go ahead and implement it through the PodTemplateSpec property, it will be much, much less obvious to users.
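
On the cass-operator side, a first-class property might look like this (the field name is an assumption; today the same effect requires the podTemplateSpec fragment shown earlier):

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  # hypothetical first-class property that cass-operator would translate
  # into the MGMT_API_HEAP_SIZE env var on the cassandra container
  mgmtApiHeapSize: 128Mi
```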


Miles-Garnsey commented Nov 19, 2021

> In fact, there are some properties I really don't want to expose at all, such as ...

I very much agree that all of these things are evil and should not be in the CassandraDatacenter API. However, my approach would be to remove them from the CassandraDatacenter rather than creating some wholly new thing which tries to sweep deficiencies in CassandraDatacenter under the rug. I am happy to implement this work if someone raises an epic for it.

I am concerned that creating a new struct to configure environment variables in pods from within the K8ssandraClusterSpec breaks encapsulation twice over: first when you break through the CassandraDatacenter interface, and again when you break through the podTemplateSpec interface.

I believe this will lead to poor maintainability and a poor user experience.

> I think it would be ideal to first expose the property in cass-operator as it might be generally beneficial to users.

This appears to contradict your earlier comment?

(NB: it should go without saying that k8ssandra 1.x was a Helm chart and k8ssandra 2.x is an operator, and that any decisions which were taken in the former should have no bearing upon design approaches in the latter.)
