K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged #525

adutra · 2022-04-15T08:41:59Z

What happened?

Consider the following K8ssandraCluster:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  # Cluster level
  stargate:
    size: 1
    heapSize: 512Mi
  cassandra:
    datacenters:
      - metadata:
          name: dc1
        size: 1
        # DC level
        stargate:
          size: 1
          racks:
            # Rack level
            - name: rack1

If I deploy this CRD, the observed heap size for Stargate nodes is 256Mi (the default).

Did you expect to see something different?

I would expect the heap size of all Stargate nodes to be 512Mi.

This is because the Coalesce methods do not merge fields, they simply pick the first non-nil template. In the above example, the Rack template wins over the DC template and the cluster template, so it gets applied, and its heap size is 256Mi (the default).

Note that many e2e are wrongly conceived because of that. They try to set the heap size cluster-wide, yet it is not applied because there is a DC or Rack template somewhere.

How to reproduce it (as minimally and precisely as possible):

Use the above CRD.

Environment

K8ssandra Operator version: 1.0.0

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1460
┆priority: Medium

The text was updated successfully, but these errors were encountered:

adutra · 2022-04-18T14:43:58Z

After some investigation around available merging libraries, it turns out that there is only one that is widely used:

https://github.com/imdario/mergo

It's even used by Kubernetes itself. I suggest that we use it to merge DC and cluster templates. It has a few shortcomings though:

Doesn't merge maps of structs (claims this isn't possible – but I'm sure it is)
Doesn't merge slices – cannot emulate "strategic merge patch" behavior
Does in-place modifications (not a problem per se, but one needs to be careful not to create unwanted changes in CRD specs)

I think that even with the above limitations, this library will greatly simplify merging Stargate, Reaper, Cassandra templates, and Image resources.

Notes:

As an exercice, I created another merging library: https://github.com/adutra/goalesce – it does not have any of the above shortcomings, and has 100% test coverage. I suggest however that we refrain from using it in K8ssandra for now, but we can switch to it later.
We won't be able to use Mergo (or Goalesce) right now for Cassandra templates because the 2 types (cluster and DC) are disjoint. See CassandraDatacenterOptions set in one struct that is shared by the Cl… #508 for a fix.

jsanda · 2022-04-25T03:50:11Z

@adutra can you please describe the behavior you are proposing? It will be helpful to have it documented in the issue.

adutra · 2022-04-25T10:47:41Z

@jsanda with the changes in this PR we switch from an all-or-nothing kind of merge to a recursive merge for Stargate templates.

This is done with the Mergo library. The rule of thumb for Mergo is: zero values in the destination are overwritten by non-zero values from the source.

For maps, this means that keys are merged:

src := &StargateTemplate{
	NodeSelector: map[string]string{"k1": "v1", "k2": "v2"},
}
dest := &StargateTemplate{
	NodeSelector: map[string]string{"k2": "v2-bis", "k3": "v3"},
}
mergo.Merge(dest, src)
fmt.Printf("%v\n", dest.NodeSelector)
// map[k1:v1 k2:v2-bis k3:v3]

For structs, this means that fields are merged:

src := &StargateTemplate{
	ContainerImage: &images.Image{
		Registry:   "reg1",
		Repository: "repo1",
		Name:       "img1",
	},
}
dest := &StargateTemplate{
	ContainerImage: &images.Image{
		Name: "img2",
		Tag:  "latest",
	},
}
mergo.Merge(dest, src)
fmt.Printf("%v\n", dest.ContainerImage)
// reg1/repo1/img2:latest

For slices however, Mergo does not have the ability to merge them, so it's still all-or-nothing:

src := &StargateTemplate{
	Tolerations: []corev1.Toleration{{
		Key:      "k1",
		Operator: corev1.TolerationOpExists,
	}},
}
dest := &StargateTemplate{
	Tolerations: []corev1.Toleration{{
		Key:      "k2",
		Operator: corev1.TolerationOpEqual,
	}},
}
mergo.Merge(dest, src)
fmt.Printf("%+v\n", dest.Tolerations)
// [{Key:k2 Operator:Equal Value: Effect: TolerationSeconds:<nil>}]

As discussed previously with @Miles-Garnsey, we think that Mergo is a good choice to start with, but might not be enough because of certain limitations. I created another merge library: Goalesce – it doesn't have such limitations. We might want to explore it later on.

adutra · 2022-04-28T11:56:43Z

FYI @Miles-Garnsey and I are now considering Goalesce rather then Mergo.

The key issue is around the *bool type, which we would like to merge with 3-valued logic semantics.

Also, Goalesce has merge-by-key semantics for slices, which could be beneficial in a few places.

adutra · 2022-04-29T14:55:05Z

Capturing here some discussions that happened offline. The PR is in good shape and will hopefully get approved soon; but there are some areas of improvement that I suggest should be tackled in follow-up PRs:

Now that CassandraDatacenterOptions set in one struct that is shared by the Cl… #508 is merged, we could next look into changing the current Coalesce function in datacenter.go. The issue is that the function returns a different type from the inputs, DatacenterConfig. Currently neither Mergo nor Goalesce can do that. We'd need to embed a DatacenterOptions in the DatacenterConfig struct.
We could look into customizing the merge behavior of things like StargateTemplate.Tolerations for instance. I think we could use merge-by-key semantics (a.k.a strategic merge) to merge this slice. There may be other good candidates for that in the API.
We need to add proper documentation explaining the rules that are used when merging templates. It's probably best to do this once all the work around merging stuff is complete.

Miles-Garnsey · 2022-06-28T02:50:01Z

I'm just coming back to this now as I can see it has been set high priority. I'm finally coming up for air from the CDC work and need to resync with you @adutra. Tell me if my understanding is correct on two points:

We don't yet have consistent in-place/allocate-new behaviour for this function's outputs. The blocker here is that an allocate-new approach requires a deepcopy function, and we're blocked on that because we require the implementation of Floyd's algorithm to detect cycles.
One proposed solution is just to turn cycle detection off. This is fine for all of our use cases (we won't have cycles in our CRs) but does risk an infinite loop (and likely stack overflow) if misapplied.

I'm happy to work on a cycle detection algorithm here, but that may take a bit of time, and I want to talk about the way I'm proposing to do it.

Is there a third path here which implements a very basic counter to allow us to set maximum copy depth, set that relatively high, and return an error when the depth bound is breached? I feel like we'd want this even if we had cycle detection available, because deep structs might create some nasty performance hits.

adutra · 2022-06-28T07:24:05Z

The blocker here is that an allocate-new approach requires a deepcopy function, and we're blocked on that because we require the implementation of Floyd's algorithm to detect cycles.

@Miles-Garnsey we do need deepcopy, and I was working on it until end of May: adutra/goalesce#25.

Deepcopy is not related to cycle detection though, and is not blocked by it. We don't have proper cycle detection yet, but as you pointed out, we don't need it for K8ssandra.

I'd be more than happy if you could work on cycle detection. But I wonder if we shouldn't finish deepcopy first? As the work I was doing there changes a lot the project files.

Miles-Garnsey · 2022-06-29T04:55:38Z

But I wonder if we shouldn't finish deepcopy first?

I'm happy to pick up your work on that then. I'll start right away and see how much I can get through before something else pops up and takes my attention away.

adutra · 2022-11-25T08:47:37Z

Update: Goalesce now has a deep-copy utility and its deep-merge utility was rebuilt on top of it. Cycle detection is still clunky though, but we don't need that feature for the operator.

I am going to broaden the scope of this issue to address all cases where we merge a cluster-level and a dc-level template:

DatacenterOptions
StargateTemplate
TelemetrySpec

It will also address the special case of Image where a deep-merge can be used to merge image defaults with the actual image characteristics present in the CRD.

adutra self-assigned this Apr 15, 2022

sync-by-unito bot changed the title ~~Stargate templates should be coalesced field by field~~ K8SSAND-1460 ⁃ Stargate templates should be coalesced field by field Apr 15, 2022

adejanovski added the zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' label Apr 15, 2022

adutra mentioned this issue Apr 18, 2022

CassandraDatacenterOptions set in one struct that is shared by the Cl… #508

Merged

5 tasks

adutra mentioned this issue Apr 18, 2022

Stargate templates should be coalesced field by field #526

Closed

5 tasks

jsanda added the bug Something isn't working label Apr 25, 2022

adejanovski added zh:Review Issues in the ZenHub pipeline 'Review' and removed zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' labels Apr 29, 2022

adutra removed their assignment Jun 2, 2022

adejanovski added zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' and removed zh:Review Issues in the ZenHub pipeline 'Review' labels Jun 2, 2022

adejanovski removed the zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' label Sep 7, 2022

adejanovski added the in-progress Issues in the state 'in-progress' label Nov 17, 2022

adejanovski assigned adutra Nov 17, 2022

adejanovski added review Issues in the state 'review' in-progress Issues in the state 'in-progress' and removed in-progress Issues in the state 'in-progress' review Issues in the state 'review' labels Nov 22, 2022

adutra mentioned this issue Nov 25, 2022

Use a merge library to merge cluster- and dc-level specs #780

Merged

5 tasks

adutra changed the title ~~K8SSAND-1460 ⁃ Stargate templates should be coalesced field by field~~ K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged Nov 25, 2022

adutra mentioned this issue Nov 25, 2022

K8SSAND-1821 ⁃ Ability to merge cluster- and dc- level configs #715

Closed

adutra closed this as completed in #780 Dec 1, 2022

adejanovski added done Issues in the state 'done' and removed in-progress Issues in the state 'in-progress' labels Dec 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged #525

K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged #525

adutra commented Apr 15, 2022 •

edited by sync-by-unito bot

Loading

adutra commented Apr 18, 2022 •

edited

Loading

jsanda commented Apr 25, 2022

adutra commented Apr 25, 2022

adutra commented Apr 28, 2022

adutra commented Apr 29, 2022

Miles-Garnsey commented Jun 28, 2022 •

edited

Loading

adutra commented Jun 28, 2022

Miles-Garnsey commented Jun 29, 2022

adutra commented Nov 25, 2022

K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged #525

K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged #525

Comments

adutra commented Apr 15, 2022 • edited by sync-by-unito bot Loading

adutra commented Apr 18, 2022 • edited Loading

jsanda commented Apr 25, 2022

adutra commented Apr 25, 2022

adutra commented Apr 28, 2022

adutra commented Apr 29, 2022

Miles-Garnsey commented Jun 28, 2022 • edited Loading

adutra commented Jun 28, 2022

Miles-Garnsey commented Jun 29, 2022

adutra commented Nov 25, 2022

adutra commented Apr 15, 2022 •

edited by sync-by-unito bot

Loading

adutra commented Apr 18, 2022 •

edited

Loading

Miles-Garnsey commented Jun 28, 2022 •

edited

Loading