Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged #525

Closed
adutra opened this issue Apr 15, 2022 · 9 comments · Fixed by #780
Closed

K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged #525

adutra opened this issue Apr 15, 2022 · 9 comments · Fixed by #780
Assignees
Labels
bug Something isn't working done Issues in the state 'done'

Comments

@adutra
Copy link
Contributor

adutra commented Apr 15, 2022

What happened?

Consider the following K8ssandraCluster:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  # Cluster level
  stargate:
    size: 1
    heapSize: 512Mi
  cassandra:
    datacenters:
      - metadata:
          name: dc1
        size: 1
        # DC level
        stargate:
          size: 1
          racks:
            # Rack level
            - name: rack1

If I deploy this CRD, the observed heap size for Stargate nodes is 256Mi (the default).

Did you expect to see something different?

I would expect the heap size of all Stargate nodes to be 512Mi.

This is because the Coalesce methods do not merge fields, they simply pick the first non-nil template. In the above example, the Rack template wins over the DC template and the cluster template, so it gets applied, and its heap size is 256Mi (the default).

Note that many e2e are wrongly conceived because of that. They try to set the heap size cluster-wide, yet it is not applied because there is a DC or Rack template somewhere.

How to reproduce it (as minimally and precisely as possible):

Use the above CRD.

Environment

  • K8ssandra Operator version: 1.0.0

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1460
┆priority: Medium

@adutra adutra self-assigned this Apr 15, 2022
@sync-by-unito sync-by-unito bot changed the title Stargate templates should be coalesced field by field K8SSAND-1460 ⁃ Stargate templates should be coalesced field by field Apr 15, 2022
@adejanovski adejanovski added the zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' label Apr 15, 2022
@adutra
Copy link
Contributor Author

adutra commented Apr 18, 2022

After some investigation around available merging libraries, it turns out that there is only one that is widely used:

https://github.com/imdario/mergo

It's even used by Kubernetes itself. I suggest that we use it to merge DC and cluster templates. It has a few shortcomings though:

  • Doesn't merge maps of structs (claims this isn't possible – but I'm sure it is)
  • Doesn't merge slices – cannot emulate "strategic merge patch" behavior
  • Does in-place modifications (not a problem per se, but one needs to be careful not to create unwanted changes in CRD specs)

I think that even with the above limitations, this library will greatly simplify merging Stargate, Reaper, Cassandra templates, and Image resources.

Notes:

@jsanda jsanda added the bug Something isn't working label Apr 25, 2022
@jsanda
Copy link
Contributor

jsanda commented Apr 25, 2022

@adutra can you please describe the behavior you are proposing? It will be helpful to have it documented in the issue.

@adutra
Copy link
Contributor Author

adutra commented Apr 25, 2022

@jsanda with the changes in this PR we switch from an all-or-nothing kind of merge to a recursive merge for Stargate templates.

This is done with the Mergo library. The rule of thumb for Mergo is: zero values in the destination are overwritten by non-zero values from the source.

  • For maps, this means that keys are merged:
src := &StargateTemplate{
	NodeSelector: map[string]string{"k1": "v1", "k2": "v2"},
}
dest := &StargateTemplate{
	NodeSelector: map[string]string{"k2": "v2-bis", "k3": "v3"},
}
mergo.Merge(dest, src)
fmt.Printf("%v\n", dest.NodeSelector)
// map[k1:v1 k2:v2-bis k3:v3]
  • For structs, this means that fields are merged:
src := &StargateTemplate{
	ContainerImage: &images.Image{
		Registry:   "reg1",
		Repository: "repo1",
		Name:       "img1",
	},
}
dest := &StargateTemplate{
	ContainerImage: &images.Image{
		Name: "img2",
		Tag:  "latest",
	},
}
mergo.Merge(dest, src)
fmt.Printf("%v\n", dest.ContainerImage)
// reg1/repo1/img2:latest
  • For slices however, Mergo does not have the ability to merge them, so it's still all-or-nothing:
src := &StargateTemplate{
	Tolerations: []corev1.Toleration{{
		Key:      "k1",
		Operator: corev1.TolerationOpExists,
	}},
}
dest := &StargateTemplate{
	Tolerations: []corev1.Toleration{{
		Key:      "k2",
		Operator: corev1.TolerationOpEqual,
	}},
}
mergo.Merge(dest, src)
fmt.Printf("%+v\n", dest.Tolerations)
// [{Key:k2 Operator:Equal Value: Effect: TolerationSeconds:<nil>}]

As discussed previously with @Miles-Garnsey, we think that Mergo is a good choice to start with, but might not be enough because of certain limitations. I created another merge library: Goalesce – it doesn't have such limitations. We might want to explore it later on.

@adutra
Copy link
Contributor Author

adutra commented Apr 28, 2022

FYI @Miles-Garnsey and I are now considering Goalesce rather then Mergo.

The key issue is around the *bool type, which we would like to merge with 3-valued logic semantics.

Also, Goalesce has merge-by-key semantics for slices, which could be beneficial in a few places.

@adejanovski adejanovski added zh:Review Issues in the ZenHub pipeline 'Review' and removed zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' labels Apr 29, 2022
@adutra
Copy link
Contributor Author

adutra commented Apr 29, 2022

Capturing here some discussions that happened offline. The PR is in good shape and will hopefully get approved soon; but there are some areas of improvement that I suggest should be tackled in follow-up PRs:

  • Now that CassandraDatacenterOptions set in one struct that is shared by the Cl… #508 is merged, we could next look into changing the current Coalesce function in datacenter.go. The issue is that the function returns a different type from the inputs, DatacenterConfig. Currently neither Mergo nor Goalesce can do that. We'd need to embed a DatacenterOptions in the DatacenterConfig struct.
  • We could look into customizing the merge behavior of things like StargateTemplate.Tolerations for instance. I think we could use merge-by-key semantics (a.k.a strategic merge) to merge this slice. There may be other good candidates for that in the API.
  • We need to add proper documentation explaining the rules that are used when merging templates. It's probably best to do this once all the work around merging stuff is complete.

@adutra adutra removed their assignment Jun 2, 2022
@adejanovski adejanovski added zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' and removed zh:Review Issues in the ZenHub pipeline 'Review' labels Jun 2, 2022
@Miles-Garnsey
Copy link
Member

Miles-Garnsey commented Jun 28, 2022

I'm just coming back to this now as I can see it has been set high priority. I'm finally coming up for air from the CDC work and need to resync with you @adutra. Tell me if my understanding is correct on two points:

  1. We don't yet have consistent in-place/allocate-new behaviour for this function's outputs. The blocker here is that an allocate-new approach requires a deepcopy function, and we're blocked on that because we require the implementation of Floyd's algorithm to detect cycles.
  2. One proposed solution is just to turn cycle detection off. This is fine for all of our use cases (we won't have cycles in our CRs) but does risk an infinite loop (and likely stack overflow) if misapplied.

I'm happy to work on a cycle detection algorithm here, but that may take a bit of time, and I want to talk about the way I'm proposing to do it.

Is there a third path here which implements a very basic counter to allow us to set maximum copy depth, set that relatively high, and return an error when the depth bound is breached? I feel like we'd want this even if we had cycle detection available, because deep structs might create some nasty performance hits.

@adutra
Copy link
Contributor Author

adutra commented Jun 28, 2022

The blocker here is that an allocate-new approach requires a deepcopy function, and we're blocked on that because we require the implementation of Floyd's algorithm to detect cycles.

@Miles-Garnsey we do need deepcopy, and I was working on it until end of May: adutra/goalesce#25.

Deepcopy is not related to cycle detection though, and is not blocked by it. We don't have proper cycle detection yet, but as you pointed out, we don't need it for K8ssandra.

I'd be more than happy if you could work on cycle detection. But I wonder if we shouldn't finish deepcopy first? As the work I was doing there changes a lot the project files.

@Miles-Garnsey
Copy link
Member

But I wonder if we shouldn't finish deepcopy first?

I'm happy to pick up your work on that then. I'll start right away and see how much I can get through before something else pops up and takes my attention away.

@adejanovski adejanovski removed the zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' label Sep 7, 2022
@adejanovski adejanovski added the in-progress Issues in the state 'in-progress' label Nov 17, 2022
@adejanovski adejanovski added review Issues in the state 'review' in-progress Issues in the state 'in-progress' and removed in-progress Issues in the state 'in-progress' review Issues in the state 'review' labels Nov 22, 2022
@adutra
Copy link
Contributor Author

adutra commented Nov 25, 2022

Update: Goalesce now has a deep-copy utility and its deep-merge utility was rebuilt on top of it. Cycle detection is still clunky though, but we don't need that feature for the operator.

I am going to broaden the scope of this issue to address all cases where we merge a cluster-level and a dc-level template:

  • DatacenterOptions
  • StargateTemplate
  • TelemetrySpec

It will also address the special case of Image where a deep-merge can be used to merge image defaults with the actual image characteristics present in the CRD.

@adutra adutra changed the title K8SSAND-1460 ⁃ Stargate templates should be coalesced field by field K8SSAND-1460 ⁃ Cluster- and dc-level templates should be deep-merged Nov 25, 2022
@adejanovski adejanovski added done Issues in the state 'done' and removed in-progress Issues in the state 'in-progress' labels Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working done Issues in the state 'done'
Projects
No open projects
Archived in project
4 participants