
Add controllers implementation #17

Closed

Conversation

@invidian (Contributor) commented Dec 15, 2020

Draft with progress on the initial controllers implementation, mainly opened as a backup and for some early reviews.

Signed-off-by: Mateusz Gozdek mateusz@kinvolk.io

Closes #14
Closes #15

@invidian (Contributor Author):

If you push the following image to Tinkerbell, this code will select one of the available Hardware entries from Tinkerbell and install the Ubuntu cloud image, with the cloud-init configuration coming from the kubeadm bootstrapper:

FROM alpine:3.12

RUN apk add -U qemu-img

Pushed as 10.17.3.2/ubuntu-install.
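
For reference, building and pushing it can be done roughly like this, assuming 10.17.3.2 is your Tinkerbell registry and your Docker client already trusts and is authenticated against it:

docker build -t 10.17.3.2/ubuntu-install .
docker push 10.17.3.2/ubuntu-install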

At the moment the cluster does not spawn and I don't have access to the machine to verify why. Most likely this is because of missing containerd, Kubernetes packages, etc.

@invidian force-pushed the invidian/controllers-implementation branch 3 times, most recently from 8ee2b5c to 7f66656, on December 18, 2020 17:46
@invidian marked this pull request as ready for review on December 18, 2020 17:50
@invidian (Contributor Author) commented Dec 18, 2020

Tests are missing, but this can already be tried out and reviewed.

To set up a working environment, follow:

  • The instructions from Document development workflow #13.
  • https://kinvolk.io/docs/lokomotive/0.5/quickstarts/tinkerbell/ if you don't have a Tinkerbell cluster.
  • If you followed the Lokomotive guide, also add a file like lokomotive-assets/terraform/capi.tf with the following content, to add 3 more workers to libvirt for CAPI machines:
    locals {
      capi_workers = [
        "10.17.3.6",
        "10.17.3.7",
        "10.17.3.8",
      ]
    }
    
    module "tink_worker_capi" {
      source = "../terraform-modules/tinkerbell-sandbox/worker"
    
      count = length(local.capi_workers)
    
      ip   = local.capi_workers[count.index]
      name = "capi-${count.index}"
    
      sandbox = module.tinkerbell_sandbox
    
      depends_on = [
        module.tinkerbell_sandbox,
      ]
    }
  • Run Terraform:
    terraform apply
  • Build and push the installer image to the Tinkerbell registry as described in the earlier comment on this PR (Add controllers implementation #17).
  • Generate a config for your cluster:
    clusterctl config cluster capi-quickstart --infrastructure=tinkerbell:v0.0.0-dirty --kubernetes-version=v1.20.0 --control-plane-machine-count=1 --worker-machine-count=1 > test-cluster.yaml
  • Add your SSH keys to the config in case you need to debug it.
  • Create a cluster:
    kubectl apply -f test-cluster.yaml
  • When you see in the logs that the workflows have been created, reboot the machines so they pick them up. If you use libvirt, run for example:
    virsh reset tinkerbell-sandbox-lokomotive-cluster-capi-0
  • When your cluster is provisioned, get the kubeconfig for it and apply a CNI, for example Calico, so the nodes become ready:
    clusterctl get kubeconfig capi-quickstart kubeconfig-workload
    KUBECONFIG=kubeconfig-workload kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
  • The cluster should now soon become ready, which can be checked using the following commands:
    kubectl get kubeadmcontrolplane
    kubectl get machines

@invidian changed the title from "WIP: add controllers implementation" to "Add controllers implementation" on Dec 18, 2020
api/v1alpha3/tinkerbellmachine_types.go (outdated)
Comment on lines 39 to 41
// TODO: Those fields are not intended to be filled in by the user, but by the controller.
// Should we move them to Status struct?
HardwareID string `json:"hardwareID,omitempty"`
TemplateID string `json:"templateID,omitempty"`
WorkflowID string `json:"workflowID,omitempty"`
ProviderID string `json:"providerID,omitempty"`
Contributor:

Generally, for determining spec vs. status, I try to consider whether the value can be recreated: if it can, Status is ideal; if not, it needs to be in Spec.

@invidian (Contributor Author):

The HardwareID field could be derived from ProviderID, which is a required field. We would have to strip the tinkerbell:// prefix from it, so I think this one could be moved.

TemplateID and WorkflowID are unique, so I guess they must stay.

And ProviderID is required as mentioned earlier.

Do you think it's worth it to move HardwareID to status then?
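
For illustration, a rough sketch of what that move could look like; only HardwareID, ProviderID, and the tinkerbell:// prefix come from the spec above, the other names are placeholders:

package v1alpha3

import "strings"

// Illustrative only: the real TinkerbellMachineStatus already exists in this
// package; this just shows where HardwareID could live and how it could be
// recomputed from the required ProviderID.
type exampleMachineStatus struct {
	HardwareID string `json:"hardwareID,omitempty"`
}

// hardwareIDFromProviderID derives the hardware ID by stripping the
// tinkerbell:// prefix from ProviderID.
func hardwareIDFromProviderID(providerID string) string {
	return strings.TrimPrefix(providerID, "tinkerbell://")
}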

main.go (outdated)
Comment on lines 146 to 234
crcc := &reconcilers.ClusterReconcileContextConfig{
Client: mgr.GetClient(),
HardwareIPGetter: tinkerbellClient,
Log: ctrl.Log.WithName("controllers").WithName("TinkerbellCluster"),
}
Contributor:

Could this bit of config be pushed into TinkerbellClusterReconciler? It would be nice to try and keep things relatively consistent with other providers here.

@invidian (Contributor Author):

My idea here was to keep a distinction between the TinkerbellMachineReconciler, which is a controller and handles the Reconcile() method, and the object which handles just a single cluster/machine. As handling both cluster and machine objects is rather complex (e.g. we need to pull plenty of dependent objects before we can start processing), I think it's a good idea to keep the two separate.

With a separate object we can express: "Given the following data sources/sinks (dependencies like the Kubernetes client, the Tinkerbell client, etc.), reconcile a cluster/machine with the given namespaced name." This way we can build up an API used by the controller, so its responsibilities are minimized to what a controller should do; for example, it keeps control over when to re-queue the reconciliation (e.g. New() or IntoMachineReconcileContext() may return nil when dependencies are not ready yet).

If we move the config bits into TinkerbellClusterReconciler, then we must either extend the API of that struct or copy the fields into a separate config in the Reconcile() method if we want to keep the abstraction/separation I described.

Having those two separate should also allow easier testing, once we turn the clients into interfaces and replace them with mocks.
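
To illustrate the split, here is a rough sketch of how the controller side could consume that API; the interface shapes and field names are assumptions for the sketch, not the exact code in this PR:

package controllers

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// reconcileContext and reconcileContextBuilder are stand-ins for the
// ReconcileContext / ClusterReconcileContextConfig API discussed above.
type reconcileContext interface {
	Reconcile() error
}

type reconcileContextBuilder interface {
	New(namespacedName types.NamespacedName) (reconcileContext, error)
}

// TinkerbellClusterReconciler stays a thin controller: it only decides when to
// run and when to re-queue, delegating the actual work to the context object.
type TinkerbellClusterReconciler struct {
	ReconcileContextConfig reconcileContextBuilder
}

func (r *TinkerbellClusterReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	// The config owns the dependencies (Kubernetes client, Tinkerbell client,
	// logger); New() pulls the dependent objects for the requested cluster.
	rc, err := r.ReconcileContextConfig.New(req.NamespacedName)
	if err != nil {
		return ctrl.Result{}, fmt.Errorf("creating reconcile context: %w", err)
	}

	// A nil context signals that dependencies are not ready yet, so we simply
	// wait for the next event instead of reconciling.
	if rc == nil {
		return ctrl.Result{}, nil
	}

	return ctrl.Result{}, rc.Reconcile()
}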

main.go (outdated)
Comment on lines 160 to 248
bmrcc := &reconcilers.BaseMachineReconcileContextConfig{
Client: mgr.GetClient(),
TinkerbellClient: tinkerbellClient,
Log: ctrl.Log.WithName("controllers").WithName("TinkerbellMachine"),
}
Contributor:

Could this bit of config be pushed into TinkerbellMachineReconciler? It would be nice to try and keep things relatively consistent with other providers here.

Comment on lines 132 to 195
apt:
sources:
kubernetes:
# TODO: We use Xenial for Focal, but it seems upstream does not
# publish newer pool?
source: "deb https://apt.kubernetes.io/ kubernetes-xenial main"
# Key from https://packages.cloud.google.com/apt/doc/apt-key.gpg
key: |
-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBF/Jfl4BCADTPUXdkNu057X+P3STVxCzJpU2Mn+tUamKdSdVambGeYFINcp/
EGwNGhdb0a1BbHs1SWYZbzwh4d6+p3k4ABzVMO+RpMu/aBx9E5aOn5c8GzHjZ/VE
aheqLLhSUcSCzChSZcN5jz0hTGhmAGaviMt6RMzSfbIhZPj1kDzBiGd0Qwd/rOPn
Jr4taPruR3ecBjhHti1/BMGd/lj0F7zQnCjp7PrqgpEPBT8jo9wX2wvOyXswSI/G
sfbFiaOJfDnYengaEg8sF+u3WOs0Z20cSr6kS76KHpTfa3JjYsfHt8NDw8w4e3H8
PwQzNiRP9tXeMASKQz3emMj/ek6HxjihY9qFABEBAAG0umdMaW51eCBSYXB0dXJl
IEF1dG9tYXRpYyBTaWduaW5nIEtleSAoLy9kZXBvdC9nb29nbGUzL3Byb2R1Y3Rp
b24vYm9yZy9jbG91ZC1yYXB0dXJlL2tleXMvY2xvdWQtcmFwdHVyZS1wdWJrZXlz
L2Nsb3VkLXJhcHR1cmUtc2lnbmluZy1rZXktMjAyMC0xMi0wMy0xNl8wOF8wNS5w
dWIpIDxnbGludXgtdGVhbUBnb29nbGUuY29tPokBKAQTAQgAHAUCX8l+XgkQi1fF
woNvS+sCGwMFCQPDCrACGQEAAEF6CACaekro6aUJJd3mVtrtLOOewV8et1jep5ew
mpOrew/pajRVBeIbV1awVn0/8EcenFejmP6WFcdCWouDVIS/QmRFQV9N6YXN8Piw
alrRV3bTKFBHkwa1cEH4AafCGo0cDvJb8N3JnM/Rmb1KSGKr7ZXpmkLtYVqr6Hgz
l+snrlH0Xwsl5r3SyvqBgvRYTQKZpKqmBEd1udieVoLSF988kKeNDjFa+Q1SjZPG
W+XukgE8kBUbSDx8Y8q6Cszh3VVY+5JUeqimRgJ2ADY2/3lEtAZOtmwcBlhY0cPW
Vqga14E7kTGSWKC6W96Nfy9K7L4Ypp8nTMErus181aqwwNfMqnpnuQENBF/Jfl4B
CADDSh+KdBeNjIclVVnRKt0QT5593yF4WVZt/TgNuaEZ5vKknooVVIq+cJIfY/3l
Uqq8Te4dEjodtFyKe5Xuego6qjzs8TYFdCAHXpXRoUolT14m+qkJ8rhSrpN0TxIj
WJbJdm3NlrgTam5RKJw3ShypNUxyolnHelXxqyKDCkxBSDmR6xcdft3wdQl5IkIA
wxe6nywmSUtpndGLRJdJraJiaWF2IBjFNg3vTEYj4eoehZd4XrvEyLVrMbKZ5m6f
1o6QURuzSrUH9JT/ivZqCmhPposClXXX0bbi9K0Z/+uVyk6v76ms3O50rIq0L0Ye
hM8G++qmGO421+0qCLkdD5/jABEBAAGJAR8EGAEIABMFAl/Jfl4JEItXxcKDb0vr
AhsMAAAbGggAw7lhSWElZpGV1SI2b2K26PB93fVI1tQYV37WIElCJsajF+/ZDfJJ
2d6ncuQSleH5WRccc4hZfKwysA/epqrCnwc7yKsToZ4sw8xsJF1UtQ5ENtkdArVi
BJHS4Y2VZ5DEUmr5EghGtZFh9a6aLoeMVM/nrZCLstDVoPKEpLokHu/gebCwfT/n
9U1dolFIovg6eKACl5xOx+rzcAVp7R4P527jffudz3dKMdLhPrstG0w5YbyfPPwW
MOPp+kUF45eYdR7kKKk09VrJNkEGJ0KQQ6imqR1Tn0kyu4cvkfqnCUF0rrn7CdBq
LSCv1QRhgr6TChQf7ynWsPz5gGdVjh3tIw==
=dsvF
-----END PGP PUBLIC KEY BLOCK-----

packages:
- containerd
# TODO: Use version from configuration.
- [kubelet, {{.KubernetesVersion}}]
- [kubeadm, {{.KubernetesVersion}}]

# TODO: Add it to spec.
ssh_authorized_keys:
{{- range .SSHPublicKeys }}
- {{ . }}
{{- end }}

# Allow SSH as root for debugging.
{{- if .SSHPublicKeys }}
disable_root: false
{{- end }}
`
Contributor:

Most of this can likely be moved to the cluster template, which would allow for it to be more easily tweaked without invasive changes.

@invidian (Contributor Author):

I felt that having it here is better, as it can then be better tested etc. I think the less clutter/boilerplate the user sees (i.e. what clusterctl config generates, which is already a lot), the better. It's also user-proof: one might consider removing some lines from it, thinking they are not required, and we are not able to programmatically validate at runtime that they are there, so cluster provisioning may then fail.

@invidian (Contributor Author):

Now that I look into it again: on the other hand, placing all this stuff in the cluster template would keep it defined in a single place... At least KubeadmConfig does not allow specifying APT repositories and packages, so there is no danger that the user will override the packages we add. And with support for other OSes, we will be able to programmatically select which package version to use and how to configure those repositories. I don't think that will be possible with the cluster template.
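
For illustration, keeping the template in the binary means it gets rendered with Go's text/template, roughly like this; the type and function names here are hypothetical:

package templates

import (
	"bytes"
	"text/template"
)

// cloudInitValues mirrors the fields referenced by the embedded cloud-config
// template above (KubernetesVersion, SSHPublicKeys).
type cloudInitValues struct {
	KubernetesVersion string
	SSHPublicKeys     []string
}

// renderCloudInit renders the embedded cloud-config template with values
// computed by the controller, so no user-facing boilerplate is required.
func renderCloudInit(tpl string, values cloudInitValues) (string, error) {
	t, err := template.New("cloud-init").Parse(tpl)
	if err != nil {
		return "", err
	}

	var buf bytes.Buffer
	if err := t.Execute(&buf, values); err != nil {
		return "", err
	}

	return buf.String(), nil
}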

Comment on lines 73 to 146
- name: "dump-cloud-init"
image: ubuntu-install
command:
- sh
- -c
- |
echo '{{.cloudInit}}' | base64 -d > /statedir/90_dpkg.cfg
- name: "download-image"
image: ubuntu-install
command:
- sh
- -c
- |
# TODO: Pull image from Tinkerbell nginx and convert it there, so we can pipe
# wget directly into dd.
/usr/bin/wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img \
-O /statedir/focal-server-cloudimg-amd64.img
- name: "write-image-to-disk"
image: ubuntu-install
command:
- sh
- -c
- |
/usr/bin/qemu-img convert -f qcow2 -O raw /statedir/focal-server-cloudimg-amd64.img /dev/vda
- name: "write-cloud-init-config"
image: ubuntu-install
command:
- sh
- -c
- |
set -eux
partprobe /dev/vda
mkdir -p /mnt/target
mount -t ext4 /dev/vda1 /mnt/target
cp /statedir/90_dpkg.cfg /mnt/target/etc/cloud/cloud.cfg.d/
# Those commands are required to satisfy kubeadm preflight checks.
# We cannot put them in 'write_files' or 'runcmd' in the cloud-config, as that would override
# what the kubeadm bootstrapper generates and there is no trivial way to merge with it.
# We could put this in templates/cluster-template.yaml, but that makes it visible to the user,
# making the user-facing configuration more complex and more fragile at the same time, as the
# user may remove it from the configuration.
echo br_netfilter > /mnt/target/etc/modules-load.d/kubernetes.conf
cat <<EOF > /mnt/target/etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
umount /mnt/target
# This task shouldn't really be here, but there is no other way to reboot the
# Tinkerbell Worker into the target OS in Tinkerbell for now.
- name: "reboot"
image: ubuntu-install
command:
- sh
- -c
- |
echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger
Contributor:

Not something we need to solve now, but it might be good to think about how we can avoid hardcoding Ubuntu here and make it easier to swap in a different underlying OS without having to make changes to the deployed binary.

@invidian (Contributor Author):

Right. We could perhaps change the image name to capt-install or just capt.

Comment on lines 75 to 78
func (crcc *ClusterReconcileContextConfig) New(namespacedName types.NamespacedName) (ReconcileContext, error) {
crc := &clusterReconcileContext{
log: crcc.Log.WithValues("tinkerbellcluster", namespacedName),
ctx: context.Background(),
Contributor:

When we look at updating to support the in-development v1alpha4 version of Cluster API, this is going to get a bit messy with the way controller-runtime v0.7.0 uses contexts and passes loggers around: https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.7.0. The biggest change is that Reconcile() will be passed a context and that context will be enriched with a logger.

@invidian (Contributor Author):

I see. I think when the time comes to update, we can then:

  • Remove the Log field from the configuration.
  • Accept context as a parameter to New(). This can actually be done right now.

@invidian (Contributor Author):

I added a context parameter to New().
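
A rough sketch of the changed signature, mirroring the snippet quoted above; the body is elided apart from the changed context handling:

// New now accepts the caller's context instead of creating context.Background()
// internally; once controller-runtime v0.7.0 is adopted, the logger could be
// derived from this context as well.
func (crcc *ClusterReconcileContextConfig) New(ctx context.Context, namespacedName types.NamespacedName) (ReconcileContext, error) {
	crc := &clusterReconcileContext{
		log: crcc.Log.WithValues("tinkerbellcluster", namespacedName),
		ctx: ctx,
	}

	// ... remaining construction as in the diff above ...

	return crc, nil
}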

Comment on lines 127 to 141
id, err := crc.hardwareIPGetter.NextAvailableHardwareID(crc.ctx)
if err != nil {
return fmt.Errorf("getting next available hardware: %w", err)
}

ip, err := crc.hardwareIPGetter.GetHardwareIP(crc.ctx, id)
if err != nil {
return fmt.Errorf("getting hardware IP: %w", err)
}

crc.log.Info("Assigning IP to cluster", "ip", ip, "clusterName", crc.tinkerbellCluster.Name)

crc.tinkerbellCluster.Spec.ControlPlaneEndpoint.Host = ip
Contributor:

This feels a bit racy if we are creating multiple Clusters at the same time. How do we ensure that the ID/IP we find here is the one that is used for the initial Machine instance that is created?

@invidian (Contributor Author) commented Dec 21, 2020:

Yeah, I'm aware of that. We could do some sort of reservation here via an annotation on the Cluster object, or create some kind of stub workflow to mark the hardware as reserved. That might have side effects though.

how do we ensure that the id/ip we find here is the one that is used for the initial Machine instance that is created?

It's not completely random, though it definitely could be improved. When we deploy the control plane node, we select the hardware using this IP address, to make sure the right hardware is picked. So in case new hardware shows up in the meantime, we will still select the right one. But there is no protection against another cluster stealing the hardware with this IP...

Having a list of hardware UUIDs for the control plane should improve that. It should be easy to implement, so perhaps we should add it here.
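
For illustration, the annotation-based reservation could look roughly like this; the annotation key and the client field on clusterReconcileContext are assumptions, not part of this PR:

// Hypothetical sketch of the annotation-based reservation mentioned above.
const reservedHardwareAnnotation = "tinkerbell.org/reserved-hardware-id"

func (crc *clusterReconcileContext) reserveHardware(hardwareID string) error {
	if crc.tinkerbellCluster.Annotations == nil {
		crc.tinkerbellCluster.Annotations = map[string]string{}
	}

	crc.tinkerbellCluster.Annotations[reservedHardwareAnnotation] = hardwareID

	// Writing the annotation back right after selecting the hardware narrows
	// the window in which a concurrently reconciled cluster could pick the
	// same hardware.
	return crc.client.Update(crc.ctx, crc.tinkerbellCluster)
}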

crc.tinkerbellCluster.Spec.ControlPlaneEndpoint.Host = ip
}

// TODO: How can we support changing that?
Contributor:

It's a bit more complicated in our case currently because we don't have a load balancer (yet); I don't think we have a good way, other than making assumptions related to the bootstrap provider.

@invidian force-pushed the invidian/controllers-implementation branch 5 times, most recently from 70df4a5 to 993b791, on December 22, 2020 15:54
@invidian force-pushed the invidian/controllers-implementation branch from 993b791 to 9fed279 on January 6, 2021 16:24
@invidian marked this pull request as draft on January 6, 2021 16:24
@invidian (Contributor Author) commented Jan 6, 2021

Converted back to draft, as this is now based on #8, since we use the SHIM implementation from there. This should not be merged until #8 is.

@invidian (Contributor Author):

Spotted this weird error while doing some tests:

[manager] E0111 23:40:08.975102      97 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="failed to create workflow: failed to create workflow in Tinkerbell: rpc error: code = Unknown desc = failed to get template with ID : failed to get template: one GetBy field must be set to build a get condition" "controller"="workflow" "name"="capi-quickstart-control-plane-jthgs" "namespace"=""

Also, I hit tinkerbell/tink#413 :|

@invidian (Contributor Author):

Running PGPASSWORD=tinkerbell docker-compose exec db psql -U tinkerbell -c 'drop trigger events_channel ON events;' works around tinkerbell/tink#413 well enough.

@invidian force-pushed the invidian/controllers-implementation branch 4 times, most recently from ba5def7 to 57cf307, on January 13, 2021 11:24
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
And enable new linters, even though some of them are of questionable quality, like paralleltest.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
@detiber (Contributor) commented Feb 16, 2021

Closed in favor of #32
