Add controllers implementation #17
Conversation
If you push the following image to Tinkerbell, this code will select one of the available hardware entries from Tinkerbell and install an Ubuntu cloud image with cloud-init from the kubeadm bootstrapper:
Pushed as

At the moment the cluster does not spawn, and I don't have access to the machine to verify why. Most likely because of missing containerd, Kubernetes packages, etc.
Tests are missing, but this can already be tried out and reviewed. For the setup workflow, follow:
```go
// TODO: Those fields are not intended to be filled in by the user, but by the controller.
// Should we move them to Status struct?
HardwareID string `json:"hardwareID,omitempty"`
TemplateID string `json:"templateID,omitempty"`
WorkflowID string `json:"workflowID,omitempty"`
ProviderID string `json:"providerID,omitempty"`
```
Generally, when determining spec vs. status, I try to consider whether the data can be recreated: if so, Status is ideal, but if not, it needs to be in Spec.
`HardwareID` could be derived from `ProviderID`, which is a required field. We would have to strip the `tinkerbell://` prefix from it, so I think this one could be moved. `TemplateID` and `WorkflowID` are unique, so they must stay, I guess. And `ProviderID` is required, as mentioned earlier.

Do you think it's worth moving `HardwareID` to status then?
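If we go that route, the derivation is trivial. A minimal sketch, assuming the `tinkerbell://` prefix (the helper name is made up):

```go
package provider

import (
	"fmt"
	"strings"
)

const providerIDPrefix = "tinkerbell://"

// hardwareIDFromProviderID strips the scheme prefix from a provider ID,
// e.g. "tinkerbell://<uuid>" -> "<uuid>".
func hardwareIDFromProviderID(providerID string) (string, error) {
	if !strings.HasPrefix(providerID, providerIDPrefix) {
		return "", fmt.Errorf("provider ID %q does not start with %q", providerID, providerIDPrefix)
	}

	return strings.TrimPrefix(providerID, providerIDPrefix), nil
}
```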
main.go
```go
crcc := &reconcilers.ClusterReconcileContextConfig{
	Client:           mgr.GetClient(),
	HardwareIPGetter: tinkerbellClient,
	Log:              ctrl.Log.WithName("controllers").WithName("TinkerbellCluster"),
}
```
Could this bit of config be pushed into `TinkerbellClusterReconciler`? It would be nice to try and keep things relatively consistent with other providers here.
My idea here was to have a distinction from the `TinkerbellMachineReconciler`, which is a controller and handles the `Reconcile()` method. As handling both cluster and machine objects is rather complex (e.g. we need to pull plenty of dependent objects before we can start processing), I think it's a good idea to separate the controller from the object which handles just a single cluster/machine.

With a separate object we can express "given the following data sources/sinks (dependencies like Client, Tinkerbell Client, etc.), reconcile me a cluster/machine with a given (namespaced) name". This way we can build up an API used by the controller, so its responsibilities are minimized to what a controller should do; for example, it keeps control over when to re-queue the reconciliation (e.g. `New()` or `IntoMachineReconcileContext()` may return `nil` when dependencies are not ready yet).

If we move the config bits into `TinkerbellClusterReconciler`, then we must either extend the API of this struct or copy the fields into a separate config in the `Reconcile()` method if we want to keep the abstraction/separation I described.

Having those two separated should also allow easier testing, once we turn the clients into interfaces and replace them with mocks.
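A rough sketch of the split described above (illustrative names, not the exact code in this PR):

```go
package reconcilers

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// ReconcileContext reconciles a single, already-resolved machine object.
type ReconcileContext interface {
	Reconcile() error
}

// MachineReconcileContextConfig carries the data sources/sinks (dependencies).
type MachineReconcileContextConfig struct {
	// Client, TinkerbellClient, Log, etc. would live here.
}

// New pulls all dependent objects for the named machine. It may return a nil
// ReconcileContext when dependencies are not ready yet.
func (c *MachineReconcileContextConfig) New(ctx context.Context, name types.NamespacedName) (ReconcileContext, error) {
	// ...fetch Machine, Cluster, bootstrap data, etc.
	return nil, nil
}

// The controller itself stays thin and only decides when to re-queue.
type TinkerbellMachineReconciler struct {
	config *MachineReconcileContextConfig
}

func (r *TinkerbellMachineReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	rc, err := r.config.New(context.Background(), req.NamespacedName)
	if err != nil {
		return ctrl.Result{}, err
	}

	if rc == nil { // Dependencies not ready yet, try again later.
		return ctrl.Result{Requeue: true}, nil
	}

	return ctrl.Result{}, rc.Reconcile()
}
```

This also makes the controller easy to unit-test: the config's clients can be swapped for mocks without touching the `Reconcile()` plumbing.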
main.go
```go
bmrcc := &reconcilers.BaseMachineReconcileContextConfig{
	Client:           mgr.GetClient(),
	TinkerbellClient: tinkerbellClient,
	Log:              ctrl.Log.WithName("controllers").WithName("TinkerbellMachine"),
}
```
Could this bit of config be pushed into `TinkerbellMachineReconciler`? It would be nice to try and keep things relatively consistent with other providers here.
internal/templates/templates.go
```yaml
apt:
  sources:
    kubernetes:
      # TODO: We use Xenial for Focal, but it seems upstream does not
      # publish newer pool?
      source: "deb https://apt.kubernetes.io/ kubernetes-xenial main"
      # Key from https://packages.cloud.google.com/apt/doc/apt-key.gpg
      key: |
        -----BEGIN PGP PUBLIC KEY BLOCK-----

        mQENBF/Jfl4BCADTPUXdkNu057X+P3STVxCzJpU2Mn+tUamKdSdVambGeYFINcp/
        EGwNGhdb0a1BbHs1SWYZbzwh4d6+p3k4ABzVMO+RpMu/aBx9E5aOn5c8GzHjZ/VE
        aheqLLhSUcSCzChSZcN5jz0hTGhmAGaviMt6RMzSfbIhZPj1kDzBiGd0Qwd/rOPn
        Jr4taPruR3ecBjhHti1/BMGd/lj0F7zQnCjp7PrqgpEPBT8jo9wX2wvOyXswSI/G
        sfbFiaOJfDnYengaEg8sF+u3WOs0Z20cSr6kS76KHpTfa3JjYsfHt8NDw8w4e3H8
        PwQzNiRP9tXeMASKQz3emMj/ek6HxjihY9qFABEBAAG0umdMaW51eCBSYXB0dXJl
        IEF1dG9tYXRpYyBTaWduaW5nIEtleSAoLy9kZXBvdC9nb29nbGUzL3Byb2R1Y3Rp
        b24vYm9yZy9jbG91ZC1yYXB0dXJlL2tleXMvY2xvdWQtcmFwdHVyZS1wdWJrZXlz
        L2Nsb3VkLXJhcHR1cmUtc2lnbmluZy1rZXktMjAyMC0xMi0wMy0xNl8wOF8wNS5w
        dWIpIDxnbGludXgtdGVhbUBnb29nbGUuY29tPokBKAQTAQgAHAUCX8l+XgkQi1fF
        woNvS+sCGwMFCQPDCrACGQEAAEF6CACaekro6aUJJd3mVtrtLOOewV8et1jep5ew
        mpOrew/pajRVBeIbV1awVn0/8EcenFejmP6WFcdCWouDVIS/QmRFQV9N6YXN8Piw
        alrRV3bTKFBHkwa1cEH4AafCGo0cDvJb8N3JnM/Rmb1KSGKr7ZXpmkLtYVqr6Hgz
        l+snrlH0Xwsl5r3SyvqBgvRYTQKZpKqmBEd1udieVoLSF988kKeNDjFa+Q1SjZPG
        W+XukgE8kBUbSDx8Y8q6Cszh3VVY+5JUeqimRgJ2ADY2/3lEtAZOtmwcBlhY0cPW
        Vqga14E7kTGSWKC6W96Nfy9K7L4Ypp8nTMErus181aqwwNfMqnpnuQENBF/Jfl4B
        CADDSh+KdBeNjIclVVnRKt0QT5593yF4WVZt/TgNuaEZ5vKknooVVIq+cJIfY/3l
        Uqq8Te4dEjodtFyKe5Xuego6qjzs8TYFdCAHXpXRoUolT14m+qkJ8rhSrpN0TxIj
        WJbJdm3NlrgTam5RKJw3ShypNUxyolnHelXxqyKDCkxBSDmR6xcdft3wdQl5IkIA
        wxe6nywmSUtpndGLRJdJraJiaWF2IBjFNg3vTEYj4eoehZd4XrvEyLVrMbKZ5m6f
        1o6QURuzSrUH9JT/ivZqCmhPposClXXX0bbi9K0Z/+uVyk6v76ms3O50rIq0L0Ye
        hM8G++qmGO421+0qCLkdD5/jABEBAAGJAR8EGAEIABMFAl/Jfl4JEItXxcKDb0vr
        AhsMAAAbGggAw7lhSWElZpGV1SI2b2K26PB93fVI1tQYV37WIElCJsajF+/ZDfJJ
        2d6ncuQSleH5WRccc4hZfKwysA/epqrCnwc7yKsToZ4sw8xsJF1UtQ5ENtkdArVi
        BJHS4Y2VZ5DEUmr5EghGtZFh9a6aLoeMVM/nrZCLstDVoPKEpLokHu/gebCwfT/n
        9U1dolFIovg6eKACl5xOx+rzcAVp7R4P527jffudz3dKMdLhPrstG0w5YbyfPPwW
        MOPp+kUF45eYdR7kKKk09VrJNkEGJ0KQQ6imqR1Tn0kyu4cvkfqnCUF0rrn7CdBq
        LSCv1QRhgr6TChQf7ynWsPz5gGdVjh3tIw==
        =dsvF
        -----END PGP PUBLIC KEY BLOCK-----
packages:
- containerd
# TODO: Use version from configuration.
- [kubelet, {{.KubernetesVersion}}]
- [kubeadm, {{.KubernetesVersion}}]
# TODO: Add it to spec.
ssh_authorized_keys:
{{- range .SSHPublicKeys }}
- {{ . }}
{{- end }}
# Allow SSH as root for debugging.
{{- if .SSHPublicKeys }}
disable_root: false
{{- end }}
`
```
Most of this can likely be moved to the cluster template, which would allow for it to be more easily tweaked without invasive changes.
I felt that having it here is better, as then it can be better tested, etc. I think the less clutter/boilerplate the user sees (on top of what `clusterctl config` generates, which is already a lot), the better. It's also user-proof: one might consider removing some lines from it, thinking they are not required, and we are not able to validate programmatically at runtime that they are still there, so cluster provisioning may fail.
Now that I look into it again, on the other hand, placing all this stuff in the cluster template keeps it defined in a single place... At least kubeadmconfig does not allow specifying APT repositories and packages, so there is no danger that the user will override the packages we add. And with support for other OSes, we will be able to programmatically select which package version to use and how to configure those repositories. I don't think that will be possible with a cluster template.
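To illustrate what "programmatically select" could mean here, a hypothetical sketch (the function name and supported releases are made up):

```go
package templates

import "fmt"

// kubernetesAptSource returns the APT repository line for a given Ubuntu
// release. Upstream only publishes the "kubernetes-xenial" pool, which is
// why Focal reuses it.
func kubernetesAptSource(ubuntuRelease string) (string, error) {
	switch ubuntuRelease {
	case "xenial", "bionic", "focal":
		return "deb https://apt.kubernetes.io/ kubernetes-xenial main", nil
	default:
		return "", fmt.Errorf("unsupported Ubuntu release %q", ubuntuRelease)
	}
}
```

A static cluster template cannot branch like this, which is the main argument for keeping the logic in Go.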
internal/templates/templates.go
```yaml
- name: "dump-cloud-init"
  image: ubuntu-install
  command:
  - sh
  - -c
  - |
    echo '{{.cloudInit}}' | base64 -d > /statedir/90_dpkg.cfg
- name: "download-image"
  image: ubuntu-install
  command:
  - sh
  - -c
  - |
    # TODO: Pull image from Tinkerbell nginx and convert it there, so we can pipe
    # wget directly into dd.
    /usr/bin/wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img \
      -O /statedir/focal-server-cloudimg-amd64.img
- name: "write-image-to-disk"
  image: ubuntu-install
  command:
  - sh
  - -c
  - |
    /usr/bin/qemu-img convert -f qcow2 -O raw /statedir/focal-server-cloudimg-amd64.img /dev/vda
- name: "write-cloud-init-config"
  image: ubuntu-install
  command:
  - sh
  - -c
  - |
    set -eux
    partprobe /dev/vda
    mkdir -p /mnt/target
    mount -t ext4 /dev/vda1 /mnt/target
    cp /statedir/90_dpkg.cfg /mnt/target/etc/cloud/cloud.cfg.d/
    # Those commands are required to satisfy kubeadm preflight checks.
    # We cannot put those in 'write_files' or 'runcmd' from cloud-config, as it would override
    # what the kubeadm bootstrapper generates and there is no trivial way to merge with it.
    # We could put this in templates/cluster-template.yaml, but that makes it visible to the user,
    # making user-facing configuration more complex and more fragile at the same time, as the user
    # may remove it from the configuration.
    echo br_netfilter > /mnt/target/etc/modules-load.d/kubernetes.conf
    cat <<EOF > /mnt/target/etc/sysctl.d/99-kubernetes-cri.conf
    net.bridge.bridge-nf-call-iptables = 1
    net.ipv4.ip_forward = 1
    EOF
    umount /mnt/target
# This task shouldn't really be here, but there is no other way to reboot the
# Tinkerbell Worker into the target OS in Tinkerbell for now.
- name: "reboot"
  image: ubuntu-install
  command:
  - sh
  - -c
  - |
    echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger
```
Not something we need to solve now, but it might be good to think about how we can avoid hardcoding Ubuntu here and make it easier to swap in a different underlying OS without having to make changes to the deployed binary.
Right. We could perhaps change it to `capt-install` or just `capt`.
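Going a step further, both the action image and the OS image could come from template data instead of being hardcoded. A hypothetical sketch (struct and field names are made up):

```go
package templates

import (
	"bytes"
	"text/template"
)

// workflowTemplateData holds the values that are currently hardcoded.
type workflowTemplateData struct {
	ActionImage string // e.g. "capt" instead of "ubuntu-install".
	ImageURL    string // The OS image to write to disk.
}

const downloadImageAction = `
- name: "download-image"
  image: {{ .ActionImage }}
  command:
  - sh
  - -c
  - |
    /usr/bin/wget {{ .ImageURL }} -O /statedir/image.img
`

// renderDownloadImageAction renders the action with user-supplied values, so
// swapping the OS does not require rebuilding the binary.
func renderDownloadImageAction(data workflowTemplateData) (string, error) {
	tmpl, err := template.New("download-image").Parse(downloadImageAction)
	if err != nil {
		return "", err
	}

	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}

	return buf.String(), nil
}
```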
internal/reconcilers/cluster.go
```go
func (crcc *ClusterReconcileContextConfig) New(namespacedName types.NamespacedName) (ReconcileContext, error) {
	crc := &clusterReconcileContext{
		log: crcc.Log.WithValues("tinkerbellcluster", namespacedName),
		ctx: context.Background(),
```
When we look at updating to support the in-development v1alpha4 version of Cluster API, this is going to get a bit messy with the way controller-runtime v0.7.0 is using contexts and passing loggers around (https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.7.0), the biggest change being that `Reconcile()` will be passed a context, and that context will be enriched with a logger.
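For reference, a minimal sketch of that v0.7.0 shape, assuming the logger is retrieved via `log.FromContext` as the release notes describe:

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

type TinkerbellClusterReconciler struct{}

func (r *TinkerbellClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// controller-runtime enriches the context with a logger for us.
	logger := log.FromContext(ctx).WithValues("tinkerbellcluster", req.NamespacedName)

	logger.Info("reconciling")

	return ctrl.Result{}, nil
}
```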
I see. I think when the time comes for an update, we can then:

- Remove the `Log` field from the configuration.
- Accept a context as a parameter to `New()`. This can actually be done right now.
I added a context parameter to `New()`.
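Presumably the change looks roughly like this (a sketch, not the exact diff):

```go
// The caller now supplies the context instead of New() creating one
// with context.Background().
func (crcc *ClusterReconcileContextConfig) New(ctx context.Context, namespacedName types.NamespacedName) (ReconcileContext, error) {
	crc := &clusterReconcileContext{
		log: crcc.Log.WithValues("tinkerbellcluster", namespacedName),
		ctx: ctx, // Passed in by the controller rather than created here.
	}

	// ...

	return crc, nil
}
```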
internal/reconcilers/cluster.go
```go
id, err := crc.hardwareIPGetter.NextAvailableHardwareID(crc.ctx)
if err != nil {
	return fmt.Errorf("getting next available hardware: %w", err)
}

ip, err := crc.hardwareIPGetter.GetHardwareIP(crc.ctx, id)
if err != nil {
	return fmt.Errorf("getting hardware IP: %w", err)
}

crc.log.Info("Assigning IP to cluster", "ip", ip, "clusterName", crc.tinkerbellCluster.Name)

crc.tinkerbellCluster.Spec.ControlPlaneEndpoint.Host = ip
```
This feels a bit racy if we are creating multiple Clusters at the same time. How do we ensure that the id/IP we find here is the one that is used for the initial Machine instance that is created?
Yeah, I'm aware of that. We could do some sort of reservation here via a cluster object annotation, or create some kind of stub workflow to mark the hardware as reserved. This might have side effects, though.

> how do we ensure that the id/ip we find here is the one that is used for the initial Machine instance that is created?

It's not completely random, though it could definitely be improved. When we deploy the control plane node, we select hardware using this IP address, to make sure the right hardware is picked. So in case new hardware shows up in the meantime, we will still select the right one. But there is no protection from another cluster stealing the hardware with this IP...

Having a list of hardware UUIDs for the control plane should improve that. It should be easy to implement, so perhaps we should add it here.
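The annotation-based reservation could look roughly like this (a hypothetical sketch; the annotation name is made up, and a real implementation would persist the update via the Kubernetes client and rely on optimistic concurrency so two clusters cannot both win the race):

```go
package reconcilers

import "fmt"

const hardwareReservedAnnotation = "tinkerbell.org/reserved-by-cluster"

// reserveHardware marks hardware as owned by a cluster, or fails if another
// cluster already claimed it.
func reserveHardware(annotations map[string]string, clusterName string) error {
	if owner, ok := annotations[hardwareReservedAnnotation]; ok && owner != clusterName {
		return fmt.Errorf("hardware already reserved by cluster %q", owner)
	}

	annotations[hardwareReservedAnnotation] = clusterName

	return nil
}
```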
internal/reconcilers/cluster.go
```go
	crc.tinkerbellCluster.Spec.ControlPlaneEndpoint.Host = ip
}

// TODO: How can we support changing that?
```
It's a bit more complicated in our case currently, because we don't have a load balancer (yet). I don't think we have a good way (other than making assumptions related to the bootstrap provider).
Spotted this weird error while doing some tests:

Also, I hit tinkerbell/tink#413 :|
Closed in favor of #32
Draft with the progress on initial controllers implementation. Mainly opened as a backup and for some early reviews.
Signed-off-by: Mateusz Gozdek mateusz@kinvolk.io
Closes #14
Closes #15