Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Talos OS #64

Merged
merged 2 commits into from
Jun 15, 2023
Merged

Conversation

gerhard
Copy link
Contributor

@gerhard gerhard commented Apr 21, 2023

Resolves metal-stack/csi-driver-lvm#86

/etc on Talos OS is read-only, which means that this chart will fail to install as is. The new hostWritePath option allows the default /etc/lvm to be overwritten. For Talos OS, setting hostWritePath: /var/etc/lvm fixes the install issue. This is just half the story.

After this installs, creating a PVs will fail since the controller has some hard-coded paths starting with /etc. Here is the second half of this fix: metal-stack/csi-driver-lvm#87

First step is to get that PR 👆 merged first, then update the image versions in this one before merging this change.


Learn more about the Talos' file system: https://www.talos.dev/v1.4/learn-more/architecture/

gerhard added a commit to gerhard/csi-driver-lvm that referenced this pull request Apr 21, 2023
`/etc` on Talos OS is read-only, which means that new PVs will fail to
create. This change makes the hard-coded `/etc/lvm` configurable via the
`-hostwritepath` flag.

NOTE that this also changes the current `/run/lock/lvm` to
`/etc/lvm/lock`.

This is a requirement for
metal-stack/helm-charts#64

Next step: open as draft to check the current trajectory of these
changes. FWIW, this commit message will be amended before the PR is
ready to merge. I am still missing integration tests. Since this was
developed on macOS, the integration tests will not work since they imply
a Linux system, e.g. fallocate & losetup. This is a good point to switch
to my Linux workstation.

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>
@gerhard gerhard force-pushed the add-talos-support branch 2 times, most recently from e0ad37c to 92d70e6 Compare April 22, 2023 22:07
gerhard added a commit to gerhard/csi-driver-lvm that referenced this pull request Apr 23, 2023
`/etc` on Talos OS is read-only, which means that new PVs will fail to
create. This change makes the hard-coded `/etc/lvm` configurable via the
`-hostwritepath` flag.

NOTE that this also changes the current `/run/lock/lvm` to
`/etc/lvm/lock`.

This is a requirement for
metal-stack/helm-charts#64

Next step: open as draft to check the current trajectory of these
changes. FWIW, this commit message will be amended before the PR is
ready to merge. I am still missing integration tests. Since this was
developed on macOS, the integration tests will not work since they imply
a Linux system, e.g. fallocate & losetup. This is a good point to switch
to my Linux workstation.

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>
gerhard added a commit to gerhard/csi-driver-lvm that referenced this pull request Apr 24, 2023
`/etc` on Talos OS is read-only, which means that new PVs will fail to
create. This change makes the hard-coded `/etc/lvm` configurable via the
`--hostwritepath` flag.

NOTE that this also changes the current `/run/lock/lvm` to
`/etc/lvm/lock`.

This is a requirement for
metal-stack/helm-charts#64

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>
majst01 pushed a commit to metal-stack/csi-driver-lvm that referenced this pull request Apr 28, 2023
* Add Talos OS support

`/etc` on Talos OS is read-only, which means that new PVs will fail to
create. This change makes the hard-coded `/etc/lvm` configurable via the
`--hostwritepath` flag.

NOTE that this also changes the current `/run/lock/lvm` to
`/etc/lvm/lock`.

This is a requirement for
metal-stack/helm-charts#64

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

* Create loop devices part of `make test`

After this change, people which expect integration tests to be
self-contained, will not be disappointed. It took me a while to figure
out why **some** integration tests were failing locally. I eventually
found out about this requirement in this doc page:
https://docs.metal-stack.io/stable/external/csi-driver-lvm/README/. The
GitHub Actions workflow also helped. Even then, the mknod command was
not mentioned anywhere. My NixOS host did not have these special files
/dev/loop100 & /dev/loop101 created. With this change, `make test` is
self-contained & it should work the same on all Linux hosts, whether
it's a local development workstation or running in GitHub Actions.

Speaking of GitHub Actions, we do not want to run the build-platforms
job if the DOCKER_REGISTRY_TOKEN secret is not set. If we don't check
for this, the job will fail in repo forks, where these secrets will not
be available by default. FWIW, `${{ secrets. }}` is not available in
`if` conditions. The secret value needs to be exposed as an env for the
`if` condition to work correctly. FTR:
https://github.com/orgs/community/discussions/26726

I also remembered to remove the loop devices part of `make
test-cleanup` & double-check that the loop device has been actually
removed. I have hit a situation where the file was deleted, but
/dev/loop100 was still left dangling. Had to `sudo dmsetup remove` it.

Lastly, Docker CLI is configured to ignore the *.img files. These are
created in the same directory and should not be sent to Docker when
running `docker build`.

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

* Refactor tests

Remove all hard-coded sleeps **except** the last one, when we delete the
csi-lvm-controller, otherwise PVCs may not get deleted before the
controller is deleted. When this happens, the loop devices will not be
cleared correctly when running `make test-cleanup`.

We also want to test one thing per test, otherwise we may not know why a
test failed. We leverage `kubectl wait --for=jsonpath=` as much as
possible. This way the tests do not need to check for specific strings,
we let `--for=jsonpath=` do that. The best part with this approach is
that we can use the `--timeout` flag. This brings the **entire**
integration test suite duration to 70 seconds. Before this change, the
sleeps alone (170s) would take longer than that.

To double-check for race conditions or flaky tests, I ran all tests
locally 100 times with `RERUN=100 make test`. All 100 runs passed. This
looks good to me!

Separately, I have also tested this in Talos v1.4.0 running K8s 1.26.4.
Everything works as expected now. See this PR comment for more details:
#87 (comment)

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

---------

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>
Resolves metal-stack/csi-driver-lvm#86

`/etc` on Talos OS is read-only, which means that this chart will fail
to install as is. The new `hostWritePath` option allows the default
`/etc/lvm` to be overwritten. For Talos OS, setting `hostWritePath:
/var/etc/lvm` fixes the install issue. This is just half the story.

After this installs, creating a PVs will fail since the controller has
some hard-coded paths starting with `/etc`. Here is the second half of
this fix: metal-stack/csi-driver-lvm#87

First step is to get that PR 👆 merged first, then update the image
versions in this one before merging this change.

Learn more about the Talos' file system:
https://www.talos.dev/v1.4/learn-more/architecture/

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>
@gerhard gerhard force-pushed the add-talos-support branch from 92d70e6 to fcf8221 Compare June 14, 2023 07:03
@gerhard gerhard marked this pull request as ready for review June 14, 2023 07:03
@gerhard gerhard requested a review from a team as a code owner June 14, 2023 07:03
@gerhard
Copy link
Contributor Author

gerhard commented Jun 14, 2023

I finally had a spare cycle to come back to this. On the positive side, I was able to test this change in production for a few months now. All working great!

Looking forward to getting this merged & a new helm chart version released so that I can stop using my fork of the chart.

@majst01
Copy link
Contributor

majst01 commented Jun 14, 2023

some linter errors occured ?

@gerhard
Copy link
Contributor Author

gerhard commented Jun 14, 2023

some linter errors occured ?

Fixed!

@Gerrit91 Gerrit91 merged commit 9edcc9e into metal-stack:master Jun 15, 2023
@gerhard gerhard deleted the add-talos-support branch June 26, 2023 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read-only filesystems are not supported
3 participants