Add support for Talos OS (#87)
* Add Talos OS support

`/etc` on Talos OS is read-only, which means that new PVs will fail to
create. This change makes the hard-coded `/etc/lvm` configurable via the
`--hostwritepath` flag.

NOTE that this also moves the lock directory from `/run/lock/lvm` to
`lock` under the configured path, i.e. `/etc/lvm/lock` by default.
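
As a rough illustration (a sketch, not the driver's actual code), the three host directories can all be derived from one base path with `filepath.Join`, which is how the `pkg/lvm/lvm.go` change in this commit builds the provisioner pod's host paths. The helper name `lvmHostPaths` and the `/var/etc/lvm` example value are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// lvmHostPaths is a hypothetical helper: it returns the backup, cache and
// lock directories under the given base path (the --hostwritepath value).
func lvmHostPaths(hostWritePath string) (backup, cache, lock string) {
	return filepath.Join(hostWritePath, "backup"),
		filepath.Join(hostWritePath, "cache"),
		filepath.Join(hostWritePath, "lock")
}

func main() {
	// "/etc/lvm" is the flag default from this commit; "/var/etc/lvm" is an
	// assumed example of a writable location, not a value from the source.
	for _, base := range []string{"/etc/lvm", "/var/etc/lvm"} {
		backup, cache, lock := lvmHostPaths(base)
		fmt.Println(backup, cache, lock)
	}
}
```

With the default base path the lock directory resolves to `/etc/lvm/lock`, which is the behavior change called out in the NOTE above.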

This is a requirement for
metal-stack/helm-charts#64

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

* Create loop devices as part of `make test`

After this change, people who expect integration tests to be
self-contained will not be disappointed. It took me a while to figure
out why **some** integration tests were failing locally. I eventually
found out about this requirement in this doc page:
https://docs.metal-stack.io/stable/external/csi-driver-lvm/README/. The
GitHub Actions workflow also helped. Even then, the mknod command was
not mentioned anywhere. My NixOS host did not have these special files
/dev/loop100 & /dev/loop101 created. With this change, `make test` is
self-contained & it should work the same on all Linux hosts, whether
it's a local development workstation or running in GitHub Actions.

Speaking of GitHub Actions, we do not want to run the build-platforms
job if the DOCKER_REGISTRY_TOKEN secret is not set. If we don't check
for this, the job will fail in repo forks, where these secrets will not
be available by default. FWIW, `${{ secrets. }}` is not available in
`if` conditions. The secret value needs to be exposed as an env for the
`if` condition to work correctly. FTR:
https://github.com/orgs/community/discussions/26726

I also remembered to remove the loop devices as part of `make
test-cleanup` & double-check that the loop device has actually been
removed. I have hit a situation where the file was deleted, but
/dev/loop100 was still left dangling. I had to `sudo dmsetup remove` it.

Lastly, Docker CLI is configured to ignore the *.img files. These are
created in the same directory and should not be sent to Docker when
running `docker build`.

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

* Refactor tests

Remove all hard-coded sleeps **except** the last one, which runs when we
delete the csi-lvm-controller. Without it, PVCs may not get deleted before
the controller is deleted. When this happens, the loop devices will not be
cleared correctly when running `make test-cleanup`.

We also want to test one thing per test; otherwise we may not know why a
test failed. We leverage `kubectl wait --for=jsonpath=` as much as
possible. This way the tests do not need to check for specific strings;
we let `--for=jsonpath=` do that. The best part of this approach is
that we can use the `--timeout` flag. This brings the **entire**
integration test suite duration to 70 seconds. Before this change, the
sleeps alone (170s) would take longer than that.

To double-check for race conditions or flaky tests, I ran all tests
locally 100 times with `RERUN=100 make test`. All 100 runs passed. This
looks good to me!

Separately, I have also tested this in Talos v1.4.0 running K8s 1.26.4.
Everything works as expected now. See this PR comment for more details:
#87 (comment)

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

---------

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>
gerhard authored Apr 28, 2023
1 parent 999b3ee commit 84661b5
Showing 15 changed files with 131 additions and 100 deletions.
1 change: 1 addition & 0 deletions .dockerignore
@@ -0,0 +1 @@
+*.img
11 changes: 9 additions & 2 deletions .github/workflows/docker.yaml
@@ -55,8 +55,6 @@ jobs:
 
       - name: Test
         run: |
-          for i in 100 101; do fallocate -l 1G loop${i}.img ; sudo losetup /dev/loop${i} loop${i}.img; done
-          sudo losetup -a
           make test
   build-platforms:
@@ -65,33 +63,41 @@ jobs:
     needs:
       - lint
       - test
+    env:
+      DOCKER_REGISTRY_TOKEN: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
 
     steps:
       - name: Log in to the container registry
+        if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
         uses: docker/login-action@v2
         with:
           registry: ${{ env.REGISTRY }}
           username: ${{ secrets.DOCKER_REGISTRY_USER }}
           password: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
 
       - name: Checkout
+        if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
         uses: actions/checkout@v3
 
       - name: Set up Go 1.19
+        if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
         uses: actions/setup-go@v3
         with:
           go-version: 1.19
 
       - name: Set up Docker Buildx
+        if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
         uses: docker/setup-buildx-action@v2
 
       - name: Make tag
+        if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
         run: |
           [ "${GITHUB_EVENT_NAME}" == 'pull_request' ] && echo "tag=${GITHUB_HEAD_REF##*/}" >> $GITHUB_ENV || true
           [ "${GITHUB_EVENT_NAME}" == 'release' ] && echo "tag=${GITHUB_REF##*/}" >> $GITHUB_ENV || true
           [ "${GITHUB_EVENT_NAME}" == 'push' ] && echo "tag=latest" >> $GITHUB_ENV || true
       - name: Build and push image
+        if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
         uses: docker/build-push-action@v3
         with:
           context: .
@@ -100,6 +106,7 @@ jobs:
           platforms: linux/amd64,linux/arm64,linux/arm/v7
 
       - name: Build and push provisioner image
+        if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
         uses: docker/build-push-action@v3
         with:
           context: .
36 changes: 31 additions & 5 deletions Makefile
@@ -30,8 +30,25 @@ build-plugin:
 build-provisioner:
 	docker build -t csi-driver-lvm-provisioner . -f cmd/provisioner/Dockerfile
 
-.PHONY: test
-test: build-plugin build-provisioner
+/dev/loop%:
+	@fallocate --length 1G loop$*.img
+ifndef GITHUB_ACTIONS
+	@sudo mknod $@ b 7 $*
+endif
+	@sudo losetup $@ loop$*.img
+	@sudo losetup $@
+
+rm-loop%:
+	@sudo losetup -d /dev/loop$* || true
+	@! losetup /dev/loop$*
+	@sudo rm -f /dev/loop$*
+	@rm loop$*.img
+# If removing this loop device fails, you may need to:
+# sudo dmsetup info
+# sudo dmsetup remove <DEVICE_NAME>
+
+.PHONY: kind
+kind:
 	@if ! which kind > /dev/null; then echo "kind needs to be installed"; exit 1; fi
 	@if ! kind get clusters | grep csi-driver-lvm > /dev/null; then \
 		kind create cluster \
@@ -40,15 +57,24 @@ test: build-plugin build-provisioner
 		--kubeconfig $(KUBECONFIG); fi
 	@kind --name csi-driver-lvm load docker-image csi-driver-lvm
 	@kind --name csi-driver-lvm load docker-image csi-driver-lvm-provisioner
+
+.PHONY: rm-kind
+rm-kind:
+	@kind delete cluster --name csi-driver-lvm
+
+RERUN ?= 1
+.PHONY: test
+test: build-plugin build-provisioner /dev/loop100 /dev/loop101 kind
 	@cd tests && docker build -t csi-bats . && cd -
+	@for i in {1..$(RERUN)}; do \
 	docker run -i$(DOCKER_TTY_ARG) \
 		-e HELM_REPO=$(HELM_REPO) \
 		-v "$(KUBECONFIG):/root/.kube/config" \
 		-v "$(PWD)/tests:/code" \
 		--network host \
 		csi-bats \
-		--verbose-run --trace --timing bats/test.bats
+		--verbose-run --trace --timing bats/test.bats ; \
+	done
 
 .PHONY: test-cleanup
-test-cleanup:
-	@kind delete cluster --name csi-driver-lvm
+test-cleanup: rm-loop100 rm-loop101 rm-kind
3 changes: 2 additions & 1 deletion cmd/lvmplugin/main.go
@@ -35,6 +35,7 @@ func init() {
 
 var (
 	endpoint = flag.String("endpoint", "unix://tmp/csi.sock", "CSI endpoint")
+	hostWritePath = flag.String("hostwritepath", "/etc/lvm", "host path where config, cache & backups will be written to")
 	driverName = flag.String("drivername", "lvm.csi.metal-stack.io", "name of the driver")
 	nodeID = flag.String("nodeid", "", "node id")
 	ephemeral = flag.Bool("ephemeral", false, "publish volumes in ephemeral mode even if kubelet did not ask for it (only needed for Kubernetes 1.15)")
@@ -68,7 +69,7 @@ func main() {
 }
 
 func handle() {
-	driver, err := lvm.NewLvmDriver(*driverName, *nodeID, *endpoint, *ephemeral, *maxVolumesPerNode, version, *devicesPattern, *vgName, *namespace, *provisionerImage, *pullPolicy)
+	driver, err := lvm.NewLvmDriver(*driverName, *nodeID, *endpoint, *hostWritePath, *ephemeral, *maxVolumesPerNode, version, *devicesPattern, *vgName, *namespace, *provisionerImage, *pullPolicy)
 	if err != nil {
 		fmt.Printf("Failed to initialize driver: %s\n", err.Error())
 		os.Exit(1)
6 changes: 5 additions & 1 deletion pkg/lvm/controllerserver.go
@@ -36,14 +36,15 @@ type controllerServer struct {
 	nodeID string
 	devicesPattern string
 	vgName string
+	hostWritePath string
 	kubeClient kubernetes.Clientset
 	provisionerImage string
 	pullPolicy v1.PullPolicy
 	namespace string
 }
 
 // NewControllerServer
-func newControllerServer(ephemeral bool, nodeID string, devicesPattern string, vgName string, namespace string, provisionerImage string, pullPolicy v1.PullPolicy) (*controllerServer, error) {
+func newControllerServer(ephemeral bool, nodeID string, devicesPattern string, vgName string, hostWritePath string, namespace string, provisionerImage string, pullPolicy v1.PullPolicy) (*controllerServer, error) {
 	if ephemeral {
 		return &controllerServer{caps: getControllerServiceCapabilities(nil), nodeID: nodeID}, nil
 	}
@@ -69,6 +70,7 @@ func newControllerServer(ephemeral bool, nodeID string, devicesPattern string, v
 			}),
 		nodeID: nodeID,
 		devicesPattern: devicesPattern,
+		hostWritePath: hostWritePath,
 		vgName: vgName,
 		kubeClient: *kubeClient,
 		namespace: namespace,
@@ -137,6 +139,7 @@ func (cs *controllerServer) CreateVolume(ctx context.Context, req *csi.CreateVol
 		kubeClient: cs.kubeClient,
 		namespace: cs.namespace,
 		vgName: cs.vgName,
+		hostWritePath: cs.hostWritePath,
 	}
 	if err := createProvisionerPod(ctx, va); err != nil {
 		klog.Errorf("error creating provisioner pod :%v", err)
@@ -197,6 +200,7 @@ func (cs *controllerServer) DeleteVolume(ctx context.Context, req *csi.DeleteVol
 		kubeClient: cs.kubeClient,
 		namespace: cs.namespace,
 		vgName: cs.vgName,
+		hostWritePath: cs.hostWritePath,
 	}
 	if err := createProvisionerPod(ctx, va); err != nil {
 		klog.Errorf("error creating provisioner pod :%v", err)
13 changes: 8 additions & 5 deletions pkg/lvm/lvm.go
@@ -45,6 +45,7 @@ type Lvm struct {
 	nodeID string
 	version string
 	endpoint string
+	hostWritePath string
 	ephemeral bool
 	maxVolumesPerNode int64
 	devicesPattern string
@@ -76,6 +77,7 @@ type volumeAction struct {
 	kubeClient kubernetes.Clientset
 	namespace string
 	vgName string
+	hostWritePath string
 }
 
 const (
@@ -93,7 +95,7 @@ var (
 )
 
 // NewLvmDriver creates the driver
-func NewLvmDriver(driverName, nodeID, endpoint string, ephemeral bool, maxVolumesPerNode int64, version string, devicesPattern string, vgName string, namespace string, provisionerImage string, pullPolicy string) (*Lvm, error) {
+func NewLvmDriver(driverName, nodeID, endpoint string, hostWritePath string, ephemeral bool, maxVolumesPerNode int64, version string, devicesPattern string, vgName string, namespace string, provisionerImage string, pullPolicy string) (*Lvm, error) {
 	if driverName == "" {
 		return nil, fmt.Errorf("no driver name provided")
 	}
@@ -123,6 +125,7 @@ func NewLvmDriver(driverName, nodeID, endpoint string, ephemeral bool, maxVolume
 		version: vendorVersion,
 		nodeID: nodeID,
 		endpoint: endpoint,
+		hostWritePath: hostWritePath,
 		ephemeral: ephemeral,
 		maxVolumesPerNode: maxVolumesPerNode,
 		devicesPattern: devicesPattern,
@@ -139,7 +142,7 @@ func (lvm *Lvm) Run() error {
 	// Create GRPC servers
 	lvm.ids = newIdentityServer(lvm.name, lvm.version)
 	lvm.ns = newNodeServer(lvm.nodeID, lvm.ephemeral, lvm.maxVolumesPerNode, lvm.devicesPattern, lvm.vgName)
-	lvm.cs, err = newControllerServer(lvm.ephemeral, lvm.nodeID, lvm.devicesPattern, lvm.vgName, lvm.namespace, lvm.provisionerImage, lvm.pullPolicy)
+	lvm.cs, err = newControllerServer(lvm.ephemeral, lvm.nodeID, lvm.devicesPattern, lvm.vgName, lvm.hostWritePath, lvm.namespace, lvm.provisionerImage, lvm.pullPolicy)
 	if err != nil {
 		return err
 	}
@@ -360,7 +363,7 @@ func createProvisionerPod(ctx context.Context, va volumeAction) (err error) {
 					Name: "lvmbackup",
 					VolumeSource: v1.VolumeSource{
 						HostPath: &v1.HostPathVolumeSource{
-							Path: "/etc/lvm/backup",
+							Path: filepath.Join(va.hostWritePath, "backup"),
 							Type: &hostPathType,
 						},
 					},
@@ -369,7 +372,7 @@ func createProvisionerPod(ctx context.Context, va volumeAction) (err error) {
 					Name: "lvmcache",
 					VolumeSource: v1.VolumeSource{
 						HostPath: &v1.HostPathVolumeSource{
-							Path: "/etc/lvm/cache",
+							Path: filepath.Join(va.hostWritePath, "cache"),
 							Type: &hostPathType,
 						},
 					},
@@ -378,7 +381,7 @@ func createProvisionerPod(ctx context.Context, va volumeAction) (err error) {
 					Name: "lvmlock",
 					VolumeSource: v1.VolumeSource{
 						HostPath: &v1.HostPathVolumeSource{
-							Path: "/run/lock/lvm",
+							Path: filepath.Join(va.hostWritePath, "lock"),
 							Type: &hostPathType,
 						},
 					},