
Update Spin and Go versions for actions #217

Merged

Conversation

kate-goldenring
Collaborator

We are seeing some CI errors, seemingly due to outdated Go and TinyGo versions. This PR also bumps the Spin version used in CI.
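For context, the version bump is roughly of this shape. This is a sketch rather than the exact diff; the action names and the specific versions shown are assumptions about what the workflow uses:

# Illustrative workflow excerpt; action names and versions are assumptions, not the exact change.
- uses: actions/setup-go@v5
  with:
    go-version: "1.22"
- uses: acifani/setup-tinygo@v2
  with:
    tinygo-version: "0.33.0"
- uses: fermyon/actions/spin/setup@v1
  with:
    version: "v2.7.0"   # Spin CLI version used by the CI jobs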

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>
@kate-goldenring
Collaborator Author

@Mossaka I am a little lost as to why the tests are failing. I can build the images locally. The cluster logs show that the Spin app is having a hard time connecting to the MQTT broker (but there are failures in HTTP trigger apps too):

time="2024-10-29T04:01:23.914095692Z" level=info msg="StartContainer for \"83894a26ccd9b450c7ba18df23b946c5d7214d797c1e85ae570ec7cf69f5a2f5\" returns successfully"
time="2024-10-29T04:01:24.820250695Z" level=error msg="run_wasi ERROR >>>  failed: failed to connect to 'mqtt://10.43.174.211:1883'

Caused by:
    [-1] TCP/TLS connect failure"

I wonder if there is an issue with the k3d cluster certs.
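To narrow this down, a debugging step along these lines could dump the broker's state when a job fails. This is not part of this PR, and the "app=emqx" label is a guess at how the in-cluster broker is deployed:

# Hypothetical debug step; "app=emqx" is an assumed label for the in-cluster broker.
- name: Dump MQTT broker state
  if: failure()
  run: |
    kubectl get svc,endpoints -A -o wide
    kubectl describe pods -l app=emqx || true
    kubectl logs -l app=emqx --tail=100 || true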

@kate-goldenring
Collaborator Author

kate-goldenring commented Oct 31, 2024

New hypothesis: the tests are failing because the runner is running out of space to execute the containers. Testing this by no longer using an in-cluster EMQX broker (250+ MB) and instead using test.mosquitto.org with a unique topic. The eviction message:

Message:          The node was low on resource: ephemeral-storage. Threshold quantity: 3892562797, available: 3696304Ki. Container redis was using 36Ki, request is 0, has larger consumption of ephemeral-storage. 
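If ephemeral storage really is the problem, the evictions should also be visible in the cluster events; a debugging step like the following (a sketch, not something in this PR) would confirm it:

# Hypothetical debug step to surface evictions and node disk pressure.
- name: Show evictions and node conditions
  if: failure()
  run: |
    kubectl get events -A --field-selector reason=Evicted
    kubectl describe nodes | grep -A 5 "Conditions:"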

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>
@kate-goldenring
Collaborator Author

The MQTT test without the local broker is passing, but we are still failing to get all the apps running:

NAME                                        READY   STATUS    RESTARTS   AGE    IP          NODE                        NOMINATED NODE   READINESS GATES
spin-outbound-redis-584db675d7-ljlxt        1/1     Running   0          2m     10.42.2.6   k3d-test-cluster-server-0   <none>           <none>
spin-static-assets-67b7b5878b-7ztfx         1/1     Running   0          2m     10.42.0.7   k3d-test-cluster-agent-0    <none>           <none>
wasm-spin-5f4cfb8886-5q4kf                  1/1     Running   0          2m     10.42.0.5   k3d-test-cluster-agent-0    <none>           <none>
spin-keyvalue-7995f66b8d-mp9d5              1/1     Running   0          2m     10.42.0.6   k3d-test-cluster-agent-0    <none>           <none>
spin-mqtt-message-logger-6b57c956c5-5546v   1/1     Running   0          2m     10.42.1.6   k3d-test-cluster-agent-1    <none>           <none>
spin-multi-trigger-app-7f7586b9b-5dzmb      0/1     Error     0          2m     10.42.1.5   k3d-test-cluster-agent-1    <none>           <none>
redis                                       1/1     Running   0          2m1s   10.42.2.5   k3d-test-cluster-server-0   <none>           <none>

This time it is the Redis trigger component (which is part of spin-multi-trigger-app-7f7586b9b-5dzmb) that is failing. There is likely a race condition where the Spin app starts before the Redis pod does, based on the ages above.

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>
@Mossaka
Member

Mossaka commented Nov 1, 2024

There is likely a race condition where the Spin app starts before the Redis pod does, based on the ages above.

Interesting, maybe we should wait longer to allow k8s to restart failed containers?
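One way to wait longer would be to block on readiness with a generous timeout before the tests run, instead of sampling pod status once. A sketch, using the deployment name from the listing above; names and timeouts are illustrative:

# Sketch of a readiness gate; deployment name and timeouts are illustrative.
- name: Wait for workloads to become ready
  run: |
    kubectl wait deployment spin-multi-trigger-app --for=condition=Available --timeout=300s
    kubectl wait pods --all --for=condition=Ready --timeout=300s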

.gitignore Outdated
test/*

.vscode/*
Member


I'd prefer we keep IDE-specific settings on the local computer, not upstream.

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>
@kate-goldenring
Collaborator Author

Got evicted pods again due to resource constraints. Trying again with only one agent in the k3d cluster. If this doesn't resolve it, I think we should look into whether GH runners reduced their default size; I'm confused by the suddenness of this. I won't be able to get back to this for a couple of days, so fingers crossed this does the trick.
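For reference, the single-agent variant is roughly the following; the cluster name and extra flags are illustrative:

# Sketch of a single-agent k3d cluster for CI; name and flags are assumptions.
- name: Create k3d cluster
  run: k3d cluster create test-cluster --agents 1 --wait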

@kate-goldenring
Collaborator Author

We are still getting disk pressure:

Reason:              Evicted
Message:             Pod was rejected: The node had condition: [DiskPressure]. 

@kate-goldenring
Collaborator Author

Next test is modifying the kubelet eviction threshold: https://k3d.io/v5.4.2/faq/faq/#pods-evicted-due-to-lack-of-disk-space
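Per that FAQ, the thresholds are passed to the kubelet through k3s at cluster-creation time; a sketch of the flags involved (the exact percentages and node filter are illustrative):

# Sketch based on the k3d FAQ; threshold values are illustrative.
- name: Create k3d cluster with relaxed eviction thresholds
  run: |
    k3d cluster create test-cluster --agents 1 \
      --k3s-arg '--kubelet-arg=eviction-hard=imagefs.available<1%,nodefs.available<1%@agent:*' \
      --k3s-arg '--kubelet-arg=eviction-minimum-reclaim=imagefs.available=1%,nodefs.available=1%@agent:*'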

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>
Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>
Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>
@kate-goldenring
Collaborator Author

@Mossaka @devigned @radu-matei @jsturtevant can I get one more look at this? Finally got CI passing by doing Rust and Docker cleanup before we run the tests.
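For anyone who hits the same disk-pressure problem, the cleanup is along these lines; this is a sketch of the idea rather than the exact step in the PR:

# Sketch of a pre-test cleanup step; the exact commands in the PR may differ.
- name: Free runner disk space before tests
  run: |
    cargo clean                           # drop Rust build artifacts no longer needed
    docker system prune -a -f --volumes   # remove unused images, containers, and volumes
    df -h                                 # log remaining space for debugging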

@kate-goldenring kate-goldenring merged commit 62f6197 into spinkube:main Nov 5, 2024
8 checks passed
@kate-goldenring kate-goldenring deleted the update-tinygo-and-spin branch November 5, 2024 18:33