Maximise GH runner space step failing in some repositories' CIs #813
Comments
Thank you for reporting your feedback to us! The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5300.
There's a similar issue filed in the action's repo: easimon/maximize-build-space#38
Looking at the runner's storage with the different Ubuntu image versions, it seems that the distribution of space on the runner has changed, so the action is no longer able to free as much space on the root filesystem. Note that the freed-up space at the end should not be affected, because the extra space in the temp disk will be utilised.
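For context, a minimal diagnostic step like the following (hypothetical, not taken from the affected CIs) shows how the runner's disk layout can be inspected to compare image versions:

```yaml
# Hypothetical diagnostic step, not from the affected workflows:
# prints the runner image identifiers and the block device /
# filesystem layout so the space distribution can be compared
# across runner image versions.
- name: Show runner storage layout
  run: |
    echo "Image: $ImageOS $ImageVersion"  # env vars set on GitHub-hosted runners
    lsblk
    df -h
```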
I tested with a decreased root-reserve-mb; it would be ideal if we could make the input dynamic, but I'm not sure if that's feasible.
Thanks for looking into it @NohaIhab! I think the approach is right, we need to decrease the root-reserve-mb value.
Due to a change in the GH runners' storage, we can no longer reserve more than ~30GB of storage with the easimon/maximize-build-space action, as we'd be hitting the following error: fallocate: invalid length value specified. This commit changes the value to 29696. Part of canonical/bundle-kubeflow#813
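A sketch of what the adjusted step would look like (the action's version tag is illustrative; root-reserve-mb is the input named in the commit above):

```yaml
# Sketch of the fix described above: reserve less space on the
# root filesystem so the action's fallocate call stays within
# the new runner storage limits. The version tag is illustrative.
- name: Maximise GH runner space
  uses: easimon/maximize-build-space@v8
  with:
    root-reserve-mb: 29696  # value from the commit referenced above
```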
ci: Decrease root-reserve-mb to fit the new runner storage. Addresses canonical/bundle-kubeflow#813 Co-authored-by: Noha Ihab <49988746+NohaIhab@users.noreply.github.com>
* ci: Decrease root-reserve-mb to fit the new runner storage (#223) to fix canonical/bundle-kubeflow#813
* fix: use mysql-k8s `8.0/edge` in the integration tests and bundles (#225)
* fix: use mysql-k8s edge
* fix: add comment to revert to stable
I still see the issue, e.g. in canonical/kfp-operators#416. It looks like there is not enough disk space even after @NohaIhab's PR. I debugged by SSHing into the worker after the failed tests; disk space is almost completely exhausted.
I can also see pods not being scheduled because of disk pressure on the node.
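A hedged sketch of how this could be confirmed on the runner (microk8s ships kubectl; exact output formats may vary):

```yaml
# Hypothetical diagnostic step: checks whether the microk8s node
# reports the DiskPressure condition or a corresponding taint,
# which would explain the unschedulable pods.
- name: Check node disk pressure
  run: |
    microk8s kubectl get nodes
    microk8s kubectl describe nodes | grep -iE 'pressure|taint'
```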
Did some more digging, and the main problem is the images in microk8s which we use in the kfp-bundle tests, which take ~15GB of disk space. Some of these images are deployed multiple times as pods.
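One way to identify those images (a sketch; microk8s embeds containerd, whose ctr CLI lists cached images with their sizes):

```yaml
# Hypothetical diagnostic step: lists the images cached in
# microk8s' containerd store, including their sizes, to spot
# the largest ones.
- name: List microk8s images by size
  run: microk8s ctr images ls
```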
In investigating canonical/kfp-operators#426 I noticed how the output of the disk space step changed.
[two collapsed log snippets from the disk space step: a step running echo "Memory and swap:" and echo "Available storage:", showing the swap table header (NAME TYPE SIZE USED PRIO) and the available storage]
Probably this happened at the same time as the GH runner change that is discussed in this issue.
FWIW, I've found the jlumbroso/free-disk-space action with default settings is working better, leaving a runner with ~45GB free after execution. An example of this is in kfp's tests.
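For reference, adopting that action with its defaults is a one-step change (a sketch; the version tag matches the commit below):

```yaml
# Sketch: jlumbroso/free-disk-space with default settings, pinned
# to the version referenced in the commit below. With no inputs,
# the action removes its default set of preinstalled packages
# and caches.
- name: Free disk space
  uses: jlumbroso/free-disk-space@v1.3.1
```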
Use jlumbroso/free-disk-space@v1.3.1 Ref canonical/bundle-kubeflow#813
I haven't seen this issue anymore, and we have changed a lot of CIs around. I will close it because all our CIs that use the maximise runner space action are no longer failing (see all the attached PRs and commits).
Bug Description
Running the automated integration tests in the CI on a PR is not possible, as the Maximise GH runner space step is failing with the following message: `fallocate: invalid length value specified`.
To Reproduce
Create a PR in any of the Charmed Kubeflow-owned repositories where the aforementioned step is used.
Environment
CI environment.
Relevant Log Output
Besides the one provided already, you can refer to this error.
Affected repositories (from PRs)