upload core dumps from failed tests in CI #1850

Merged
merged 3 commits into from
Nov 2, 2023

spoonincode
Member

Core dumps will now be uploaded from crashing tests in CI. Resolves #640. An example of a failure and the uploaded files can be found here:
https://github.com/AntelopeIO/leap/actions/runs/6724910321
(I trimmed the run down to just Tests & NP Tests on a single platform)

While I didn't change anything on the ENF runners to accomplish this, it's worth noting that this probably only works on ENF runners because they put core dumps into /var/lib/systemd/coredump. Something like GitHub's runners don't do that. I like to keep the workflows runner-agnostic so that forks could potentially work in the future, and afaik this won't break the workflow on other runners; you'll just miss out on the core dumps.
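The "you'll just miss out on the core dumps" behavior could also be made explicit with a guard. The following is a minimal sketch, not what this PR does: `export_cores` is a hypothetical helper, and it reads the directory directly, so it assumes the calling user has read access (which is not the case on ENF runners without the docker workaround discussed in the review comment).

```shell
#!/bin/sh
# Hypothetical helper, not part of this PR: pack core dumps only when the
# directory exists and is non-empty; otherwise succeed without doing anything.
export_cores() {
    core_dir="$1"   # e.g. /var/lib/systemd/coredump on ENF runners
    out_tar="$2"    # path of the archive to create

    # Runners that do not collect core dumps may not have the directory at all.
    if [ ! -d "$core_dir" ]; then
        echo "no core dump directory at $core_dir, skipping"
        return 0
    fi

    # An empty directory means no test crashed during this run.
    if [ -z "$(ls -A "$core_dir")" ]; then
        echo "no core dumps found, skipping"
        return 0
    fi

    tar -C "$core_dir" -cf "$out_tar" .
    echo "packed core dumps into $out_tar"
}
```

Returning 0 in both "nothing to do" cases keeps the step from failing the job on runners that never produce core dumps.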

Coming along for the ride in some separate commits are a couple of modifications to use node20 instead of node16. In some of our repos we are starting to notice warnings about using node16, for example: https://github.com/AntelopeIO/bls12-381/actions/runs/6633721660 (I'm not sure why we haven't seen warnings in leap; possibly a slow rollout by GitHub?). So,

  • upgrade to actions/checkout@v4, and
  • upgrade parallel-ctest-containers to use node20

@spoonincode spoonincode added the CICD Anything dealing with the CI workflow behavior label Nov 1, 2023
Base automatically changed from cse=no_5x to main November 1, 2023 20:57
@spoonincode spoonincode changed the title [5.0] upload core dumps from failed tests in CI upload core dumps from failed tests in CI Nov 1, 2023
@@ -181,12 +188,17 @@ jobs:
log-tarball-prefix: ${{matrix.cfg.name}}
tests-label: nonparallelizable_tests
test-timeout: 420
- name: Export core dumps
run: docker run --mount type=bind,source=/var/lib/systemd/coredump,target=/cores alpine sh -c 'tar -C /cores/ -c .' | tar x
Member Author

I know this looks weird, and to add more confusion it's different from the parallel tests job, because the NP/LR tests don't run in a container until the parallel-ctest-containers step.

On ENF runners the user is not root but rather a user named enf that does not have sudo privileges (this is unlike GitHub runners). But the files in /var/lib/systemd/coredump are only readable by root. Since docker is not creating a userns in this case, we can use docker as a cheat to get access to the files without sudo and without additional modifications on the ENF runners.
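The pipe in that step is the usual "tar to stdout, untar from stdin" pattern: the left-hand tar reads the files, and the right-hand tar writes them into the current directory as whichever user runs it. Below is a minimal sketch of the same pattern without docker; `stream_dir_here` is a hypothetical name used only for illustration, and the explicit `-f -` spells out stdout/stdin rather than relying on tar's default archive.

```shell
#!/bin/sh
# Hypothetical illustration of the pipe used in the "Export core dumps" step,
# minus docker: archive a directory to stdout, then unpack from stdin into
# the current working directory.
stream_dir_here() {
    src="$1"
    # Left side reads the files under $src; right side recreates them in the
    # current directory, owned by the user running the extracting tar.
    tar -C "$src" -cf - . | tar -xf -
}
```

In the workflow step the left side runs inside the alpine container, which can read the root-only files, while the right side runs as the runner's own user, so the extracted copies should be readable by later steps.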

@spoonincode spoonincode merged commit 0b5a37b into main Nov 2, 2023
29 checks passed
@spoonincode spoonincode deleted the ci_core_5x branch November 2, 2023 02:49
Development

Successfully merging this pull request may close these issues.

CICD: Capture core dump files from CI runs
3 participants