Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable GitHub action to test the patches as early and as often as possible #21

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

coiby
Copy link
Member

@coiby coiby commented Jul 3, 2024

This patch set enables GitHub Action to test the patches whenever there is a pull request (PR) or a change to the PR. By testing the patches as early and as often as possible, we can prevent problems or catch problems early.

Four kinds of tests will be triggered automatically,

For integration tests, the tests are now re-written in TMT. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT.

@coiby coiby changed the title Enable GitHub action to test the patches as early and as often as possible [RFC] Enable GitHub action to test the patches as early and as often as possible Jul 3, 2024
Copy link
Collaborator

@Barathrum-Liu Barathrum-Liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change.

To be honest, I don't have enough knowledge about github actions to confirm the correctness of this PR. But I think it will not affect the functionality of kdump-utils itself, I am willing to give a try.

Copy link
Collaborator

@prudo1 prudo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @coiby,
I second @Barathrum-Liu, it's great to have those tests run as early as possible. I'm a little bit concerned about all those hard coded version being used in the tests. Is there a way to stick with some official packages instead? Or maybe let all tests run on the same container, so we only need to update the container?
Thanks
Philipp

Comment on lines 9 to 12
- uses: actions/checkout@v2
- run: wget https://github.com/mvdan/sh/releases/download/v3.4.3/shfmt_v3.4.3_linux_amd64 -O /usr/local/bin/shfmt && chmod +x /usr/local/bin/shfmt
- run: shfmt -d *.sh kdumpctl mk*dumprd
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why you download shfmt directly from github instead of simply installing it via the official repo? If so could you please add a comment why you are doing this.
I'm afraid with all the hard coded versions it will be pretty painful to maintain these tests as it would require to manually update them regularly when we run into problems...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising the concern! I downloaded shfmt from github mainly wanting to use latest shfmt. This patch dates back to one or two years ago and that time latest shfmt is v3.4.3 but the default ubuntu image provided by github uses a much older version (in fact Github's latest ubuntu is 22.04 and still has v3.4.3). Now for new version, I use Fedora's latest docker image which ships v3.7.0 (the latest upstream version is 3.8.0) instead. So we can use a relatively new shfmt without maintaining the Github Action .yml file.

Comment on lines 15 to 17
steps:
- uses: actions/checkout@v2
- run: curl -L -O https://github.com/koalaman/shellcheck/releases/download/v0.8.0/shellcheck-v0.8.0.linux.x86_64.tar.xz && tar -xJf shellcheck-v0.8.0.linux.x86_64.tar.xz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same like for shfmt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For shellcheck, I also use Fedora's latest docker image which ships shellcheck:0.9.0 (the latest upstream version is v0.10.0) in new version now.

USE_GUESTMOUNT=1
fi

KUMP_TEST_QEMU_TIMEOUT=20m USE_GUESTMOUNT=$USE_GUESTMOUNT BASE_IMAGE=/usr/share/cloud_images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2 RELEASE=$VERSION_ID make test-run
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when F40 runs out of service? Do we need to provide a new docker image then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current tests just fail F40 and they are difficult to fix. So I decide to abandon our own testing framework and choose TMT instead. Now the code to maintain is significantly less

47 files changed, 519 insertions(+), 1393 deletions(-)

For new approach, and there is no need to maintain the docker image.

Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
@coiby coiby force-pushed the github_action branch 2 times, most recently from 1b73bcd to 267c113 Compare August 1, 2024 01:24
Currently, all integration tests fail F40 because of failing to install
package and enable services on the cloud image properly. TMT (Test Management
Tool) supports running tests on virtual machines and provides an
user-friendly interface (YAML) to define tasks like installing packages,
running commands on the virtual machines. By using TMT, we no longer
need to spend time and efforts on maintaining the current test framework
and can also enjoy rich handy features like managing tests provided by TMT.

There are six test steps in TMT. Quoting [1], here is an overview,
 - discover
   Gather information about test cases to be executed.

 - provision
   Provision an environment for testing or use localhost.

 - prepare
   Prepare the environment for testing.

 - execute
   Run tests using the specified executor.

 - report
   Provide test results overview and send reports.

 - finish
   Perform the finishing tasks and clean up provisioned guests.

And each step step is configurable via the so-called plans [2]. TMT also
uses plans to group relevant tests and select needed tests.

This commit adds the test of sending crash dump to a NFS target by
setting two VMs using TMT's MultiHost Testing feature [3],
 - client
   Setup up NFS dump target and trigger crash
 - server
   acts as an NFS server and check sent vmcore

The scaffold of TMT test project can be created as follows,

    mkdir tests && cd tests
    tmt init --template mini
    mv plans/example/{example,kdump}.fmf

    tmt tests create /setup/default_crashkernel -t beakerlib
    tmt tests create /setup/nfs_server -t beakerlib
    tmt tests create /test/trigger_crash -t beakerlib
    tmt tests create /test/check_vmcore -t beakerlib

Then we fill in the content for created plans and tests,

    # tmt run -a
    /var/tmp/tmt/run-017

    /plans/kdump
        discover
            how: fmf
            name: client-setup
            directory: /root/kdump-tests
            tests: /setup/kdump
            how: fmf
            name: server-setup
    ...
            execute task rhkdump#4: server-test on server
            how: tmt

            summary: 4 tests executed
        report
            how: display
            summary: 4 tests passed
        finish

An main.fmf is put under the root directory of tests and setup for other
tests to inherit [7].

[1] https://tmt.readthedocs.io/en/stable/overview.html#run
[2] https://tmt.readthedocs.io/en/stable/spec/plans.html
[3] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing
[7] https://tmt.readthedocs.io/en/stable/guide.html#inheritance

Signed-off-by: Coiby Xu <coxu@redhat.com>
@coiby
Copy link
Member Author

coiby commented Aug 1, 2024

Hi @Barathrum-Liu and @prudo1,

Thanks for the encouragements! I've pushed a new version and the biggest change is I abandon current testing framework due to some difficult issues and switch to TMT.

@coiby coiby changed the title [RFC] Enable GitHub action to test the patches as early and as often as possible Enable GitHub action to test the patches as early and as often as possible Aug 1, 2024
coiby added 10 commits August 1, 2024 20:27
To make it easier to troubleshot test failures, set root password as
kdump. Then we can use "virsh console VM" to have console access to a
VM.

Note we can also have ssh access to the VMs using tmt [1]. For example,
"tmt run login --step prepare" allows you to access the VM at the
prepare step.

[1] https://tmt.readthedocs.io/en/stable/examples.html#guest-login

Signed-off-by: Coiby Xu <coxu@redhat.com>
Currently the test fails with the following logs,

                    kdump: Starting kdump: [OK]
                    :: [ 10:08:20 ] :: [   PASS   ] :: Command 'kdumpctl restart' (Expected 0, got 0)
                    :: [ 10:08:20 ] :: [  BEGIN   ] :: Running 'echo 1 > /proc/sys/kernel/sysrq'
                    :: [ 10:08:20 ] :: [   PASS   ] :: Command 'echo 1 > /proc/sys/kernel/sysrq' (Expected 0, got 0)
                    client_loop: send disconnect: Broken pipe
                00:00:28 errr /client-test/tests/client (on client) (beakerlib: State 'started') [1/1]
                journal.txt: /var/tmp/tmt/run-035/plans/kdump/execute/data/guest/client/client-test/tests/client-4/journal.txt
        summary: 4 tests passed and 1 error

Use "tmt-reboot -c" to trigger a kernel panic instead.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Currently it takes about half an hour to finish the test on
dell-per340-03. If we use a faster mirror, e.g.
https://mirrors.tuna.tsinghua.edu.cn/fedora, only 4m is needed.

Support using custom mirror by "tmt run --environment" e.g.
tmt run --environment CUSTOM_MIRROR=https://mirrors.tuna.tsinghua.edu.cn/fedora ...

Note fedora-cisco-openh264 is disabled because it's not needed.

Signed-off-by: Coiby Xu <coxu@redhat.com>
We can use "tmt run --environment KDUMP_UTILS_RPM=PATH_TO_RPM" to install
custom built kdump-utils.

Signed-off-by: Coiby Xu <coxu@redhat.com>
For local vmcore dumping, we only need to trigger a kernel crash and
check if vmcore exists and it can re-use some setups and tests from NFS
dumping.

Note
1. plans/main.fmf is created for NFS and local dumping test plans to
   inherit .
2. Virtual tests [1] are created so local dumping can re-use the code of
   checking vmcore on an NFS server
3. The local test is disabled for now due to [2]
4. exit-first=true [3] is to make the executor stop executing tests once
   a test failure is encountered. But TMT currently has the problem of
   exit-first not applying across discover phase boundaries [4]. So for local
   dumping test plan, simply remove multiple discover phases [5].

[1] https://tmt.readthedocs.io/en/stable/guide.html#virtual-tests
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2270423
[3] https://tmt.readthedocs.io/en/stable/spec/plans.html#exit-first
[4] teemtee/tmt#3116
[5] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing

Signed-off-by: Coiby Xu <coxu@redhat.com>
TMT stores server SSH key in
/var/tmp/tmt/run-*/plans/ssh/provision/server/id_ecdsa.

Signed-off-by: Coiby Xu <coxu@redhat.com>
To avoid triggering the kernel panic repeatedly, this test use kexec
reboot to only boot into the initrd with early kdump enabled once.

Note BeakerLib somehow breaks this test (after triggering a kernel
panic, the machine doesn't boot the kdump kernel) so it gets removed.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Use hardware.disk[1] to specify an additional disk for LVM2 thin
provision as follows,

    provision:
       - name: client
         how: virtual
         connection: system
         hardware:
            disk:
             - size: = 40GB
             - size: = 1GB

Note tmt doesn't support specify a size of smaller unit like 300MB.

[1] https://tmt.readthedocs.io/en/stable/spec/hardware.html#disk

Signed-off-by: Coiby Xu <coxu@redhat.com>
Somehow the dmesg log extracted by makedumpfile vmcore-dmesg.2.txt has
"[      T[0-9]]" like follows,

    --- /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt      2024-07-30 21:01:51.865079103 +0000
    +++ /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt.2    2024-07-30 21:02:05.504060793 +0000
    @@ -1,830 +1,830 @@
    -[    0.000000] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024
    -[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
    -[    0.000000] BIOS-provided physical RAM map:
    -[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
    -[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
    -[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
    ...
    +[    0.000000] [      T0] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024
    +[    0.000000] [      T0] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
    +[    0.000000] [      T0] BIOS-provided physical RAM map:
    +[    0.000000] [      T0] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
    +[    0.000000] [      T0] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
    +[    0.000000] [      T0] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved

Signed-off-by: Coiby Xu <coxu@redhat.com>
With this patch, four kinds of tests are triggered when there is a
pull request created in a Github kexec-tools repo,
   - format check using shfmt
   - static analysis using shellcheck
   - ShellSpec unit tests (test cases in spec/)
   - integration tests (tests cases in tests/)

Signed-off-by: Coiby Xu <coxu@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants