Quality improvements and best current practices #3

SmithChart · 2024-09-11T07:57:18Z

This PR implements many suggestions made by @Bastian-Krause, aimed at improving the overall quality of the test suite.

Our RAUC install does not change the partition layout. Thus there is no need to check the sizes of the same partitions in both slots.

Using `lsblk --json`, we can directly parse the tool's output and do not have to rely on parsing human-readable shell output.
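A minimal sketch of what parsing this output could look like, assuming the JSON text comes from a command such as `lsblk --json --bytes --output NAME,SIZE /dev/mmcblk0` (joined with `"\n".join(...)` from `shell.run_check()`). The device name and the helper `partition_sizes()` are hypothetical. Depending on the util-linux version, `size` may be emitted as a number or as a string, so the sketch converts it explicitly.

```python
import json


def partition_sizes(lsblk_json: str) -> dict[str, int]:
    """Map partition names to sizes in bytes from `lsblk --json --bytes` output."""
    data = json.loads(lsblk_json)
    sizes = {}
    for disk in data["blockdevices"]:
        # Partitions are nested under their parent disk as "children".
        for part in disk.get("children", []):
            # Older lsblk versions emit sizes as strings, newer ones as numbers.
            sizes[part["name"]] = int(part["size"])
    return sizes


# Example output in the shape lsblk produces (values are made up).
sample = """
{
  "blockdevices": [
    {
      "name": "mmcblk0",
      "size": 3825205248,
      "children": [
        {"name": "mmcblk0p1", "size": 1073741824},
        {"name": "mmcblk0p2", "size": 1073741824}
      ]
    }
  ]
}
"""
```

A test can then compare the resulting dictionary against the expected sizes directly, with no regular expressions over column layouts.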

Using `findmnt --json`, we can directly parse the tool's output. This is considered better practice than parsing the shell output of `df`.
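As a sketch, assuming the output of something like `findmnt --json --bytes --output TARGET,SIZE` (the helper name `filesystem_size()` is hypothetical), the size of a mounted file system can be looked up by its mount point. `findmnt` nests submounts under `children` when printing a tree, so the sketch walks that structure recursively.

```python
import json


def filesystem_size(findmnt_json: str, target: str) -> int:
    """Return the size in bytes of the file system mounted at `target`."""
    data = json.loads(findmnt_json)

    def walk(filesystems):
        # Yield every entry, descending into nested submounts.
        for fs in filesystems:
            yield fs
            yield from walk(fs.get("children", []))

    for fs in walk(data["filesystems"]):
        if fs["target"] == target:
            return int(fs["size"])
    raise KeyError(f"no file system mounted at {target}")


# Example output in the shape findmnt produces (values are made up).
sample = '{"filesystems": [{"target": "/", "size": 1056858112}]}'
```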

The previous test only crudely checked whether the I2C subsystem in the kernel had been loaded at all. This does not tell us anything about the actual devices on the bus we want to use. The new tests instead try to use the I2C devices the way userspace would. However, none of the I2C devices are directly used by userspace:

* The EEPROMs are read by the bootloader, and their information is added to the devicetree's `chosen` node. But we can at least read from the EEPROMs, so we check that this works.
* The PMIC is used inside the kernel, so we only check that the hardware has been probed.
* The USB hub is also used inside the kernel, so we only check that the hardware has been probed.
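The two kinds of checks above could be sketched as follows. This is not the PR's code: `FakeShell` is a hypothetical stand-in for labgrid's shell driver, and the device name `0-0050`, the driver name `at24`, and the sysfs `eeprom` attribute are assumptions for illustration. A probed device has a `driver` symlink in sysfs; reading a few bytes with `od` (assuming it supports `-A`, `-t` and `-N` as GNU coreutils does) performs a real transfer on the bus.

```python
class FakeShell:
    """Hypothetical stand-in for labgrid's ShellDriver, returning canned output."""

    def __init__(self, outputs):
        # Maps a command string to the list of output lines it "prints".
        self.outputs = outputs

    def run_check(self, cmd):
        return self.outputs[cmd]


def assert_driver_bound(shell, device, expected_driver):
    # A probed device has a "driver" symlink in sysfs pointing to its driver.
    [link] = shell.run_check(f"readlink /sys/bus/i2c/devices/{device}/driver")
    driver = link.rsplit("/", 1)[-1]
    assert driver == expected_driver, f"{device} is bound to {driver!r}"


def assert_eeprom_readable(shell, device, length=16):
    # Reading a few bytes performs a real I2C transfer to the EEPROM.
    # od prints 16 bytes per line, so a single line is expected for length=16.
    [dump] = shell.run_check(
        f"od -An -tx1 -N{length} /sys/bus/i2c/devices/{device}/eeprom"
    )
    assert len(dump.split()) == length, dump
```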

The previous test only crudely checked whether the SPI subsystem in the kernel had been loaded at all. This does not tell us anything about the actual devices on the bus we want to use. The new tests instead try to use the SPI devices the way userspace would:

* The Ethernet switch is managed by DSA. If we can successfully read its statistics, we can assume that SPI communication with the Ethernet switch works.
* The ADC is managed by IIO. If we obtain a non-zero ADC reading, we can assume that we are reading actual values from the ADC.
* The LCD is managed by DRM. We only check whether the correct driver has been probed.
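For the ADC, a sketch of the non-zero check might look like this. The channel path (for example `/sys/bus/iio/devices/iio:device0/in_voltage0_raw`) and the helper names are hypothetical, and `FakeShell` again only stands in for labgrid's shell driver so the example can run on its own.

```python
class FakeShell:
    """Hypothetical stand-in for labgrid's ShellDriver, returning canned output."""

    def __init__(self, outputs):
        self.outputs = outputs

    def run_check(self, cmd):
        return self.outputs[cmd]


def read_adc_raw(shell, channel_path):
    """Read one raw sample from an IIO channel attribute and return it as int."""
    # The *_raw attribute contains a single integer.
    [raw] = shell.run_check(f"cat {channel_path}")
    return int(raw)


def assert_adc_nonzero(shell, channel_path):
    # A constant zero would suggest we are not reading real conversions.
    value = read_adc_raw(shell, channel_path)
    assert value != 0, f"{channel_path} read 0"
    return value
```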

Without the MMC, the system would not boot in the first place. Additionally, we have tests in `test_filesystems.py` that check whether the partitions on the MMC have been probed correctly. If these succeed, we can assume that the MMC subsystem was loaded correctly.

This is just another representation of the configuration EEPROMs present on i2c-0 and i2c-2. Neither is used by userspace directly, so there is no need to check whether these nodes exist in `/sys`.

Until now we only tested whether the endpoint provided a plausible value. By also comparing the value to the output of `sensors`, we can check the complete `tacd` stack.
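One way such a comparison could be sketched, assuming `sensors -j` (JSON output of lm-sensors) is available on the device. The chip name `cpu_thermal-virtual-0`, the feature `temp1`, and the 2 °C tolerance are assumptions, not values from the PR.

```python
import json
import math


def sensors_temperature(sensors_json: str, chip: str, feature: str) -> float:
    """Extract a temperature in °C from `sensors -j` output."""
    readings = json.loads(sensors_json)[chip][feature]
    # The value lives under a key like "temp1_input" inside the feature dict.
    [value] = [v for k, v in readings.items() if k.endswith("_input")]
    return float(value)


def assert_temperatures_match(tacd_value, sensors_value, tolerance=2.0):
    # Both readings come from the same sensor but at slightly different times,
    # so allow a small absolute difference.
    assert math.isclose(tacd_value, sensors_value, abs_tol=tolerance), (
        tacd_value,
        sensors_value,
    )


# Example output in the shape `sensors -j` produces (values are made up).
sample = '{"cpu_thermal-virtual-0": {"Adapter": "Virtual device", "temp1": {"temp1_input": 42.5}}}'
```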

The original test case contained a lot of code to handle different format strings, but it's unclear why all that complexity was actually needed. This change removes the complex format handling and reduces it to the bare minimum for this DUT.

In some places we have been using `stdout, _, rc = shell.run()` followed by `assert rc == 0`. The better way is to use `stdout = shell.run_check()`, which removes a bit of boilerplate in these places.
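The difference between the two patterns can be sketched with a hypothetical `FakeShell` that mimics labgrid's behavior: `run()` returns a `(stdout, stderr, exitcode)` tuple, while `run_check()` returns only `stdout` and raises on a non-zero exit code. The `ExecutionError` class here is a local stand-in for labgrid's exception of the same name.

```python
class ExecutionError(Exception):
    """Local stand-in for labgrid's ExecutionError."""


class FakeShell:
    """Hypothetical stand-in mimicking labgrid's ShellDriver.run()/run_check()."""

    def __init__(self, results):
        # Maps a command to (stdout lines, exit code).
        self.results = results

    def run(self, cmd):
        stdout, rc = self.results[cmd]
        return stdout, [], rc

    def run_check(self, cmd):
        stdout, _, rc = self.run(cmd)
        if rc != 0:
            raise ExecutionError(cmd)
        return stdout


shell = FakeShell({"uname -s": (["Linux"], 0), "false": ([], 1)})

# Old pattern: tuple unpacking plus an explicit assertion on the exit code.
stdout, _, rc = shell.run("uname -s")
assert rc == 0

# New pattern: run_check() fails the test by raising on a non-zero exit code.
stdout = shell.run_check("uname -s")
```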

By disconnecting the USB drive during the test, we can be sure that Linux has dropped all buffers for the USB storage, and thus that this test does not merely exercise buffers somewhere in the kernel without actually writing data to the USB drive. Since we already have the _eet_ connected to the TAC, we can simply use it to disconnect and reconnect the USB stick during the test.

This test makes sure that the third state of the TAC's power switch (On, Off, OffFloating) works electrically. This is especially important since this state is not exposed on the web interface, only in the API. With this test we make sure that the load is actually engaged when the power switch is "Off", and that we can also switch it to "OffFloating".

If we do not override the parent's `__attrs_post_init__()`, we do not need to call it.

Bastian-Krause

Looks good overall. Here are some minor suggestions:

tests/test_filesystems.py

tests/test_network.py

tests/test_userspace.py

tests/test_linux_hardware.py

tests/test_tacd.py

tests/helper.py

Running commands in the background of the shell is merely a hack to run commands in parallel:

* You have to take care that `stdout` and `stderr` do not mess up the output of other programs.
* You have to track the `pid` yourself.

Using `systemd-run` solves these problems. This commit also introduces a context manager that wraps `systemd-run` and makes test cases very readable.
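Such a context manager could be sketched like this; it is an illustration, not the helper from the PR. The name `systemd_run()`, the unit name, and `RecordingShell` (a stand-in for labgrid's shell driver that only records commands) are hypothetical. The teardown uses `run()` instead of `run_check()` because a transient unit that has already exited may be unloaded, and `systemctl stop` would then report an error.

```python
import contextlib


class RecordingShell:
    """Hypothetical stand-in for labgrid's ShellDriver that records commands."""

    def __init__(self):
        self.commands = []

    def run(self, cmd):
        self.commands.append(cmd)
        return [], [], 0

    def run_check(self, cmd):
        return self.run(cmd)[0]


@contextlib.contextmanager
def systemd_run(shell, unit, command):
    """Run `command` as a transient systemd service for the duration of a with-block."""
    # systemd tracks the process for us, and its output goes to the journal
    # instead of interleaving with the shell output on the console.
    shell.run_check(f"systemd-run --unit={unit} {command}")
    try:
        yield unit
    finally:
        # Ignore the exit code: the unit may already have finished and been unloaded.
        shell.run(f"systemctl stop {unit}")
```

In a test this would read as `with systemd_run(shell, "cpu-load", "stress-ng --cpu 1"): ...`, with the background load guaranteed to be stopped when the block is left.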

Inferring the block device from the USB path at runtime is not needed, since the path in `/dev/disk/by-path/` is stable for a given USB storage device in a given USB port. This way we get rid of a few extra commands. More importantly, we get rid of pipes in shell commands, where a failing command inside the pipe does not fail the whole command.

The fixture `prepare_network` tears down the whole network setup of the TAC and replaces it with its own setup. At the end of the test the process is reversed. But to make sure that the TAC is actually online again, we should at least wait until we have reached the same conditions as in the `network` state of the strategy. This way, following tests that rely on the TAC being online still find it online.
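Waiting for such a condition is usually a polling loop with a deadline. The helper `wait_until()` below is a generic, hypothetical sketch; the predicate passed to it (for example a lambda checking the exit code of a `ping` via `shell.run()`) would be whatever check the strategy's `network` state uses, which is not shown here.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Poll `predicate` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

Using `time.monotonic()` keeps the deadline correct even if the wall clock is adjusted during the test.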

Using `try .. finally` we can make sure that we try to clean up after our test cases run - even if they fail.
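The pattern can be sketched with a hypothetical veth setup. `RecordingShell` stands in for labgrid's shell driver, and `run_with_cleanup()` and the interface names are illustrative only. Deleting one end of a veth pair also removes its peer.

```python
class RecordingShell:
    """Hypothetical stand-in for labgrid's ShellDriver that records commands."""

    def __init__(self):
        self.commands = []

    def run_check(self, cmd):
        self.commands.append(cmd)
        return []


def run_with_cleanup(shell, body):
    """Set up a veth pair, run `body`, and always tear the pair down again."""
    shell.run_check("ip link add veth-test type veth peer name veth-peer")
    try:
        body(shell)
    finally:
        # Executed whether body() returns normally or raises, e.g. on a failed assert.
        shell.run_check("ip link del veth-test")
```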

Otherwise this line would only be a no-op.

Until now we have tested the sizes of the file systems in `slot 0` and `slot 1`. This seems natural, since these are two independent systems. But both slots basically contain the same system, only installed via two different methods. We currently have no reason to believe that the system installed via RAUC may have a different file system layout than the system installed via fastboot. Thus we rewrite the tests to only test the file system sizes in `slot 0`, which removes a good amount of complexity and testing time. If we ever encounter problems in this area, we can still write tests that focus on the actual problems fixed.

The strategy already checks whether the system is running. Thus it makes sense to also collect debug information if this state could not be reached. This is a preparation for the removal of RAUC handling from the strategy. After support for RAUC has been removed from the strategy, the strategy will always assume that we are in `system0`. Thus it no longer makes sense to have tests for `system1`.

Until now, the handling of RAUC installs and of state transitions between the two systems was done by the strategy. This added a fair amount of complexity to the strategy. For example, some states (`shell`, `network`) pretended to be reached even if we were actually in one of the RAUC states (`system0`, `system1`). Reducing the complexity of the strategy also makes it clear that other special cases, like reconfiguring the network from `online` to `loopback`, should not be part of the strategy.

This commit removes the handling of RAUC installation and the tracking of the active slot from the strategy. As a replacement, it adds pytest fixtures that allow test cases to control which slot should be booted. The fixture `set_bootstate_in_bootloader` should be used in tests that want to boot into `system1` or change retries, `mark-good`, etc. This fixture makes sure (using the `default_bootstate` fixture) that the system is always in `system0` with the default bootstate after the test has finished. The only requirement for tests is that they must neither install a new system into `system0` nor make `system0` unusable.

While making these changes, the actual RAUC tests have been adapted to follow the best current practices for these tests more closely.

Both states `shell` and `network` are basically the same:

* `shell`: The system is up in userspace and all services have started.
* `network`: Like `shell`, but we have checked that we are online.

To ease the transitions between `network` and `shell`, there is even code in place that treats both states as the same. This commit removes the additional `network` state and adds the online check to the `shell` state.

Unpacking a single-element list using the following notation has the benefit that it raises an exception if the list to unpack is not exactly one element long:

> `[stdout] = shell.run_check("some_command")`

This is especially useful for commands that are expected to return only a single line of output.

Co-authored-by: Bastian Krause <bst@pengutronix.de>
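The effect of this notation can be shown in plain Python. The helper `single_line()` is hypothetical; it contrasts with indexing via `stdout[0]`, which would silently ignore extra lines and only fail on empty output.

```python
def single_line(lines):
    """Return the only element of `lines`; raise ValueError otherwise."""
    # Unpacking into a one-element list target fails on 0 or 2+ elements.
    [line] = lines
    return line


def first_line(lines):
    # For comparison: indexing silently ignores any additional lines.
    return lines[0]
```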

SmithChart · 2024-09-16T09:58:54Z

@Bastian-Krause From my side this is ready to merge. Feel free to have another look.

tests/test_interfaces_usb.py

tests/test_tacd.py

Without this change, all tests left the *eet* in whatever state happened to be the last one. This does not fit the notion that tests should leave the system in its initial state. With this change, the new `eet` fixture makes sure that the *eet* is returned to "no connection" at teardown.

With just `0.2 s` of delay, the measurement of `v1/dut/feedback/current` did not succeed reliably. It seems the test setup simply needs a short moment until current is actually flowing and can be measured.

Bastian-Krause

Nice work, thanks for sticking with it! The test suite is now in a much better and more reliable shape.

SmithChart added 7 commits September 11, 2024 09:54

test/filesystems: Test partition sizes only in one RAUC slot

872c874

test_partition_sizes: Use lsblk --json instead of fdisk

b8d54ff

test_filesystem_sizes: Use findmnt --json instead of df

4c82e45

test_hardware: No need to test the nvmem subsystem

8830d9c

SmithChart requested a review from Bastian-Krause September 11, 2024 07:57

SmithChart and others added 7 commits September 11, 2024 17:14

tests/tacd: http_temperature: Also compare temperature to sensors

bd84857

tests: Replace shell.run with shell.run_check where possible

2cbf584

treewide: simplify imports

44cb269

lxatacstrategy: drop obsolete __attrs_post_init__()

374f282

SmithChart force-pushed the cfi/review-feedback branch from d53f9d2 to 34bf2d1 September 12, 2024 09:41

Bastian-Krause requested changes Sep 12, 2024

SmithChart force-pushed the cfi/review-feedback branch from 34bf2d1 to 95c307f September 13, 2024 16:45

SmithChart added 12 commits September 16, 2024 10:49

test_linux_stress: Remove stray print

a51da21

test_network/prepare_network: Add more reasoning for reconfiguring

3bb2c6e

test_network: Make sure cleanup action are always executed

d5f5b6b

test_chrony: Use CSV reader instead of parsing by hand

c5d66be

test_switch_config: Actually test the result of comparison

1a87742

test_hostname: Use str.rstrip() instead of str.split()

757e861

SmithChart and others added 2 commits September 16, 2024 10:49

SmithChart force-pushed the cfi/review-feedback branch from a11baf4 to 35990f5 September 16, 2024 09:28

SmithChart marked this pull request as ready for review September 16, 2024 09:57

SmithChart requested a review from Bastian-Krause September 16, 2024 09:58

Bastian-Krause reviewed Oct 8, 2024

tests/test_interfaces_usb.py (outdated, resolved)

Bastian-Krause reviewed Oct 8, 2024

tests/test_tacd.py (outdated, resolved)

SmithChart added 2 commits October 9, 2024 10:57

test_tacd_eet_analog: Increase timeout for analog to settle

b5e1fb7

SmithChart requested a review from Bastian-Krause October 9, 2024 09:04

Bastian-Krause approved these changes Oct 10, 2024

SmithChart merged commit 54aba7d into linux-automation:master Oct 10, 2024
2 checks passed

SmithChart deleted the cfi/review-feedback branch October 10, 2024 12:23
