Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workflows: Add LAVA tests workflow #48

Merged
merged 1 commit into from
Nov 28, 2023

Conversation

cazou
Copy link
Contributor

@cazou cazou commented Aug 3, 2023

cros-codecs CI workflow:

The ci/config.yaml contains the test configurations. For each architecture (amd/intel), it specifies:

  • The lava device type to use
  • The supported codecs for the architecture (vp8, h.264, ...)
  • The test suites to be run for each codec (currently one per codec)
  • The test vectors to be skipped in each test suite

The github action workflow uses ci/config.yaml to get the lava device_type to run the test on.
The test on lava is run from ci/test-cases/run_tests.py and uses ci/config.yaml to generate the fluster command to run.

About the lava test job

  • Only NFS images are used. The hardware is not being tested, only cros-codecs is, no need to flash anything on the hardware.

@google-cla
Copy link

google-cla bot commented Aug 3, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@dwlsalmeida
Copy link
Collaborator

The segfault is likely a driver fault, however it's possibly linked to bad input being issued from cros-codecs. Hard to say without a backtrace and other ways to reproduce.

If this is run against a debug version of the driver with tracing enabled we can probably identify the cause. For https://github.com/intel/media-driver they have the MOS_TRACE macro and friends that can be enabled with a compile flag.

@cazou
Copy link
Contributor Author

cazou commented Aug 3, 2023

@dwlsalmeida I used the acer-cp514-2h-1160g7-volteer device type on lava staging. Maybe there are better options with more stable drivers ?

@dwlsalmeida
Copy link
Collaborator

@cazou without a stacktrace it's hard to tell :/

I know very little about CI, but can't you ask it to run ccdec with valgrind or something? If I can get any insight into what part of the code is causing this I can fix it.

Also, it's very weird that absolutely nothing in VP9 is passing, definitely something is up.

@dwlsalmeida
Copy link
Collaborator

dwlsalmeida commented Aug 7, 2023

Summarizing what I know about the segfaults, taking vp90-2-02-size-lf-1920x1080.webm for example:

The source stride and the destination stride do not match for Tiger Lake:

#1  0x00007efe0b81fd38 in DdiMedia_CopyPlane (dst=0x563eff3510d0 "", dstPitch=960, src=0x563eff71ece0 "", srcPitch=1920, height=540) at /home/detlev/Sources/goo0028/media-driver/media_driver/linux/common/ddi/media_libva.cpp:5173

This causes a segfault here in the driver, when copying the V plane.

static void DdiMedia_CopyPlane(
    uint8_t *dst,
    uint32_t dstPitch,
    uint8_t *src,
    uint32_t srcPitch,
    uint32_t height)
{
    uint32_t rowSize = std::min(dstPitch, srcPitch);
    for (int y = 0; y < height; y += 1)
    {
        memcpy(dst, src, rowSize);
        dst += dstPitch;
        src += srcPitch;
    }
}

Running this on my Ice Lake machine, I get srcPitch==dstPitch==960.

I have no idea why this returns 1920 for the V plane on Tiger Lake for presumably i420 content:

        DdiMedia_GetChromaPitchHeight(DdiMedia_MediaFormatToOsFormat(surface->format), surface->iPitch, surface->iHeight, &chromaPitch, &chromaHeight);

My hunch is that, somehow, this line gets executed:

            *chromaPitch = pitch;

Instead of

            *chromaPitch = MOS_ALIGN_CEIL(pitch, 2) / 2;

In

static VAStatus DdiMedia_GetChromaPitchHeight(
    uint32_t fourcc,
    uint32_t pitch,
    uint32_t height,
    uint32_t *chromaPitch,
    uint32_t *chromaHeight)
{
    DDI_CHK_NULL(chromaPitch, "nullptr chromaPitch", VA_STATUS_ERROR_INVALID_PARAMETER);
    DDI_CHK_NULL(chromaHeight, "nullptr chromaHeight", VA_STATUS_ERROR_INVALID_PARAMETER);

    switch(fourcc)
    {
        case VA_FOURCC_NV12:
        case VA_FOURCC_P010:
        case VA_FOURCC_P012:
        case VA_FOURCC_P016:
            *chromaHeight = MOS_ALIGN_CEIL(height, 2) / 2;
            *chromaPitch = pitch;
            break;
        case VA_FOURCC_I420:
        case VA_FOURCC_YV12:
            *chromaHeight = MOS_ALIGN_CEIL(height, 2) / 2;
            *chromaPitch = MOS_ALIGN_CEIL(pitch, 2) / 2;
            break;
        case VA_FOURCC_411P:
        case VA_FOURCC_422H:
        case VA_FOURCC_444P:
        case VA_FOURCC_RGBP:
            *chromaHeight = height;
            *chromaPitch = pitch;
            break;
        case VA_FOURCC_422V:
        case VA_FOURCC_IMC3:
            *chromaHeight = MOS_ALIGN_CEIL(height, 2) / 2;
            *chromaPitch = pitch;
            break;
        default:
            *chromaPitch = 0;
            *chromaHeight = 0;
    }

    return VA_STATUS_SUCCESS;
}

Notably, we stopped indicating the fourcc when creating the surfaces some time ago:

        /// Create new surfaces and add them to the pool, using `descriptors` as backing memory.
        pub(crate) fn add_surfaces(&mut self, descriptors: Vec<M>) -> Result<(), VaError> {
            let surfaces = self.display.create_surfaces(
                self.rt_format,
                // Let the hardware decide the best internal format - we will get the desired fourcc
                // when creating the image.
                None, <-----------
                self.coded_resolution.width,
                self.coded_resolution.height,
                self.usage_hint,
                descriptors,
            )?;

It may very well be that the bug is in between the surface's internal format and the memcpy.

@cazou can you confirm the VA-API version? (vainfo)

@Gnurou
Copy link
Collaborator

Gnurou commented Aug 7, 2023

Thanks for this, this is looking good!

I think one feature we will need is the ability to keep track of which tests are expected to fail, since we cannot reach 100% due to bugs in drivers and sometimes cros-codecs itself. So the CI job should be successful as long as no test regressed.

@cazou
Copy link
Contributor Author

cazou commented Aug 7, 2023

Running this on my Ice Lake machine, I get srcPitch==dstPitch==960.

I have no idea why this returns 1920 for the V plane on Tiger Lake for presumably i420 content:

@dwlsalmeida To be fully correct, this happens on debian on both hp-x360-12b-ca0500na-n4000-octopus and acer-cp514-2h-1160g7-volteer device types on lava.
I was not able to reproduce this on my laptop running Gentoo (media-driver 23.2.4). But I was able to reproduce this on my laptop within a docker running debian.

@cazou can you confirm the VA-API version? (vainfo)

Gentoo:

libva info: VA-API version 1.18.0
libva info: Trying to open /usr/lib64/va/drivers/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_18
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.18 (libva 2.18.2)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.2.4 ()

Debian:

libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.17 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.1 ()

I edited the debian info, it was with one of my debug version, now it is the one from the debian package

CPU is 11th gen Intel i7-1185G7

@cazou
Copy link
Contributor Author

cazou commented Aug 7, 2023

It looks like an issue with the debian package. I tried on bullseye (oldstable), bookworm (stable) and trixe (unstable) and all have the same issue. But when I rebuild the driver from the git repo at the same version, it works correctly (no segfault).
I will have a closer look at this debian package

@cazou
Copy link
Contributor Author

cazou commented Aug 7, 2023

I confirm that this is an issue with the way the package is built: Debian separates non-free libs from the other ones, so the intel-media-va-driver package is built with the -DENABLE_NONFREE_KERNELS=OFF flag. Apparently, with that flag, there is a bug that makes ccdec segfault.

A simple fix is to install intel-media-va-driver-non-free instead from the non-free apt source. We need to evaluate if that package can be installed on a rootfs that will be made publicly available.

With that package, the segfault issue is not a blocker anymore to continue implementing the CI.

@dwlsalmeida
Copy link
Collaborator

FWIW, on Arch, we all get the "non-free" stuff by default -> https://gitlab.archlinux.org/archlinux/packaging/packages/intel-media-driver/-/blob/main/PKGBUILD

@cazou cazou force-pushed the wip/detlev/t44031-add-ci branch 4 times, most recently from 708ce03 to b42f0f9 Compare August 11, 2023 14:50
@cazou cazou force-pushed the wip/detlev/t44031-add-ci branch 3 times, most recently from f71aa87 to d272cbd Compare August 25, 2023 21:12
@Gnurou
Copy link
Collaborator

Gnurou commented Aug 28, 2023

@cazou, before we can merge this we will need you to fix the CLA situation - could you check https://github.com/chromeos/cros-codecs/pull/48/checks?check_run_id=16226713377 ?

@cazou cazou force-pushed the wip/detlev/t44031-add-ci branch 2 times, most recently from fcbff5d to 759732e Compare September 28, 2023 15:59
@cazou cazou changed the title WIP: workflows: Add LAVA tests workflow workflows: Add LAVA tests workflow Sep 28, 2023
@cazou cazou marked this pull request as ready for review October 1, 2023 20:30
@cazou
Copy link
Contributor Author

cazou commented Oct 3, 2023

@Gnurou @dwlsalmeida The failures are due to the missing action secrets on this project. We need to add, as secrets:

  • GH_BEARER: A Github token with read access to the Actions artifacts (This will be publicly visible on LAVA)
  • LAVA_TOKEN: A LAVA (lava.collabora.dev) token to be able to submit jobs
    And a variable:
  • LAVA_USERNAME: The user name associated with the LAVA token

@cazou cazou force-pushed the wip/detlev/t44031-add-ci branch 2 times, most recently from 5918412 to d8db7cb Compare October 10, 2023 13:55
@Gnurou
Copy link
Collaborator

Gnurou commented Oct 13, 2023

The failures are due to the missing action secrets on this project.

Ok, so can we prevent the LAVA tests from being run in the Github actions for now? We will enable them with the CL that adds the necessary tokens.

Besides the submit jobs seem to be failing for unrelated reasons:

Error: Unable to resolve action `dorny/test-reporter@v1.7`, unable to find version `v1.7`

@cazou
Copy link
Contributor Author

cazou commented Oct 13, 2023

The failures are due to the missing action secrets on this project.

Ok, so can we prevent the LAVA tests from being run in the Github actions for now? We will enable them with the CL that adds the necessary tokens.

They can, yes. But that's basically all there is in this PR. Can't we just merge this when the secrets are set ?

Besides the submit jobs seem to be failing for unrelated reasons:

Error: Unable to resolve action `dorny/test-reporter@v1.7`, unable to find version `v1.7`

Ha yes, I'll fix it. Upstream uses tags instead of a branch, so the whole version number is needed.

@Gnurou
Copy link
Collaborator

Gnurou commented Nov 8, 2023

Hi @cazou,

They can, yes. But that's basically all there is in this PR. Can't we just merge this when the secrets are set ?

I'm fine with that but have to admit I am a bit oblivious as to what secrets we need to obtain, how, and where to set them (as they are "secrets" I suppose we shall not push them into this repository). Could you enlighten me a bit about that?

@cazou
Copy link
Contributor Author

cazou commented Nov 8, 2023

Hi @cazou,

They can, yes. But that's basically all there is in this PR. Can't we just merge this when the secrets are set ?

I'm fine with that but have to admit I am a bit oblivious as to what secrets we need to obtain, how, and where to set them (as they are "secrets" I suppose we shall not push them into this repository). Could you enlighten me a bit about that?

Hi @Gnurou !

We need to set those secrets:
GH_BEARER: A Github token with read access to the Actions artifacts (This will be publicly visible on LAVA)
LAVA_TOKEN: A LAVA (lava.collabora.dev) token to be able to submit jobs

I am currently in the process of getting a token from our LAVA people and we'll be able to give that to you for this repo.
It will be stored on this repository as a secret. Only you can set or edit it and nobody can read it. it will be shown as *** in the logs (see https://github.com/cazou/cros-codecs/actions/runs/6510164313/job/17683262845#step:3:1)

The GH_BEARER secret must be a token that you generate for this repository, with the read permission. It will be used by LAVA to recover the built ccdec artifact to be tested.

They can be set in Settings > Secrets and Variables > Actions. There is a Add new repository secret button.

And a variable LAVA_USERNAME: The user name associated with the LAVA token. We'll provide that as well. Doesn't need to be a secret and can be added at the same place as Action Secrets, in the variables tab.

@obbardc
Copy link

obbardc commented Nov 8, 2023

@cazou Can you also make the LAVA_RPC2_URL (e.g. https://lava.collabora.dev/RPC2) a variable (non-secret), so that forks could use their own Lava instance ?

.github/workflows/lava.yml Outdated Show resolved Hide resolved
.github/workflows/lava.yml Outdated Show resolved Hide resolved
.github/workflows/lava.yml Outdated Show resolved Hide resolved
.github/workflows/lava.yml Outdated Show resolved Hide resolved
.github/workflows/lava.yml Outdated Show resolved Hide resolved
.github/workflows/lava.yml Outdated Show resolved Hide resolved
Build ccdec and run tests with fluster on LAVA.

The workflow waits for the results from LAVA and shows a report with
each test status.
@cazou
Copy link
Contributor Author

cazou commented Nov 17, 2023

Updated the commit as kernelci/kernelci-core#2051 was merged. It now uses the kernelci amd64 bullseye-gst-fluster rootfs

@Gnurou
Copy link
Collaborator

Gnurou commented Nov 28, 2023

Attempting a merge now that the secrets are in - fingers crossed...

@Gnurou Gnurou merged commit 5f0eea1 into chromeos:main Nov 28, 2023
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants