Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Use libLLVM from Rust CI tarball #1022

Merged
merged 1 commit into from
Sep 25, 2024
Merged

Conversation

vadorovsky
Copy link
Member

@vadorovsky vadorovsky commented Sep 7, 2024

Instead of relying on Homebrew for macOS (which ships older LLVM
versions) and apt.llvm.org for Linux (which often has bugs at the
packaging level), we now use the tarball from Rust CI to provide
libLLVM. This ensures it always matches the version used by the latest
Rust nightly.

This removes our reliance on homebrew shipping LLVM versions matching
those used by rustc.


This change is Reviewable

Copy link

netlify bot commented Sep 7, 2024

Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 8a5d22b
🔍 Latest deploy log https://app.netlify.com/sites/aya-rs-docs/deploys/66f4555b154a470008d038e1
😎 Deploy Preview https://deploy-preview-1022--aya-rs-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mergify mergify bot added the test A PR that improves test cases or CI label Sep 7, 2024
@vadorovsky vadorovsky changed the title ci: Fix bpf-linker installation macOS ci: Fix bpf-linker installation on macOS Sep 7, 2024
@vadorovsky vadorovsky force-pushed the fix-macos branch 15 times, most recently from 10231aa to 0badb23 Compare September 8, 2024 16:00
@tyrone-wu
Copy link
Contributor

tyrone-wu commented Sep 8, 2024

I finally got it to pass in a dum way. 😭
Here are the changed I made to make it pass:
In test/integration-ebpf/src/test.rs, I modified tracepoint program to:

#[tracepoint]
pub fn test_tracepoint(ctx: TracePointContext) -> u32 {
    // some arbitrary work
    let mut res = ctx.uid().wrapping_add(ctx.pid());
    for _ in 0..10_000 {
        res = res.wrapping_mul(123);
        res ^= (ctx.pid() << 3) ^ (ctx.uid() << 4);
        res = res.rotate_left(2);
        res = res.reverse_bits();
        res = res.swap_bytes();
        res = res.count_ones();
        unsafe { bpf_printk!(b"number %u", res) };
    }
    res
}

And in test/integration-test/src/tests/info.rs, I added this in between for test_prog_stats():

...
prog.attach("syscalls", "sys_enter_bpf").unwrap();

// Executing bpf syscalls to trigger `sys_enter_bpf`
for _ in 0..5 {
        let _ = loaded_programs().collect::<Vec<_>>();
}

let test_prog = prog.info().unwrap();
...

I experimented with just one of the modifications above, and it wasn't enough to "pass", so it looks like a combination of both is needed for macos to pass.

After some testing, I'm fairly confident that macos is recording in milliseconds. I did some janky experiments (ik, this is a dum way to do this 😭), and run_time ns are always in 1,000,000s:

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: 1 of 3 files reviewed, 9 unresolved discussions (waiting on @dave-tucker and @vadorovsky)


-- commits line 4 at r1:
Please wrap at 72 columns


-- commits line 8 at r1:
can we use the tarball from rust CI on linux too?


.github/workflows/ci.yml line 199 at r1 (raw file):

          submodules: recursive

      - uses: dtolnay/rust-toolchain@master

why did this need to move?


.github/workflows/ci.yml line 237 at r1 (raw file):

        # The clang shipped on macOS doesn't support BPF, so we need LLVM from brew.
        #
        # We also need the newest LLVM static libraries for bpf-linker, potentially newer than the

just remove this? we already talk about clang above.


.github/workflows/ci.yml line 252 at r1 (raw file):

          echo $(brew --prefix)/opt/llvm/bin >> $GITHUB_PATH

          # LLVM packaged in brew is usually too old for bpf-linker or any other package repository for macOS.

LLVM packaged in brew or any other package repository for macOS is usually too old for bpf-linker.

Code quote:

LLVM packaged in brew is usually too old for bpf-linker or any other package repository for macOS.

.github/workflows/ci.yml line 272 at r1 (raw file):

        if: runner.os == 'macOS'
        # NB: rustc doesn't ship libLLVM.so on macOS, so disable proxying (default feature). We also
        # --force so that bpf-linker gets always relinked against the latest LLVM from the Rust CI

just say "latest LLVM downloaded above." to avoid rewriting this every time we change the implementation details


.github/workflows/ci.yml line 275 at r1 (raw file):

        # tarball.
        env:
          LLVM_SYS_191_PREFIX: /usr/local/lib/rustc-llvm

this is slightly unfortunate. I think if you download the rust LLVM tarball and add it to GITHUB_PATH before installing LLVM from brew then you could avoid it - the llvm_config from rust would win. WDYT?


test/integration-ebpf/src/test.rs line 37 at r1 (raw file):

pub fn test_tracepoint(ctx: TracePointContext) -> u32 {
    // Some arbitrary work, to make the program running for some time and have
    // a real `run_time` in `test_program_info`.

why do we care about this? I read this comment and it doesn't make any sense to me that "macos is recording in milliseconds" - nothing about these integration tests runs on macos at all, it's all on an emulated linux kernel.


test/integration-test/src/tests/info.rs line 218 at r1 (raw file):

    prog.attach("syscalls", "sys_enter_bpf").unwrap();

    // Executing bpf syscalls to trigger `sys_enter_bpf`

it's not clear to me from this comment nor from the commit message why this addition was needed. help?

@tyrone-wu
Copy link
Contributor

tyrone-wu commented Sep 9, 2024

test/integration-ebpf/src/test.rs line 37 at r1 (raw file):

pub fn test_tracepoint(ctx: TracePointContext) -> u32 {
    // Some arbitrary work, to make the program running for some time and have
    // a real `run_time` in `test_program_info`.

why do we care about this? I read this comment and it doesn't make any sense to me that "macos is recording in milliseconds" - nothing about these integration tests runs on macos at all, it's all on an emulated linux kernel.

I shouldn't have worded it as "recording". The only thing I'm certain is that I observed run_time_ns having exactly 1,000,000 precision in macos. The test program performs enough work for running time to be non-zero.

test/integration-test/src/tests/info.rs line 218 at r1 (raw file):

    prog.attach("syscalls", "sys_enter_bpf").unwrap();

    // Executing bpf syscalls to trigger `sys_enter_bpf`

it's not clear to me from this comment nor from the commit message why this addition was needed. help?

I had loaded_programs() execute a couple bpf syscall times to trigger the test program. I tried just doing it once but macos wasn't passing, so I place it in loop.
Beforehand prog.info() was the one triggering the program.


Honestly I don't have a clue as to why macos runner is behaving like this. 😓
The only thing I could trace is possible https://github.com/torvalds/linux/blob/master/kernel/sched/clock.c#L62 overrided in https://github.com/torvalds/linux/blob/master/kernel/bpf/trampoline.c#L888? but that doesn't feel right, and is just a guess...

@vadorovsky vadorovsky force-pushed the fix-macos branch 6 times, most recently from a533972 to ba3bb51 Compare September 13, 2024 13:49
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dave-tucker, @tyrone-wu, and @vadorovsky)

@tyrone-wu
Copy link
Contributor

@vadorovsky I would also add that specifically the hvf qemu accelerator is causing the millisecond resolution to occur.

@vadorovsky vadorovsky force-pushed the fix-macos branch 4 times, most recently from b1f1ba7 to 4e26eb4 Compare September 25, 2024 10:16
Copy link
Member Author

@vadorovsky vadorovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 2 unresolved discussions (waiting on @dave-tucker and @tamird)


.github/workflows/ci.yml line 239 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Let's document please!

Done. Unfortunately the LLVM binary tarballs for macOS are broken, so I'm sticking to brew. Let me know if you find the explanation in the comment sufficient.

See llvm/llvm-project#92260 (comment)


test/integration-ebpf/src/test.rs line 37 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

We need to document the voodoo we're doing here.

Done, tried my best to explain that.


test/integration-test/src/tests/info.rs line 218 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

We need to document the voodoo we're doing here.

Done and I also simplified the voodoo.

We don't need the loop, executing loaded_programs() (or we could do anything triggering the BPF syscall) followed by a short sleep is sufficient.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r8, 1 of 1 files at r9, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @dave-tucker, @tyrone-wu, and @vadorovsky)


.github/workflows/ci.yml line 202 at r9 (raw file):

        if: runner.os == 'Linux'
        # ubuntu-22.04 comes with clang 14[0] which doesn't include support for signed and 64bit
        # enum values which was added in clang 15[1].

we should keep this blurb as well (else it's not clear why we can't use distro-packaged clang)

i believe if we move to ubuntu-24.04 (recently GA) we can go back to distro clang


.github/workflows/ci.yml line 229 at r9 (raw file):

        # The tar shipped on macOS doesn't support --wildcards, so we need GNU tar.
        #
        # The clang shipped on macOS doesn't support BPF, so we need LLVM from brew.

we still need this sentence else it isn't clear why we can't use the system's clang


.github/workflows/ci.yml line 248 at r9 (raw file):

        # of clang. Version parity with LLVM shipped with Rust nightly is not
        # required for clang, as it is only used for compiling eBPF programs
        # written in C for integration tests. But we still prefer using a

This last sentence is sounding alarm in my head: we should try to use the oldest version we can reasonably get our hands on - we don't want to force our users to do this same thing. If it works with distro/mac/brew then we should assert that here.


.github/workflows/ci.yml line 330 at r9 (raw file):

        run: find test/.tmp -name 'vmlinuz-*' | xargs -t cargo xtask integration-test vm

      - name: Setup tmate session

remove before merging


test/integration-ebpf/src/test.rs line 39 at r9 (raw file):

    // resolved in milliseconds instead of nanoseconds.
    // Therefore, to make sure that we receive a non-zero `run_time` in
    // `test_program_info` when running on macOS, make some arbitrary work

nit: s/make/do/


test/integration-test/src/tests/info.rs line 218 at r1 (raw file):

Previously, vadorovsky (Michal Rostecki) wrote…

Done and I also simplified the voodoo.

We don't need the loop, executing loaded_programs() (or we could do anything triggering the BPF syscall) followed by a short sleep is sufficient.

can you fix the grammar?


test/integration-test/src/tests/info.rs line 222 at r9 (raw file):

    // execute.
    let _ = loaded_programs().collect::<Vec<_>>();
    std::thread::sleep(std::time::Duration::from_millis(100));

100 millis! how about 10?

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r10, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @dave-tucker and @vadorovsky)

Copy link
Member Author

@vadorovsky vadorovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @dave-tucker and @tamird)


.github/workflows/ci.yml line 202 at r9 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

we should keep this blurb as well (else it's not clear why we can't use distro-packaged clang)

i believe if we move to ubuntu-24.04 (recently GA) we can go back to distro clang

Do you mean the blurb about lack of signed enum support in clang 14? Do you mind if I rather add it in the "Install LLVM" step? I can move it before the "Install prerequisites" steps, so the explanation would come first before any packages are installed.


.github/workflows/ci.yml line 229 at r9 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

we still need this sentence else it isn't clear why we can't use the system's clang

As mentioned above, if we move the "Install LLVM" step up, we won't have to mention it twice.


.github/workflows/ci.yml line 248 at r9 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This last sentence is sounding alarm in my head: we should try to use the oldest version we can reasonably get our hands on - we don't want to force our users to do this same thing. If it works with distro/mac/brew then we should assert that here.

Fair, that's a good point. I will try using the LLVM 15 tarball on Linux and stick to brew on macOS.


test/integration-ebpf/src/test.rs line 39 at r9 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: s/make/do/

Done


test/integration-test/src/tests/info.rs line 218 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can you fix the grammar?

How is it now?


test/integration-test/src/tests/info.rs line 222 at r9 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

100 millis! how about 10?

10 was still flaky, but 20 seems to work

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r11, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @dave-tucker and @vadorovsky)


.github/workflows/ci.yml line 202 at r9 (raw file):

Previously, vadorovsky (Michal Rostecki) wrote…

Do you mean the blurb about lack of signed enum support in clang 14? Do you mind if I rather add it in the "Install LLVM" step? I can move it before the "Install prerequisites" steps, so the explanation would come first before any packages are installed.

Yeah, the blurb that explains why we need to download LLVM binaries at all, but see #1039 - I don't think we need to do this anymore.


.github/workflows/ci.yml line 229 at r9 (raw file):

Previously, vadorovsky (Michal Rostecki) wrote…

As mentioned above, if we move the "Install LLVM" step up, we won't have to mention it twice.

Not sure I follow. The reasons are different on mac and linux. On mac I believe the issue is that system clang can't target bpfel.


.github/workflows/ci.yml line 248 at r9 (raw file):

Previously, vadorovsky (Michal Rostecki) wrote…

Fair, that's a good point. I will try using the LLVM 15 tarball on Linux and stick to brew on macOS.

No need for tarballs! See #1039.

Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tyrone-wu heads up: I just remove the run_time assertion in 087e10a. Feel free to reland it on its own, it need not block this PR imo.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @dave-tucker and @vadorovsky)

@tamird
Copy link
Member

tamird commented Sep 25, 2024

I rebased this on #1039 and reduced the scope - it now only fetches libLLVM from rust CI. See the updated commit message. Hope this is OK!

Copy link

mergify bot commented Sep 25, 2024

@vadorovsky, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Sep 25, 2024
@mergify mergify bot removed the needs-rebase label Sep 25, 2024
@tamird tamird changed the title ci: Use release LLVM versions from tarballs ci: Use libLLVM from Rust CI tarball Sep 25, 2024
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r12, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @dave-tucker)

Instead of relying on Homebrew for macOS (which ships older LLVM
versions) and `apt.llvm.org` for Linux (which often has bugs at the
packaging level), we now use the tarball from Rust CI to provide
`libLLVM`. This ensures it always matches the version used by the latest
Rust nightly.

This removes our reliance on homebrew shipping LLVM versions matching
those used by rustc.
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r13, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @dave-tucker)

@vadorovsky vadorovsky merged commit 045032b into aya-rs:main Sep 25, 2024
27 checks passed
@vadorovsky vadorovsky deleted the fix-macos branch September 25, 2024 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test A PR that improves test cases or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants