Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

usdt: Add support for kernel usdt semaphore activation #1567

Merged
merged 9 commits into from
Nov 10, 2020

Conversation

danobi
Copy link
Member

@danobi danobi commented Oct 21, 2020

This PR adds support for the kernel's uprobe ref count API. If you
pass the kernel an offset to the usdt semaphore while attaching
a uprobe, the kernel will manage the lifetime of the semaphore.

This is better because:

  • if bpftrace crashes the semaphore will be deactivated
  • the kernel does a much better job implementing
    --usdt-file-activation than we can in userspace (faster and less
    racy)

This is nice because it makes usdt's with semaphore act like a regular
usdt. No need to pass the PID to activate the semaphore.

Checklist
  • Language changes are updated in docs/reference_guide.md
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

@danobi danobi added the do-not-merge Changes are not ready to be merged into master yet label Oct 21, 2020
@danobi
Copy link
Member Author

danobi commented Oct 21, 2020

Dependent on iovisor/bcc#3135 .

@@ -1268,7 +1268,18 @@ hi
^C
```

bpftrace also supports USDT semaphores. You may activate semaphores by passing in `-p $PID` or
bpftrace also supports USDT semaphores. If both your environment and bpftrace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a (new) kernel feature or has it been in there for ages? Might be good to list the required version if this requires a modern kernel

@danobi
Copy link
Member Author

danobi commented Oct 21, 2020 via email

@javierhonduco
Copy link
Contributor

cc @dalehamel

@dalehamel
Copy link
Contributor

Thanks @javierhonduco i've been keeping an eye on this and iovisor/bcc#3135

@danobi danobi removed the do-not-merge Changes are not ready to be merged into master yet label Oct 28, 2020
@danobi
Copy link
Member Author

danobi commented Oct 28, 2020

bcc PR is merged, this is ready for review now

@danobi danobi mentioned this pull request Nov 3, 2020
@fbs
Copy link
Member

fbs commented Nov 3, 2020

Looks good :). @dalehamel do you have time for a review, im not that familiar with usdts

@dalehamel
Copy link
Contributor

@fbs i'll give this a closer look this week

Copy link
Contributor

@dalehamel dalehamel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it. I wonder if we can update github actions for CI so that we run on several versions of ubuntu, as I think that ubuntu 16.04 is still on 4.15 kernel that doesn't have this API, but 20.04 and 18.04 are on 5.4 kernels that do have it.

It would be great to be able to test both code paths (kernel has ref counter offset, kernel doesn't) in CI, but everything looks right to me.

@fbs
Copy link
Member

fbs commented Nov 5, 2020

@danobi needs a rebase

Will be used in a later commit to see if we can use the kernel's usdt
semaphore support.
The function was getting huge, with a bunch of #ifdef's sprinkled
around. It was pretty hard to reason about, let alone add more #ifdef
branches.

This is (hopefully) a non-functional change.
If available, let the kernel manage the semaphore count for the usdt
probe. This is better because the kernel can activate the semaphore on
all processes that have the target mapped much faster than
the current --usdt-file-activation implementation (which walks
/proc/*/maps) and greps for strings.

It's also better b/c if bpftrace crashes the kernel can reliable
decrement the semaphore.
Now you can gate a runtime test on a feature NOT being present.

eg: REQUIRES_FEATURE !btf
@danobi
Copy link
Member Author

danobi commented Nov 5, 2020

Love it. I wonder if we can update github actions for CI so that we run on several versions of ubuntu, as I think that ubuntu 16.04 is still on 4.15 kernel that doesn't have this API, but 20.04 and 18.04 are on 5.4 kernels that do have it.

@dalehamel I'll file a separate issue for VM tests in CI

@danobi
Copy link
Member Author

danobi commented Nov 6, 2020

Don't merge this yet -- I'm investigating the build failure. This is looking to be a really strange bug. Either in bcc or in the kernel.

@danobi danobi added the do-not-merge Changes are not ready to be merged into master yet label Nov 6, 2020
@danobi
Copy link
Member Author

danobi commented Nov 6, 2020

Ok, so it took an entire day of printf debugging the kernel, but I have a suspicion.

I think embedded build's toolchain is having usdt_test_semaphore load its executable image into 2 distinct VMAs while my host's native toolchain is doing 1 VMA. Then to be safe (guessing), b/c the insn exists in both VMAs, the kernel installs a breakpoint in both VMAs (also guessing). Regardless, what the kernel is doing isn't wrong. If it wants to bump the semaphore by 2 instead of 1, so be it. As long as it subtracts by 2 during cleanup (which it does).

I'll update the tests to allow >1 semaphore count bump.

The kernel can do it much faster and reliably.

Note that the kernel may increment the semaphore by more than 1. This
happens when the executable image is mapped to >1 VMAs. The runtime
tests are updated to accept a semaphore value > 0.
If we run two usdt_semaphore_test binaries and both are running a busy
loop full speed, then all the events generated by the first binary can
overflow the perf buffer and starve out events from the second binary.
@danobi danobi removed the do-not-merge Changes are not ready to be merged into master yet label Nov 6, 2020
@dalehamel
Copy link
Contributor

I think embedded build's toolchain is having usdt_test_semaphore load its executable image into 2 distinct VMAs while my host's native toolchain is doing 1 VMA.

That's really bizarre if it depends on the toolchain... maybe it is the BCC version though? I think the embedded builds were still on 0.12 last I checked, but there is a build that does "latest" also, the "edge" build. Is that one also behaving oddly?

Then to be safe (guessing), b/c the insn exists in both VMAs, the kernel installs a breakpoint in both VMAs (also guessing). Regardless, what the kernel is doing isn't wrong. If it wants to bump the semaphore by 2 instead of 1, so be it. As long as it subtracts by 2 during cleanup (which it does).

I think I've seen this exact issue before, but I forget what the conclusion was...

I've found using dd and seeking to the offset in the elf notes for the different addresses of a probe, while the probes are still attached, is a good way to confirm this behavior. IIRC there is also an outstanding bug in bpftrace where we weren't attaching to all USDT probe addresses if there were multiple, but I think that bcc will attach to all of them correctly if you use the kernel API, but I don't think the embedded builds except for the one using the latest bcc would.

@danobi
Copy link
Member Author

danobi commented Nov 6, 2020

That's really bizarre if it depends on the toolchain... maybe it is the BCC version though?

I don't think so. I singled stepped the code using GDB and it really was the syscall (perf_event_open) incrementing by 2. I dumped the perf_event_open() args using strace and everything matched.

I could also reproduce the issue with a native build of bpftrace against the embedded built test prog. So I really do think it's the test prog doing some funny stuff here.

IIRC there is also an outstanding bug in bpftrace where we weren't attaching to all USDT probe addresses if there were multiple

I think I fixed that in 17588d2 . Lemme know if it's still buggy and I can take a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants