uprobes in Kubernetes #1516

Closed
ggaurav10 opened this issue Jan 8, 2018 · 17 comments

Comments

@ggaurav10

I am trying to run the BCC tool ugc.py against Node and Java apps that run in separate containers in Kubernetes. The BCC tools are packaged in another, privileged container that shares the host's PID namespace, so they can see the running apps.

I am getting the error below:
perf_event_open(/sys/kernel/debug/tracing/events/uprobes/p__usr_local_bin_node_0xf5a794_6052_bcc_12351/id): Input/output error
The offcputime tool runs properly, though, and I can generate the expected flame graphs showing the apps' full stacks. The issue is seen only with uprobes.

Node is built with DTrace support, and the Java and Node apps are running with the -XX:+PreserveFramePointer and --perf-basic-prof options, respectively.

The value in /proc/sys/kernel/perf_event_paranoid is set to -1 (which allows all probes by all users).

The uprobe support is also enabled in the kernel (as is BPF support):

cat /lib64/modules/4.13.16-coreos-r1/build/.config | grep -i uprobe
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y
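
The same check can be scripted when several nodes need to be compared. The following is only a minimal sketch: the function names and the choice of "required" options are assumptions for illustration, not part of BCC, and the config text is expected to be read from the .config path shown above.

```python
import re

# Options this sketch treats as required for uprobes (an assumption; adjust as needed).
REQUIRED = ("CONFIG_UPROBES", "CONFIG_UPROBE_EVENTS")


def enabled_options(config_text):
    """Return the set of CONFIG_* names set to y or m in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(y|m)$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_uprobe_options(config_text, required=REQUIRED):
    """Return the required options that are not enabled, in the order given."""
    enabled = enabled_options(config_text)
    return [name for name in required if name not in enabled]
```

Passing the contents of the .config file to missing_uprobe_options() returns an empty list when both options are enabled, which is the case for the output above.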

The symbol map files for both apps are also present in /tmp/ of the BCC container.
This behaviour is seen on both the 4.13.16-coreos-r1 and 4.12.10-coreos kernel versions.

Is there anything that I am missing here?

Thanks.

@goldshtn
Collaborator

goldshtn commented Jan 8, 2018

Do any uprobes work at all? Say, on libc or something? There is a known issue with uprobes and some container runtimes; I vaguely remember it having something to do with the overlay. I’m on mobile right now, but could you see if there’s a prior issue here on bcc mentioning this?

@ggaurav10
Author

Thanks for your response, @goldshtn.
I am getting the same error with libc too. Yes, #1231 discusses the same issue, which seems to be caused by a missing readpage() handler in the overlayfs driver.

@yonghong-song
Collaborator

Right, overlayfs does not support uprobes at all.
In kernel/events/uprobes.c:

        /* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */
        if (!inode->i_mapping->a_ops->readpage && !shmem_mapping(inode->i_mapping))
                return -EIO;

For overlayfs, "readpage" is not defined and !shmem_mapping(...) is true, so
the check fails and you get an -EIO error, which perf_event_open reports as "Input/output error".
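
To see whether a given binary is affected, it can help to find out which filesystem type backs its path. Below is a rough sketch, not BCC functionality: fs_type_for_path is a hypothetical helper that does a longest-prefix match over text in /proc/mounts format. It ignores octal-escaped spaces in mount points, and the result depends on whose mount table is read (for example /proc/<pid>/mounts of the target process versus the profiling container's own).

```python
def fs_type_for_path(mounts_text, path):
    """Return the filesystem type of the mount that covers an absolute path.

    mounts_text uses the /proc/mounts layout:
        device mount_point fs_type options dump pass
    The longest matching mount point wins; for equal lengths the later
    entry wins, because later mounts shadow earlier ones.
    """
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/")  # "/" becomes "", matching every absolute path
        covers = path == mount_point or path.startswith(prefix + "/")
        if covers and len(prefix) >= best_len:
            best_len, best_type = len(prefix), fs_type
    return best_type
```

If this returns "overlay" for the probed binary on a kernel that predates the fix, the -EIO from the check above is the expected outcome.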

@yonghong-song
Collaborator

Closing the issue, as it is really a file-system issue.

@saurabh83

Is this still an issue with overlayfs? @yonghong-song

@yonghong-song
Collaborator

The issue is fixed in the 4.17 or 4.18 kernel; I forget exactly which version.
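
A quick way to tell whether a node is likely past that point is to compare its kernel release string against a threshold. This is a hedged sketch with hypothetical helper names; using 4.18 as the threshold is an assumption that takes the more conservative of the two versions mentioned above, and distribution kernels may backport the fix to older releases.

```python
import re


def parse_release(release):
    """Parse a release string such as '4.13.16-coreos-r1' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % (release,))
    return tuple(int(part) if part else 0 for part in match.groups())


def kernel_at_least(release, minimum):
    """True if the release is at or above minimum, e.g. (4, 18)."""
    minimum = tuple(minimum)
    return parse_release(release)[:len(minimum)] >= minimum
```

On a live system the release string can be obtained with platform.release() from the Python standard library, for example kernel_at_least(platform.release(), (4, 18)).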

@sergei-la

sergei-la commented Aug 2, 2019

@yonghong-song, would you be able to point to the overlayfs readpage fix? I am trying to use uprobes on a system with kernel 4.15.x and overlayfs, and was wondering what the fix was and whether I would be able to port it.

@yonghong-song
Collaborator

@sergei-la

thanks a lot

@erthalion
Contributor

Hi,

Maybe you (@yonghong-song @ggaurav10 @sergei-la?) are aware of similar problems in such situations with newer kernel versions? I have a similar setup with two Kubernetes containers: one runs an application, and the other, a privileged "profiling" container, is attached to the host PID namespace. Everything runs on Linux kernel 4.19.43 with overlayfs2, which is supposed to have the fix mentioned above (I've checked; the commit refers to 4.17). In my case there are no errors: bcc creates a uprobe and attaches correctly, but does not catch any probe hits. In fact, even if I create the same uprobe via perf, I don't see probe hits in /sys/kernel/debug/tracing/uprobe_profile, so I guess it's not a bcc issue. Does anyone have an idea why this could be? I'm a bit confused, since when I run the same setup inside a plain Docker container locally on 4.19.44 with overlayfs, everything is fine, and I'm out of ideas about what could be wrong.
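
For repeated checks, the hit counters in uprobe_profile can be read programmatically. The sketch below assumes the three-column layout described in the kernel's uprobe tracer documentation (filename, event name, hit count); the helper names are hypothetical, and binary paths containing spaces are not handled.

```python
def parse_uprobe_profile(text):
    """Map event name -> (filename, hit_count) from uprobe_profile text."""
    hits = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect: filename, event name, hit count (assumed layout).
        if len(fields) < 3 or not fields[-1].isdigit():
            continue
        filename, event, count = fields[0], fields[1], int(fields[-1])
        hits[event] = (filename, count)
    return hits


def silent_probes(text):
    """Return the sorted names of registered uprobes that have never fired."""
    profile = parse_uprobe_profile(text)
    return sorted(event for event, (_name, count) in profile.items() if count == 0)
```

Feeding it the contents of /sys/kernel/debug/tracing/uprobe_profile while the target application is under load lists the probes that are registered but have recorded no hits, which is the symptom described here.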

@yonghong-song
Collaborator

In kubernetes container, could you trace kernel function uprobe_dispatcher or not? This way, we can know whether the kernel uprobe infrastructure ignores this uprobe or not. Can you also do the tracing in docker container to make sure uprobe_dispatcher is triggered?

@erthalion
Contributor

In kubernetes container, could you trace kernel function uprobe_dispatcher or not?

If I understand correctly, this function is defined as static. Is it even possible to trace it? For example, when I try to use ftrace for that (the same happens with kprobe_events):

$ echo uprobe_dispatcher > set_ftrace_filter
bash: echo: write error: Invalid argument

# at the same time do_sys_open works
$ echo do_sys_open > set_ftrace_filter
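
As far as I understand, ftrace rejects a filter that matches nothing in available_filter_functions (in the same tracing directory), so that file can be consulted before writing to set_ftrace_filter. A small sketch of such a lookup follows; the helper name is hypothetical, and the sample format (one function per line, optionally followed by a module name in brackets) is an assumption about that file's layout.

```python
def is_traceable(available_text, function_name):
    """True if function_name is listed in available_filter_functions text.

    Each line is expected to hold a function name, optionally followed by
    a module name in square brackets, e.g. 'nf_nat_setup_info [nf_nat]'.
    """
    for line in available_text.splitlines():
        fields = line.split()
        if fields and fields[0] == function_name:
            return True
    return False
```

If is_traceable() returns False for uprobe_dispatcher but True for do_sys_open, that would be consistent with the "Invalid argument" error shown above.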

@yonghong-song
Collaborator

You should be able to trace it. But in this case, I do get an error message:

trace_kprobe: Could not probe notrace function uprobe_dispatcher

You could recompile your kernel with CONFIG_KPROBE_EVENTS_ON_NOTRACE and try again.
Not sure whether it is possible or not in your environment.

@erthalion
Contributor

Sorry for the long delay.

You could recompile your kernel with CONFIG_KPROBE_EVENTS_ON_NOTRACE and try again.
Not sure whether it is possible or not in your environment.

Nope, unfortunately that's not possible in this case. I'll try to investigate more, since I have the impression that something in the K8s configuration was changed, and will return if there are any results.

@hazelnutsgz

We use kernel 5.10.76 here and see the same issue for containerized apps as @erthalion: a bcc uprobe can be attached but is never hit.

@narang99

narang99 commented Sep 27, 2022

Same for me: eBPF attaches to the container, but none of the probes are called (in plain Docker containers too).

Is there any trick to making it work? Or can someone please help and explain the root cause of this issue?

The issue persists on Linux 5.15 too.
I tried googling around but couldn't find an exact explanation...

Thanks!

@vakabus

vakabus commented Apr 24, 2023

I have the same issue on kernel 6.2, Kubernetes installed with kubeadm, docker, docker-cri and ovn-kubernetes CNI plugin. The probes can be registered but they never trigger.

More specifically, I am trying to debug a process from the binary ovs-vswitchd (part of the network plugin). I might have discovered a workaround - if I ssh into a relevant node in the cluster, find the relevant container using docker commands and manually restart it, the probes start triggering.

In other words, I can make it work roughly like this:

ssh node
docker ps | grep -v POD  # find out the relevant container ID from this
docker restart 2863e0cde365

A manually started container (docker run) seems to work just fine for me. No problems there. The way docker-cri starts the containers seems to be the issue here. But I might as well be completely wrong... 🤷
