common/seccomp: add rseq syscall #30620

Merged
belimawr merged 1 commit into elastic:main from add-rseq-syscall
Mar 2, 2022

Conversation

belimawr
Contributor

@belimawr belimawr commented Mar 1, 2022

What does this PR do?

Adds rseq to the list of allowed system calls on Linux

Why is it important?

Starting with version 2.35, glibc calls the rseq syscall, which affects
Beats whenever CGO is used. If we don't allow rseq, Beats will eventually
crash with a glibc error: Fatal glibc error: rseq registration failed.
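
To illustrate why one missing entry matters, here is a minimal sketch of an allow-list filter. It is a toy model only: the `filter` type, `newFilter`, and `check` are hypothetical names made up for this example, and nothing here is the libbeat seccomp code or a real BPF filter. It only shows that a syscall absent from the list is rejected until it is added.

```go
package main

import "fmt"

// filter is a toy model of an allow-list policy (hypothetical, not the
// libbeat implementation): any syscall that is not listed is rejected.
type filter struct {
	allowed map[string]bool
}

// newFilter builds a filter that allows only the given syscall names.
func newFilter(names ...string) *filter {
	f := &filter{allowed: make(map[string]bool, len(names))}
	for _, n := range names {
		f.allowed[n] = true
	}
	return f
}

// check returns an error for a syscall outside the allow-list, which is
// roughly what the calling library observes as a failed syscall.
func (f *filter) check(syscall string) error {
	if !f.allowed[syscall] {
		return fmt.Errorf("syscall %q blocked by filter", syscall)
	}
	return nil
}

func main() {
	// The syscall lists are shortened for illustration.
	before := newFilter("read", "write", "futex", "clone")
	after := newFilter("read", "write", "futex", "clone", "rseq")

	fmt.Println("before the patch:", before.check("rseq"))
	fmt.Println("after the patch: ", after.check("rseq"))
}
```

In the real filter the rejection happens in the kernel, and glibc treats the failed rseq registration as fatal, which is the crash described above.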

Checklist

  • My code follows the style guidelines of this project
    - [ ] I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
    - [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Double check the backport versions.
  • Double check if we need to add rseq to other architectures.

How to test this PR locally

Compile with CGO enabled and run any Beat on a machine (or VM) using glibc >= 2.35. Arch Linux VMs are a good choice for this test. Below is a quick snippet showing how to set one up using Vagrant:

vagrant init archlinux/archlinux
vagrant up
vagrant ssh
# into the machine
sudo pacman -Syu --noconfirm # Update all packages, including glibc
systemctl reboot

# The ssh connection will close and you're back on the host
# Wait for the VM to finish rebooting, then:
vagrant ssh

# Make sure you're running glibc 2.35
pacman -Ss glibc | grep installed

# Clone, compile and run your Beats
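
The glibc version check can also be done programmatically. The sketch below assumes a glibc-based system where `getconf GNU_LIBC_VERSION` prints a line such as `glibc 2.35`. The helper names `parseGlibcVersion` and `atLeast` are made up for this example and are not part of Beats.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseGlibcVersion parses output such as "glibc 2.35" into (2, 35).
func parseGlibcVersion(out string) (major, minor int, err error) {
	fields := strings.Fields(out)
	if len(fields) != 2 || fields[0] != "glibc" {
		return 0, 0, fmt.Errorf("unexpected version string %q", out)
	}
	parts := strings.Split(fields[1], ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version number %q", fields[1])
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor is >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	// Assumption: getconf is installed and the C library is glibc.
	out, err := exec.Command("getconf", "GNU_LIBC_VERSION").Output()
	if err != nil {
		fmt.Println("could not query the glibc version:", err)
		return
	}
	major, minor, err := parseGlibcVersion(string(out))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("glibc %d.%d, in the affected range (>= 2.35): %v\n",
		major, minor, atLeast(major, minor, 2, 35))
}
```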

Related issues


Logs

Here is the strace -c output for Filebeat after applying this patch; rseq is the last syscall listed:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 74.04    0.019749           2      8725           rt_sigreturn
 13.29    0.003545          17       200        22 futex
  9.27    0.002474         494         5           clone
  0.96    0.000255           2       114           rt_sigaction
  0.50    0.000133           5        23           rt_sigprocmask
  0.42    0.000111           8        13           getpid
  0.28    0.000076           2        30           nanosleep
  0.20    0.000053           2        18           read
  0.16    0.000044           1        36           mmap
  0.15    0.000039           3        11           tgkill
  0.12    0.000031           3         9           mprotect
  0.11    0.000029           2        12           pread64
  0.10    0.000028           4         7           openat
  0.08    0.000021           4         5           fcntl
  0.07    0.000018           3         5         3 epoll_ctl
  0.04    0.000011          11         1           readlinkat
  0.03    0.000009           1         6           close
  0.03    0.000008           2         4           newfstatat
  0.03    0.000008           1         6           epoll_pwait
  0.02    0.000006           2         3           fstat
  0.01    0.000004           1         3           uname
  0.01    0.000004           2         2           sigaltstack
  0.01    0.000004           1         4           getrandom
  0.01    0.000003           3         1         1 ioctl
  0.01    0.000003           1         2           sched_yield
  0.01    0.000003           3         1           getppid
  0.01    0.000003           3         1           prctl
  0.01    0.000003           3         1           gettid
  0.00    0.000000           0         1           write
  0.00    0.000000           0         2           lseek
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           sched_getaffinity
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ------------------
100.00    0.026675           2      9264        28 total
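
A summary like the one above can be compared against an allow-list to spot syscalls a filter would block. The following is a rough sketch, not tooling from Beats: `syscallsFromSummary` and `notAllowed` are hypothetical helpers, and the allow-list in `main` is shortened for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// syscallsFromSummary extracts the syscall names from `strace -c` output.
func syscallsFromSummary(summary string) []string {
	var names []string
	for _, line := range strings.Split(summary, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 5 {
			continue
		}
		// Data rows start with a percentage such as "74.04"; the header
		// and separator rows do not parse as numbers and are skipped.
		if _, err := strconv.ParseFloat(fields[0], 64); err != nil {
			continue
		}
		name := fields[len(fields)-1]
		if name == "total" {
			continue
		}
		names = append(names, name)
	}
	return names
}

// notAllowed returns the observed syscalls missing from the allow-list.
func notAllowed(observed, allowed []string) []string {
	set := make(map[string]bool, len(allowed))
	for _, a := range allowed {
		set[a] = true
	}
	var missing []string
	for _, o := range observed {
		if !set[o] {
			missing = append(missing, o)
		}
	}
	return missing
}

func main() {
	// Excerpt of the summary above; the total row refers to the full table.
	summary := `% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 74.04    0.019749           2      8725           rt_sigreturn
 13.29    0.003545          17       200        22 futex
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ------------------
100.00    0.026675           2      9264        28 total`

	observed := syscallsFromSummary(summary)
	// Hypothetical allow-list from before the patch, without rseq.
	allowed := []string{"rt_sigreturn", "futex"}
	fmt.Println("observed but not allowed:", notAllowed(observed, allowed))
}
```

Note that strace only reports syscalls the process actually made during the run, so an empty result does not prove the allow-list is complete.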

@belimawr belimawr added bug libbeat backport-v8.0.0 Automated backport with mergify backport-v8.1.0 Automated backport with mergify backport-7.17 Automated backport to the 7.17 branch with mergify labels Mar 1, 2022
@belimawr belimawr requested review from ph and cmacknz March 1, 2022 12:38
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 1, 2022
@ph
Contributor

ph commented Mar 1, 2022

@belimawr I am trying to understand the scope of the problem. Which one of the following is true (or both)?

  1. If the user goes to elastic.co and downloads Filebeat 7.17, it will crash on a system with a newer glibc.
  2. If a Beat compiled on a dev machine is run on a system with a newer glibc, it will crash.

Note for future self: https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/

@belimawr belimawr requested a review from andrewkroh March 1, 2022 14:21
@belimawr
Contributor Author

belimawr commented Mar 1, 2022

@andrewkroh could you take a look at this PR?

@belimawr
Contributor Author

belimawr commented Mar 1, 2022

> @belimawr I am trying to understand the scope of the problem. Which one of the following is true (or both)?
>
>   1. If the user goes to elastic.co and downloads Filebeat 7.17, it will crash on a system with a newer glibc.
>   2. If a Beat compiled on a dev machine is run on a system with a newer glibc, it will crash.
>
> Note for future self: https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/

Both are true. I didn't dig too deeply into the internals of glibc or into why and when it calls rseq, but this comment on the Go issue I opened seems to suggest that any call to pthread_create will cause Beats to crash.

My understanding from what was discussed on the Go issue is that, because Beats install these seccomp filters, they became incompatible with glibc once it started calling the rseq syscall, hence the backport to all versions we still support.

I did test some of our official releases (7.16.x, if I remember correctly), and they all crash very quickly. I'm happy to dig more into this if needed, or to better document which versions are crashing. Just let me know if it's needed.

I also tried to get some extra information by running the Linux auditing documented in our seccomp package, but it didn't show me anything (I had to compile Auditbeat with CGO_ENABLED=0 go build ., so I'm not even sure it was compiled with all features enabled).

@elasticmachine
Collaborator

elasticmachine commented Mar 1, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-03-01T15:41:52.762+0000

  • Duration: 117 min 58 sec

Test stats 🧪

Test Results
Failed 0
Passed 38365
Skipped 3319
Total 41684

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@belimawr
Contributor Author

belimawr commented Mar 1, 2022

btw folks (@andrewkroh, @ph), do any of you know if we also need to enable rseq on arm/arm64? I'll try to test on arm this afternoon.

@belimawr
Contributor Author

belimawr commented Mar 1, 2022

I was curious, so I've just tested Filebeat 7.15.0 (downloaded from elastic.co) and it also crashes:

Fatal glibc error: rseq registration failed
2022-03-01T16:36:11.709+0100    WARN    beater/filebeat.go:381  Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
zsh: IOT instruction (core dumped)  ./filebeat -c ~/go/src/github.com/elastic/beats/filebeat/config.yml -e -v

rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with an
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: elastic#30576
@ph
Contributor

ph commented Mar 1, 2022

@belimawr Looking at the changes, a user could update their seccomp policy to add rseq.

fyi @simitt

@belimawr
Contributor Author

belimawr commented Mar 1, 2022

I also added rseq for x86 (linux_386). I had forgotten to push it with the original changes.

@belimawr
Contributor Author

belimawr commented Mar 1, 2022

> @belimawr Looking at the changes, a user could update their seccomp policy to add rseq.
>
> fyi @simitt

Awesome! A user already asked for it. I'll try it and report on the issue. Thanks a lot @ph!

@simitt
Contributor

simitt commented Mar 1, 2022

> @belimawr Looking at the changes, a user could update their seccomp policy to add rseq.
>
> fyi @simitt

@ph - isn't that exactly the gap I raised between standalone and managed by Elastic Agent? Users can customize for standalone, but not when running under Elastic Agent.

@ph
Contributor

ph commented Mar 1, 2022

@simitt This is exactly the gap you raised, so we will have to consider it.

@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Mar 2, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 2, 2022
@belimawr belimawr merged commit f02fa32 into elastic:main Mar 2, 2022
@belimawr belimawr deleted the add-rseq-syscall branch March 2, 2022 14:44
mergify bot pushed a commit that referenced this pull request Mar 2, 2022
rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with an
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: #30576
(cherry picked from commit f02fa32)
mergify bot pushed a commit that referenced this pull request Mar 2, 2022
rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with an
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: #30576
(cherry picked from commit f02fa32)
mergify bot pushed a commit that referenced this pull request Mar 2, 2022
rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with an
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: #30576
(cherry picked from commit f02fa32)
belimawr added a commit that referenced this pull request Mar 9, 2022
rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with an
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: #30576
(cherry picked from commit f02fa32)

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
belimawr added a commit that referenced this pull request Mar 9, 2022
rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with an
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: #30576
(cherry picked from commit f02fa32)

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
belimawr added a commit that referenced this pull request Mar 14, 2022
rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with an
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: #30576
(cherry picked from commit f02fa32)

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
Labels
backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.0.0 Automated backport with mergify backport-v8.1.0 Automated backport with mergify bug libbeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Development

Successfully merging this pull request may close these issues.

Beats crashing with glibc 2.35 - Fatal glibc error: rseq registration failed
7 participants