[Auditbeat] system/socket: Monitor all online CPUs #22827

Merged (6 commits) on Dec 2, 2020

@adriansr adriansr commented Dec 1, 2020

What does this PR do?

This patch updates the tracing library in Auditbeat to fetch the list of online CPUs from /sys/devices/system/cpu/online, so that it can install kprobes on all of them regardless of its own affinity mask while correctly skipping offline CPUs.

Why is it important?

Auditbeat's system/socket dataset needs to install kprobes on all online CPUs.

Previously, it used Go's runtime.NumCPU() to determine the CPUs in the system and monitored CPUs 0 to NumCPU-1. This was a mistake that led to startup failures or loss of events in either of the following scenarios:

  • When Auditbeat is started with a CPU affinity mask that excludes some CPUs.
  • When there are offline or isolated CPUs in the system.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

The easiest way to reproduce the problem is to start Auditbeat with a CPU affinity mask that excludes the first CPU and only allows it to run on the second CPU:

sudo taskset 2 auditbeat [...]

This pins Auditbeat to CPU1 while the kprobes are installed on CPU0, which prevents the guesses from working.

Alternatively, one can disable a few CPUs before launching Auditbeat:

# echo 0 > /sys/devices/system/cpu/cpu0/online

Related issues

Related #18755

This PR fixes most of the problems reported in the above issue, but the main problem is fixed by #22787.

@adriansr adriansr requested a review from a team as a code owner December 1, 2020 16:34
@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot added and then removed the needs_team label Dec 1, 2020
@adriansr adriansr marked this pull request as draft December 1, 2020 16:36
@adriansr adriansr requested review from a team and removed request for a team December 1, 2020 16:36
adriansr commented Dec 1, 2020

Marked as draft until it's tested further.

elasticmachine commented Dec 1, 2020

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #22827 updated

  • Start Time: 2020-12-02T18:32:33.011+0000

  • Duration: 31 min 3 sec

Test stats 🧪

Test Results
Failed 0
Passed 232
Skipped 33
Total 265

💚 Flaky test report

Tests succeeded.


Auditbeat's system/socket dataset needs to install kprobes on all
online CPUs.

Previously, it was using runtime.NumCPU() to determine the CPUs in the
system, and monitoring CPUs 0 to NumCPU-1. This was a mistake that led
to startup failures or loss of events in any of the following scenarios:
- When Auditbeat is started with a CPU affinity mask that excludes some CPUs.
- When there are offline CPUs in the system.

This patch updates the tracing library in Auditbeat to fetch the list of
online CPUs from /sys/devices/system/cpu/online so that it can install
kprobes on all of them regardless of its own affinity mask, while correctly
skipping offline CPUs.

Related elastic#18755
@adriansr adriansr marked this pull request as ready for review December 2, 2020 17:41
@adriansr adriansr added the review label and removed the in progress label Dec 2, 2020

@andrewstucki andrewstucki left a comment

Just some questions, which if cleared up I'll go ahead and approve.

Review threads on x-pack/auditbeat/tracing/cpu.go: two, both resolved (the first on outdated code).
@adriansr adriansr merged commit 6356887 into elastic:master Dec 2, 2020
@adriansr adriansr added the needs_backport label Dec 2, 2020
adriansr added a commit to adriansr/beats that referenced this pull request Dec 2, 2020
Auditbeat's system/socket dataset needs to install kprobes on all
online CPUs.

Previously, it was using runtime.NumCPU() to determine the CPUs in the
system, and monitoring CPUs 0 to NumCPU-1. This was a mistake that led
to startup failures or loss of events in any of the following scenarios:
- When Auditbeat is started with a CPU affinity mask that excludes some CPUs.
- When there are offline or isolated CPUs in the system.

This patch updates the tracing library in Auditbeat to fetch the list of
online CPUs from /sys/devices/system/cpu/online so that it can install
kprobes on all of them regardless of its own affinity mask, while correctly
skipping offline CPUs.

Related elastic#18755

(cherry picked from commit 6356887)
@adriansr adriansr added the v7.11.0 label and removed the needs_backport label Dec 2, 2020
adriansr added a commit to adriansr/beats that referenced this pull request Dec 2, 2020
(cherry picked from commit 6356887)
adriansr added a commit that referenced this pull request Dec 3, 2020
(cherry picked from commit 6356887)
adriansr added a commit that referenced this pull request Dec 3, 2020
(cherry picked from commit 6356887)