-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libct/intelrdt: skip reading /proc/cpuinfo #3485
libct/intelrdt: skip reading /proc/cpuinfo #3485
Conversation
Since commit edeb3b3 (#3306) we only read cpuinfo if intelrdt is available in hardware, yet you're right, it does not make sense to read it at all, since we can get the needed info with a few stat(2) calls. Can we take one more step and ditch mountinfo parsing as well? This is only used to check for mbaScEnabled, which is apparently not used anywhere except to avoid stat of @xiaochenshen @marquiz PTAL |
@corhere Thank you very much for working on it.
I am really surprised for the performance hurt from reading /proc/cpuinfo. This issue should be fixed. Thank you.
Yes, in most of the cases the production kernels or older upstream kernels are without this fix. Reading /proc/cpuinfo is too expensive.
I agree with you. We could remove the redundant check on /proc/cpuinfo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
I don't think the mountinfo check could be removed so far. MBA Software Controller is a "pure" software feature based on MBA by kernel, the only way of exposing the capability to user is through mount option "mba_MBps" in current kernel. An alternative hacking way is to check "MB:" line in default schemata file: without mount option "mba_MBps", the data is a percentage in range [0, 100%], but with mount option "mba_MBps", the data is a u32 value in range [0, ((u32)~0U)].
|
Given that Intel's own libraries also parse mountinfo to check whether the I suspect that @xiaochenshen's hacky check might result in false negatives. The That all being said, it looks like the only other place the |
Correct. This hacking way is not reliable.
Sounds good to me.
Yes. Parsing mountinfo is necessary to find the mount point of resctrl fs. We could not assume resctrl fs is always mounted to the recommended default mount point - |
... and elsewhere (I tried to search through some well known users of runc/libcontainer and haven't found any).
Well, this is easy -- first we do statfs(2) on default path ( If not, we fallback to parsing /proc/mountinfo. Currently we can't do the fast path because unfortunately there's no other way to read mount options (and I've actually added the comment explaining that sad situation in commit 5e201e7). If we can get rid of unconditional parsing of mountinfo, that would be awesome. @corhere @xiaochenshen thanks for looking into it! |
c551c6f
to
29e8a89
Compare
@kolyshkin @xiaochenshen I have updated the PR with the discussed fast path, plus some more improvements such that the slow path is only executed if RDT is configured in the container's OCI spec. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the fast path updates. Thank you.
I like the idea in this PR very much 😄 I haven't closely reviewed the code yet, though |
if _, err := os.Stat(filepath.Join(root, "info", "L3")); err == nil { | ||
catEnabled = true | ||
} | ||
if _, err := os.Stat(filepath.Join(root, "info", "L3")); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, there's one consideration that I don't know if we should care about here. If resctrlfs is mounted with -o cdp
we will have L3CODE/
and L3DATA/
(instead of L3/
) under the info/
dir.
I don't remember if CDP is officially supported in the runtime spec but I guess specifying something like L3DATA:0=ff\nL3CODE:0=ff
in intelRdt.l3CacheSchema
would work in practice.
Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That certainly sounds like we could get false negatives on IsCATEnabled()
if resctrlfs is mounted with -o cdp
and is worth investigating further. However, as the code in main would have the exact same deficiency, any improvements to feature detection or how a container's L3 allocation config is applied when RDT/CDP is enabled in the kernel probably should be made in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it seems this can be addressed in a follow-up. @marquiz feel free to create it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K, sounds good. Especially as the old code has the same limitation as you pointed out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #3537 so that we don't forget 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had just one comment/question. Looks very good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Left a couple of nits wrt two last commits, but overall LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
1 similar comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a go at reviewing this, and found some unused code that (as a result) could simplify this patch.
916f62a
29e8a89
to
916f62a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
// Has the resctrl fs been mounted to the default mount point? | ||
if statfs.Type == unix.RDTGROUP_SUPER_MAGIC { | ||
intelRdtRoot = defaultResctrlMountpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh (very!) random comment 😅; I recall some "lunch time" discussions I had with @kolyshkin about "where" comments should go for conditional statements.
My take was that putting the comment before the condition is an easy pitfal to describe the code, not the intent (and you may end up with describing the "if" statement, only in words instead of code), whereas putting it inside the branch "gently" forces one to write the branch itself (why we're doing it).
// check if foo is enabled
if foo {
do stuff
}
if foo {
// foo is enabled; start the flux capacitor to go back to the future.
do stuff
}
(Of course, it highly depends on the situation; in some cases you'd want to describe a more complex conditional block as a whole to describe the whole intent.)
@kolyshkin PTAL |
This function is unused, and removing it simplifies the changes which follow this commit. Signed-off-by: Cory Snider <csnider@mirantis.com>
Reading /proc/cpuinfo is a surprisingly expensive operation. Since kernel version 4.12 [1], opening /proc/cpuinfo on an x86 system can block for around 20 milliseconds while the kernel samples the current CPU frequency. There is a very recent patch [2] which gets rid of the delay, but has yet to make it into the mainline kenel. Regardless, kernels for which opening /proc/cpuinfo takes 20ms will continue to be run in production for years to come. libcontainer only opens /proc/cpuinfo to read the processor feature flags so all the delays to get an accurate snapshot of the CPU frequency are just wasted time. If we wanted to, we could interrogate the CPU features directly from userspace using the `CPUID` instruction. However, Intel and AMD CPUs have flags in different positions for their analogous sub-features and there are CPU quirks [3] which would need to be accounted for. Some Haswell server CPUs support RDT/CAT but are missing the `CPUID` flags advertising their support; the kernel checks for support on that processor family by probing the the hardware using privileged RDMSR/WRMSR instructions [4]. This sort of probing could not be implemented in userspace so it would not be possible to check for RDT feature support in userspace without false negatives on some hardware configurations. It looks like libcontainer reads the CPU feature flags as a kind of optimization so that it can skip checking whether the kernel supports an RDT sub-feature if the hardware support is missing. As the kernel only exposes subtrees in the `resctrl` filesystem for RDT sub-features with hardware and kernel support, checking the CPU feature flags is redundant from a correctness point of view. Remove the /proc/cpuinfo check as it is an optimization which actually hurts performance. [1]: https://unix.stackexchange.com/a/526679 [2]: https://lore.kernel.org/all/20220415161206.875029458@linutronix.de/ [3]: https://github.com/torvalds/linux/blob/7cf6a8a17f5b134b7e783c2d45c53298faef82a7/arch/x86/kernel/cpu/resctrl/core.c#L834-L851 [4]: https://github.com/torvalds/linux/blob/a6b450573b912316ad36262bfc70e7c3870c56d1/arch/x86/kernel/cpu/resctrl/core.c#L111-L153 Signed-off-by: Cory Snider <csnider@mirantis.com>
The intelrdt package only needs to parse mountinfo to find the mount point of the resctrl filesystem. Users are generally going to mount the resctrl filesystem to the pre-created /sys/fs/resctrl directory, so there is a common case where mountinfo parsing is not required. Optimize for the common case with a fast path which checks both for the existence of the /sys/fs/resctrl directory and whether the resctrl filesystem was mounted to that path using a single statfs syscall. Signed-off-by: Cory Snider <csnider@mirantis.com>
The OCI runtime spec mandates "[i]f intelRdt is not set, the runtime MUST NOT manipulate any resctrl pseudo-filesystems." Attempting to delete files counts as manipulating, so stop doing that when the container's RDT configuration is nil. Signed-off-by: Cory Snider <csnider@mirantis.com>
Unless the container's runtime config has intelRdt configuration set, any checks for whether Intel RDT is supported or the resctrl filesystem is mounted are a waste of time as, per the OCI Runtime Spec, "the runtime MUST NOT manipulate any resctrl pseudo-filesystems." And in the likely case where Intel RDT is supported by both the hardware and kernel but the resctrl filesystem is not mounted, these checks can get expensive as the intelrdt package needs to parse mountinfo to check whether the filesystem has been mounted to a non-standard path. Optimize for the common case of containers with no intelRdt configuration by only performing the checks when the container has opted in. Signed-off-by: Cory Snider <csnider@mirantis.com>
916f62a
to
ea0bd78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Reading
/proc/cpuinfo
is a surprisingly expensive operation. Since kernel version 4.12, opening/proc/cpuinfo
on an x86 system can block for around 20 milliseconds while the kernel samples the current CPU frequency. There is a very recent patch which gets rid of the delay, but has yet to make it into the mainline kernel. Regardless, kernels for which opening/proc/cpuinfo
takes 20ms will continue to be run in production for years to come. libcontainer only opens/proc/cpuinfo
to read the processor feature flags so all the delays to get an accurate snapshot of the CPU frequency are just wasted time.If we wanted to, we could interrogate the CPU features directly from userspace using the
CPUID
instruction. However, Intel and AMD CPUs have flags in different positions for their analogous sub-features and there are CPU quirks which would need to be accounted for. Some Haswell server CPUs support RDT/CAT but are missing theCPUID
flags advertising their support; the kernel checks for support on that processor family by probing the the hardware using privilegedRDMSR
/WRMSR
instructions. This sort of probing could not be implemented in userspace so it would not be possible to check for RDT feature support in userspace without false negatives on some hardware configurations.It looks like libcontainer reads the CPU feature flags as a kind of optimization so that it can skip checking whether the kernel supports an RDT sub-feature if the hardware support is missing. As the kernel only exposes subtrees in the
resctrl
filesystem for RDT sub-features with hardware and kernel support, checking the CPU feature flags is redundant from a correctness point of view. Remove the/proc/cpuinfo
check as it is an optimization which actually hurts performance.Benchmarks: runc run is 17 ms faster when "intelRdt" is enabled in config.json
/sys/fs/resctrl
mounted"args": ["/bin/true"]
"linux": {"intelRdt": {"memBwSchema": "MB:0=20;1=70"}}