Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hugepages support for Firecracker #4360

Merged
merged 17 commits into from
Feb 14, 2024
Merged

Conversation

roypat
Copy link
Contributor

@roypat roypat commented Jan 12, 2024

Changes

Adds initial support for backing guest memory with huge pages. They can be configured via the /machine-config API, with Firecracker supporting 2M hugetlbfs pages. Huge page configuration is stored in the snapshot state, and a snapshot will be restored into a VM whose huge pages configuration matches that of the VM from which the snapshot was taken (with the operation erroring out if this is not possible for some reason, e.g. if trying to restore a hugetlbfs backed snapshot by mmaping the memory file). For the initial devpreview release, features that would require explicit integration with huge pages (such as memory ballooning) are mutually exclusive with huge page support.

This is linked to the #2139.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (1f3848f) 81.37% compared to head (e179b9f) 81.37%.

Files Patch % Lines
src/vmm/src/persist.rs 0.00% 15 Missing ⚠️
src/vmm/src/resources.rs 76.19% 5 Missing ⚠️
src/vmm/src/vmm_config/machine_config.rs 85.71% 5 Missing ⚠️
src/vmm/src/vstate/memory.rs 96.15% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4360   +/-   ##
=======================================
  Coverage   81.37%   81.37%           
=======================================
  Files         243      243           
  Lines       29431    29518   +87     
=======================================
+ Hits        23949    24021   +72     
- Misses       5482     5497   +15     
Flag Coverage Δ
4.14-c5n.metal 78.66% <74.75%> (?)
4.14-c7g.metal ?
4.14-m5d.metal ?
4.14-m5n.metal 78.65% <74.75%> (?)
4.14-m6a.metal 77.77% <74.75%> (-0.01%) ⬇️
4.14-m6g.metal 76.73% <74.75%> (-0.01%) ⬇️
4.14-m6i.metal 78.64% <74.75%> (-0.01%) ⬇️
4.14-m7g.metal 76.73% <74.75%> (?)
5.10-c5n.metal 81.32% <73.78%> (?)
5.10-c7g.metal ?
5.10-m5d.metal ?
5.10-m5n.metal 81.31% <73.78%> (?)
5.10-m6a.metal 80.53% <73.78%> (-0.02%) ⬇️
5.10-m6g.metal 79.63% <73.78%> (-0.02%) ⬇️
5.10-m6i.metal 81.31% <73.78%> (-0.03%) ⬇️
5.10-m7g.metal 79.63% <73.78%> (?)
6.1-c5n.metal 81.32% <73.78%> (?)
6.1-m5n.metal 81.31% <73.78%> (?)
6.1-m6a.metal 80.53% <73.78%> (-0.02%) ⬇️
6.1-m6g.metal 79.63% <73.78%> (-0.02%) ⬇️
6.1-m6i.metal 81.31% <73.78%> (-0.03%) ⬇️
6.1-m7g.metal 79.63% <73.78%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@roypat roypat force-pushed the hugepages branch 4 times, most recently from ac53626 to c50f991 Compare January 12, 2024 12:02
@roypat roypat self-assigned this Jan 12, 2024
@roypat
Copy link
Contributor Author

roypat commented Jan 12, 2024

@avirtuos tagging you in here for visibility. GitHub is not letting me explicitly add you as a review for some reason. This PR is where the initial huge pages work will take place :)

@roypat roypat force-pushed the hugepages branch 16 times, most recently from e8ee021 to c34853c Compare January 18, 2024 16:48
@roypat roypat force-pushed the hugepages branch 7 times, most recently from f1f946a to 4622838 Compare January 23, 2024 12:17
@ShadowCurse
Copy link
Contributor

I don't really see an obvious way to label things that arent "fix", "doc" or "chore", and adding prefixes would force me to rewrite some commit titles due to the 70 char limit.

I would mark most of them as feat(huge-pages) or feat(uffd). Also most of commits have huge pages words in them, so replacing them with feat(huge-pages) will not be a huge change (even thought it is +6 chars)
Unfortunately we don't enforce conventional commit style, but we unofficially do it anyway and I think it is not extremely difficult to stick to it here.

@roypat
Copy link
Contributor Author

roypat commented Feb 14, 2024

Unfortunately we don't enforce conventional commit style, but we unofficially do it anyway and I think it is not extremely difficult to stick to it here.

Yes, as you mention, we do not enforce it, so we also should not insist on it on reviews, as its up to the author. I prefer to not put them outside of very unambiguous cases (e.g. doc commits).

This field allows specifying whether guest memory for this microVM
should be backed by regular, 4K, pages or 2M hugetlbfs pages.
Configuration fails if guest memory size is not a multiple of selected
page size.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Mark it as developer preview in case not everything gets worked out
before 1.7.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Update the memory allocation code to be able to utilize hugetlbfs, and
pass the corresponding arguments through from the api. Currently
snapshots always restore on 4K pages, as huge page configuration is not
yet saved in the vmstate file.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
We store the huge pages configuration in the snapshot's vmstate file,
and enforce that a snapshot gets restored with the same hugepages
configuration with which it was taken (for simplicity reasons).

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Support for memfd_create with `MFD_HUGETLB | MFD_ALLOW_SEALING` was only
added in 4.16, so trying to use hugetlbfs backed guest memory on 4.14
host will fail [1]. Make the error message shown to the customer a bit
nicer.

[1]: https://man7.org/linux/man-pages/man2/memfd_create.2.html

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Tests that explicitly check API responses now need to deal with the new
huge_pages parameter.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Attempts to boot a microvm with guest memory backed by 2MB hugetlbfs
pages. Adjusts the test infrastructure to allocate 2MB pages prior to
test run (failing if it cannot do so). For now, we rely on no other
process on the host trying to use hugetlbfs.

The test is skipped on 4.14 because hugetlbfs support for sealable
memfds was only added in 4.16.

We put the test as a performance test to ensure it runs on ag=1 agents,
to avoid problems with different agents on the same metal concurrently
modifying the hugetlbfs pool.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Make the `valid_handler.rs` code sample page-size agnostic, in
preparating for hugepages tests.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
The test has to be UFFD based, as we cannot mmap the file with hugetlbfs
enabled (as `MAP_HUGETLB` is a modifier to `MAP_ANONYMOUS`, which
precludes file mappings).

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
The balloon device does not work with huge pages, so for now disallow
using them together.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Booting our initrd artifact inside a huge-pages enabled VM causes it to
get stuck, so for now this is seemingly not supported.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
An EPT_VIOLATION kvm_exit happens when the MMU's extended page tables
are missing an entry for some guest physical address (e.g. some page is
present in the guest page tables, but KVM has not yet set up a mapping
of guest-physical->host-physical address for it). This happens after
snapshot restore even if a page is faulted in via UFFD, as fauling in
via UFFD only maps the page into host userspace (e.g. Firecracker), but
does not set up the EPT entries.

We track the number of EPT_VIOLATIONS post restore when using UFFD for
both 4K and 2M pages, as we expect their number to be significantly
lower when using huge pages.

We use a special UFFD handler that faults in the entire guest memory
ahead of time, as otherwise we just track normal page faults.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Differential snapshots work with hugetlbfs pages out of the box. This is
because despite guest memory being backed by 2M pages, KVM still keeps a
dirty log at 4K granularity. This means we do not need to adjust our
differential snapshot logic to handle 2M chunks, as the existing logic
for 4K chunks stays valid.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Document how to use hugetlbfs with Firecracker.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Add the huge_pages field to the /machine-config documentation.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Add an entry about huge page support.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
@roypat roypat force-pushed the hugepages branch 2 times, most recently from d72c786 to f68c09a Compare February 14, 2024 10:38
@roypat roypat merged commit ecd9a45 into firecracker-microvm:main Feb 14, 2024
6 of 7 checks passed
roypat added a commit to roypat/firecracker that referenced this pull request Feb 16, 2024
When regenerating artifacts for firecracker-microvm#4360, the guest kernels got updated to
a new patch version.

Fixes firecracker-microvm#4454

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
zulinx86 pushed a commit that referenced this pull request Feb 16, 2024
When regenerating artifacts for #4360, the guest kernels got updated to
a new patch version.

Fixes #4454

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants