Merge branch 'main' into tests/parametrize_benchmark
cm-iwata authored Jan 21, 2025
2 parents 033ca8f + dfb45dc commit a10045b
Showing 26 changed files with 679 additions and 424 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,13 @@ and this project adheres to

### Added

- [#4987](https://github.com/firecracker-microvm/firecracker/pull/4987): Reset
the physical counter register (`CNTPCT_EL0`) on VM startup. This prevents the VM
from reading the host physical counter value. This is only possible on 6.4 and
newer kernels. On older kernels, the physical counter is still passed to the
guest unmodified. See more info
[here](https://github.com/firecracker-microvm/firecracker/blob/main/docs/prod-host-setup.md#arm-only-vm-physical-counter-behaviour).

### Changed

- [#4913](https://github.com/firecracker-microvm/firecracker/pull/4913): Removed
17 changes: 17 additions & 0 deletions docs/ballooning.md
@@ -263,3 +263,20 @@ cannot be enabled later by providing a `polling_interval` non-zero value.
Furthermore, if the balloon was configured with statistics pre-boot through a
non-zero `stats_polling_interval_s` value, the statistics cannot be disabled
through a `polling_interval` value of zero post-boot.

## Balloon Caveats

- Firecracker has no control over the speed of inflation or deflation; this is
dictated by the guest kernel driver.

- The balloon will continually attempt to reach its target size, which can be a
CPU-intensive process. It is therefore recommended to set realistic targets
or, after a period of stagnation in the inflation, update the target size to
be close to the inflated size.

- The `deflate_on_oom` flag is a mechanism to prevent the guest from crashing or
terminating processes; it is not meant to be used continually to free memory.
Doing so is CPU-intensive, as the balloon driver is designed to deflate and
release memory slowly. The cost is compounded if the balloon has not yet
reached its target size, as it will attempt to inflate while also deflating.
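The second caveat suggests lowering the target once inflation stalls. The sketch below shows one way a host-side controller might detect that; it is a hypothetical helper, not part of Firecracker. It assumes the actual balloon size is polled periodically (for example from the `actual_mib` value in the balloon statistics) and that the returned value would then be sent as the new `amount_mib` target.

```rust
/// One observation of the balloon's actual size, in MiB (hypothetical type).
#[derive(Clone, Copy, Debug)]
struct Sample {
    actual_mib: u32,
}

/// Returns a new, lower target if the last `window` samples show no growth
/// toward `target_mib`; otherwise returns `None` and the target is left alone.
fn adjust_target(samples: &[Sample], target_mib: u32, window: usize) -> Option<u32> {
    if window == 0 || samples.len() < window {
        return None; // not enough history to judge stagnation yet
    }
    let recent = &samples[samples.len() - window..];
    let latest = recent[recent.len() - 1].actual_mib;
    if latest >= target_mib {
        return None; // target already reached, nothing to adjust
    }
    // Stagnation: the actual size never grew past the first sample in the window.
    let stagnated = recent.iter().all(|s| s.actual_mib <= recent[0].actual_mib);
    if stagnated {
        Some(latest) // settle the target close to what was actually inflated
    } else {
        None
    }
}
```

Keeping the decision a pure function over recent samples makes the window length easy to tune per workload without touching the polling loop.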
18 changes: 9 additions & 9 deletions docs/prod-host-setup.md
@@ -328,13 +328,16 @@ For vendor-specific recommendations, please consult the resources below:
- ARM:
[Speculative Processor Vulnerability](https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability)

##### [ARM only] Physical counter directly passed through to the guest
##### [ARM only] VM Physical counter behaviour

On ARM, the physical counter (i.e `CNTPCT`) it is returning the
[actual EL1 physical counter value of the host][1]. From the discussions before
merging this change [upstream][2], this seems like a conscious design decision
of the ARM code contributors, giving precedence to performance over the ability
to trap and control this in the hypervisor.
On ARM, Firecracker tries to reset the `CNTPCT` physical counter on VM boot.
This prevents the VM from reading the host physical counter value.
Firecracker only tries to reset the counter if the host KVM supports the
`KVM_CAP_COUNTER_OFFSET` capability. This capability is only present in kernels
containing
[this](https://lore.kernel.org/all/20230330174800.2677007-1-maz@kernel.org/)
patch series (6.4 and newer). On older kernels, the host counter value is
passed through to the guest unmodified.
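To check on a given host whether the reset will happen, one can ask KVM for the capability directly. The sketch below issues a raw `KVM_CHECK_EXTENSION` ioctl on `/dev/kvm` without any crates; it is an illustration, not Firecracker's own code path. The constant values (`KVM_CHECK_EXTENSION = 0xAE03`, `KVM_CAP_COUNTER_OFFSET = 227`) are assumed to match the upstream `<linux/kvm.h>`, and the `ioctl` prototype assumes glibc; verify both against your kernel headers and libc.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;

// Assumed values from <linux/kvm.h>: _IO(KVMIO = 0xAE, 0x03) and the capability number.
const KVM_CHECK_EXTENSION: c_ulong = 0xAE03;
const KVM_CAP_COUNTER_OFFSET: c_int = 227;

unsafe extern "C" {
    // glibc prototype: int ioctl(int fd, unsigned long request, ...);
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Returns Ok(true) if the host KVM reports KVM_CAP_COUNTER_OFFSET,
/// i.e. the guest physical counter can be reset on VM boot.
fn counter_offset_supported() -> io::Result<bool> {
    let kvm = OpenOptions::new().read(true).write(true).open("/dev/kvm")?;
    // SAFETY: the fd is valid for the lifetime of `kvm`, and KVM_CHECK_EXTENSION
    // takes a plain integer argument, so no memory is shared with the kernel.
    let ret = unsafe { ioctl(kvm.as_raw_fd(), KVM_CHECK_EXTENSION, KVM_CAP_COUNTER_OFFSET) };
    if ret < 0 {
        return Err(io::Error::last_os_error());
    }
    // 0 means unsupported; a positive value means supported.
    Ok(ret > 0)
}
```

On x86 hosts or pre-6.4 arm64 kernels this should return `Ok(false)`, which corresponds to the pass-through behaviour described above.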

##### Verification

@@ -428,6 +431,3 @@ To validate that the change took effect, the file
[^1]: Look for `GRUB_CMDLINE_LINUX` in file `/etc/default/grub` in RPM-based
systems, and
[this doc for Ubuntu](https://wiki.ubuntu.com/Kernel/KernelBootParameters).

[1]: https://elixir.free-electrons.com/linux/v4.14.203/source/virt/kvm/arm/hyp/timer-sr.c#L63
[2]: https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/023323.html
25 changes: 19 additions & 6 deletions src/firecracker/examples/uffd/uffd_utils.rs
@@ -61,17 +61,30 @@ impl UffdHandler {
let mut message_buf = vec![0u8; 1024];
let (bytes_read, file) = stream
.recv_with_fd(&mut message_buf[..])
.expect("Cannot recv_with_fd");
.expect("Cannot read from a stream");
message_buf.resize(bytes_read, 0);

let body = String::from_utf8(message_buf).unwrap();
let file = file.expect("Uffd not passed through UDS!");
let body = String::from_utf8(message_buf.clone()).unwrap_or_else(|_| {
panic!(
"Received body is not a valid UTF-8 string. Raw bytes received: {message_buf:#?}"
)
});
let file =
file.unwrap_or_else(|| panic!("Did not receive Uffd from UDS. Received body: {body}"));

let mappings = serde_json::from_str::<Vec<GuestRegionUffdMapping>>(&body)
.expect("Cannot deserialize memory mappings.");
let mappings =
serde_json::from_str::<Vec<GuestRegionUffdMapping>>(&body).unwrap_or_else(|_| {
panic!("Cannot deserialize memory mappings. Received body: {body}")
});
let memsize: usize = mappings.iter().map(|r| r.size).sum();
// Page size is the same for all memory regions, so just grab the first one
let page_size = mappings.first().unwrap().page_size_kib;
let first_mapping = mappings.first().unwrap_or_else(|| {
panic!(
"Cannot get the first mapping. Mappings size is {}. Received body: {body}",
mappings.len()
)
});
let page_size = first_mapping.page_size_kib;

// Make sure memory size matches backing data size.
assert_eq!(memsize, size);
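The new panics attach the received payload to every failure. A `Result`-returning variant of the same idea is sketched below; `Mapping` is a hypothetical stand-in for `GuestRegionUffdMapping` with only the fields used here, and JSON parsing is left out so the sketch depends on the standard library alone. One design point: `String::from_utf8` hands the rejected bytes back through `FromUtf8Error::as_bytes`, so the raw buffer can be reported without the `message_buf.clone()` used in the diff.

```rust
/// Hypothetical stand-in for GuestRegionUffdMapping (only the fields used here).
struct Mapping {
    size: usize,
    page_size_kib: usize,
}

/// Decodes the UDS payload as UTF-8; on failure the error carries the raw bytes,
/// recovered from the FromUtf8Error rather than from a cloned buffer.
fn decode_body(raw: Vec<u8>) -> Result<String, String> {
    String::from_utf8(raw).map_err(|err| {
        format!(
            "Received body is not a valid UTF-8 string. Raw bytes received: {:?}",
            err.as_bytes()
        )
    })
}

/// Total guest memory described by the mappings.
fn total_size(mappings: &[Mapping]) -> usize {
    mappings.iter().map(|m| m.size).sum()
}

/// Page size of the first region (all regions share it), with the body in the error.
fn first_page_size(mappings: &[Mapping], body: &str) -> Result<usize, String> {
    mappings.first().map(|m| m.page_size_kib).ok_or_else(|| {
        format!(
            "Cannot get the first mapping. Mappings size is {}. Received body: {body}",
            mappings.len()
        )
    })
}
```

An example handler could chain these with `?` and decide in one place whether to panic or log.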
6 changes: 6 additions & 0 deletions src/vmm/src/arch/aarch64/regs.rs
@@ -99,6 +99,12 @@ arm64_sys_reg!(SYS_CNTV_CVAL_EL0, 3, 3, 14, 3, 2);
// https://elixir.bootlin.com/linux/v6.8/source/arch/arm64/include/asm/sysreg.h#L459
arm64_sys_reg!(SYS_CNTPCT_EL0, 3, 3, 14, 0, 1);

// Physical Timer EL0 count Register
// The id of this register is the same as SYS_CNTPCT_EL0, but KVM defines it
// separately, so we do as well.
// https://elixir.bootlin.com/linux/v6.12.6/source/arch/arm64/include/uapi/asm/kvm.h#L259
arm64_sys_reg!(KVM_REG_ARM_PTIMER_CNT, 3, 3, 14, 0, 1);

// Translation Table Base Register
// https://developer.arm.com/documentation/ddi0595/2021-03/AArch64-Registers/TTBR1-EL1--Translation-Table-Base-Register-1--EL1-
arm64_sys_reg!(TTBR1_EL1, 3, 0, 2, 0, 1);
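The claim that `KVM_REG_ARM_PTIMER_CNT` and `SYS_CNTPCT_EL0` share an id follows from how KVM encodes system registers for `KVM_GET_ONE_REG`/`KVM_SET_ONE_REG`. The standalone `const fn` below mirrors the kernel's `ARM64_SYS_REG()` macro from the arm64 UAPI headers; Firecracker's `arm64_sys_reg!` macro is assumed, not confirmed here, to produce the same values.

```rust
// Constants from the Linux arm64 KVM UAPI headers.
const KVM_REG_ARM64: u64 = 0x6000_0000_0000_0000;
const KVM_REG_SIZE_U64: u64 = 0x0030_0000_0000_0000;
const KVM_REG_ARM64_SYSREG: u64 = 0x0013 << 16;

/// Builds a ONE_REG id from the (op0, op1, CRn, CRm, op2) system register encoding,
/// masking each field to its bit range as ARM64_SYS_REG() does.
const fn arm64_sys_reg(op0: u64, op1: u64, crn: u64, crm: u64, op2: u64) -> u64 {
    KVM_REG_ARM64
        | KVM_REG_SIZE_U64
        | KVM_REG_ARM64_SYSREG
        | ((op0 << 14) & 0xc000)
        | ((op1 << 11) & 0x3800)
        | ((crn << 7) & 0x0780)
        | ((crm << 3) & 0x0078)
        | (op2 & 0x0007)
}

// Same encoding (3, 3, 14, 0, 1), hence the same id.
const SYS_CNTPCT_EL0: u64 = arm64_sys_reg(3, 3, 14, 0, 1);
const KVM_REG_ARM_PTIMER_CNT: u64 = arm64_sys_reg(3, 3, 14, 0, 1);
```

For comparison, the virtual counter `KVM_REG_ARM_TIMER_CNT` is `arm64_sys_reg(3, 3, 14, 3, 2)`, which yields `0x603000000013DF1A`.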
60 changes: 49 additions & 11 deletions src/vmm/src/arch/aarch64/vcpu.rs
@@ -13,6 +13,7 @@ use kvm_ioctls::VcpuFd;

use super::get_fdt_addr;
use super::regs::*;
use crate::vstate::kvm::OptionalCapabilities;
use crate::vstate::memory::GuestMemoryMmap;

/// Errors thrown while setting aarch64 registers.
@@ -78,6 +79,7 @@ pub fn setup_boot_regs(
cpu_id: u8,
boot_ip: u64,
mem: &GuestMemoryMmap,
optional_capabilities: &OptionalCapabilities,
) -> Result<(), VcpuError> {
let kreg_off = offset_of!(kvm_regs, regs);

@@ -106,6 +108,23 @@ pub fn setup_boot_regs(
vcpufd
.set_one_reg(id, &get_fdt_addr(mem).to_le_bytes())
.map_err(|err| VcpuError::SetOneReg(id, err))?;

// Reset the physical counter for the guest. This prevents the guest from
// reading the host physical counter.
// Resetting KVM_REG_ARM_PTIMER_CNT on a single vCPU is enough because there is
// only one timer struct with offsets per VM.
// Because access to KVM_REG_ARM_PTIMER_CNT is only available starting with the
// 6.4 kernel, we only do the reset if KVM_CAP_COUNTER_OFFSET is present, as it
// was added in the same patch series as the ability to set the
// KVM_REG_ARM_PTIMER_CNT register.
// Patch series which introduced the needed changes:
// https://lore.kernel.org/all/20230330174800.2677007-1-maz@kernel.org/
// Note: the value observed by the guest will still be above 0, because some
// time passes between this reset and the first call to KVM_RUN.
if optional_capabilities.counter_offset {
vcpufd
.set_one_reg(KVM_REG_ARM_PTIMER_CNT, &[0; 8])
.map_err(|err| VcpuError::SetOneReg(KVM_REG_ARM_PTIMER_CNT, err))?;
}
}
Ok(())
}
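The capability gate can be exercised without KVM by putting the register write behind a small trait. In the sketch below, `OneReg`, `FakeVcpu` and `maybe_reset_ptimer` are hypothetical names, not Firecracker APIs, and it assumes, as the comment about a single vCPU implies, that the surrounding block only runs for the boot vCPU (`cpu_id == 0`). Errors are tagged with the id of the register that actually failed.

```rust
use std::collections::HashMap;

// ARM64_SYS_REG(3, 3, 14, 0, 1), the same id as SYS_CNTPCT_EL0.
const KVM_REG_ARM_PTIMER_CNT: u64 = 0x6030_0000_0013_DF01;

/// Hypothetical minimal interface standing in for `VcpuFd::set_one_reg`.
trait OneReg {
    fn set_one_reg(&mut self, id: u64, data: &[u8]) -> Result<(), String>;
}

/// Test double that records register writes instead of issuing ioctls.
#[derive(Default)]
struct FakeVcpu {
    regs: HashMap<u64, Vec<u8>>,
}

impl OneReg for FakeVcpu {
    fn set_one_reg(&mut self, id: u64, data: &[u8]) -> Result<(), String> {
        self.regs.insert(id, data.to_vec());
        Ok(())
    }
}

/// Zeroes the guest physical counter only on the boot vCPU and only when
/// KVM_CAP_COUNTER_OFFSET is available; the counter offset is per VM.
fn maybe_reset_ptimer<V: OneReg>(
    vcpu: &mut V,
    cpu_id: u8,
    counter_offset: bool,
) -> Result<(), (u64, String)> {
    if cpu_id == 0 && counter_offset {
        vcpu.set_one_reg(KVM_REG_ARM_PTIMER_CNT, &[0; 8])
            .map_err(|err| (KVM_REG_ARM_PTIMER_CNT, err))?;
    }
    Ok(())
}
```

Checking the recorded writes covers all three paths (no capability, secondary vCPU, boot vCPU with the capability) without needing an arm64 host.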
@@ -214,20 +233,21 @@ pub fn set_mpstate(vcpufd: &VcpuFd, state: kvm_mp_state) -> Result<(), VcpuError
#[cfg(test)]
mod tests {
#![allow(clippy::undocumented_unsafe_blocks)]
use kvm_ioctls::Kvm;

use super::*;
use crate::arch::aarch64::layout;
use crate::test_utils::arch_mem;
use crate::vstate::kvm::Kvm;

#[test]
fn test_setup_regs() {
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let kvm = Kvm::new(vec![]).unwrap();
let vm = kvm.fd.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();
let mem = arch_mem(layout::FDT_MAX_SIZE + 0x1000);
let optional_capabilities = kvm.optional_capabilities();

let res = setup_boot_regs(&vcpu, 0, 0x0, &mem);
let res = setup_boot_regs(&vcpu, 0, 0x0, &mem, &optional_capabilities);
assert!(matches!(
res.unwrap_err(),
VcpuError::SetOneReg(0x6030000000100042, _)
@@ -237,13 +257,31 @@ mod tests {
vm.get_preferred_target(&mut kvi).unwrap();
vcpu.vcpu_init(&kvi).unwrap();

setup_boot_regs(&vcpu, 0, 0x0, &mem).unwrap();
setup_boot_regs(&vcpu, 0, 0x0, &mem, &optional_capabilities).unwrap();

// Check that the register is reset on compatible kernels.
// Because some time passes between resetting the register and reading it,
// we cannot compare with 0. Instead, we compare it with a suitably small
// value.
if optional_capabilities.counter_offset {
let mut reg_bytes = [0_u8; 8];
vcpu.get_one_reg(SYS_CNTPCT_EL0, &mut reg_bytes).unwrap();
let counter_value = u64::from_le_bytes(reg_bytes);

// We are reading SYS_CNTPCT_EL0 right after resetting it.
// If the reset succeeded, the value should be quite small when we read it.
// If the reset did not happen, the value will be the same as on the host and
// will almost certainly be larger than `max_value`.
let max_value = 1000;

assert!(counter_value < max_value);
}
}

#[test]
fn test_read_mpidr() {
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let kvm = Kvm::new(vec![]).unwrap();
let vm = kvm.fd.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();
let mut kvi: kvm_bindings::kvm_vcpu_init = kvm_bindings::kvm_vcpu_init::default();
vm.get_preferred_target(&mut kvi).unwrap();
@@ -261,8 +299,8 @@ mod tests {

#[test]
fn test_get_set_regs() {
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let kvm = Kvm::new(vec![]).unwrap();
let vm = kvm.fd.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();
let mut kvi: kvm_bindings::kvm_vcpu_init = kvm_bindings::kvm_vcpu_init::default();
vm.get_preferred_target(&mut kvi).unwrap();
@@ -283,8 +321,8 @@ mod tests {
fn test_mpstate() {
use std::os::unix::io::AsRawFd;

let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let kvm = Kvm::new(vec![]).unwrap();
let vm = kvm.fd.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();
let mut kvi: kvm_bindings::kvm_vcpu_init = kvm_bindings::kvm_vcpu_init::default();
vm.get_preferred_target(&mut kvi).unwrap();