
[10.4-stable] Estimating Memory Overhead Before Domain Creation #3599

Conversation

OhmSpectator (Member):

It's a backport of #3588 into 10.4.

I had to refactor the code a bit:

The 10.4 branch does not include the reworked overhead estimation that @rucoder introduced later, so I used the old implementation instead. That implementation does not take adapters into account.

For testing, I only verified that an app starts successfully. Tests that exercise the new logic have not been done yet.

eriknordmark (Contributor) left a comment:

Same issues/concerns as in #3598

OhmSpectator (Member, Author) commented Nov 20, 2023:

The list of commits to be backported:

```
Processing PR #3383...
PR #3383 - Master Hash: 10f3ba6
PR #3383 - Master Hash: d88091b
Processing PR #3504...
PR #3504 - Master Hash: c79d3fe
```

mikem-zed and others added 5 commits November 20, 2023 17:05
- Move PCI device related functions into a new pci.go
- Rename typeAndPci to pciDevice
- Implement functions on pciDevice structure
- Fix linter warnings

Signed-off-by: Mikhail Malyshev <mikem@zededa.com>
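For illustration, a minimal sketch of what the renamed structure might look like; only the `pciDevice` and `typeAndPci` names come from the commit message, while the fields and the method are assumptions:

```go
package kvm

// pciDevice (formerly typeAndPci) groups the adapter type with its PCI
// address; the actual fields and methods in pci.go may differ.
type pciDevice struct {
	ioType  int    // adapter type (placeholder for the real enum)
	pciLong string // full PCI address, e.g. "0000:03:00.0"
}

// Example of a function implemented on the structure: the sysfs path of
// the device, derived from its PCI address.
func (d pciDevice) sysfsPath() string {
	return "/sys/bus/pci/devices/" + d.pciLong
}
```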
The original formula was "600 Mb + 0.25% RAM".
The empirical constant 600 Mb does not cover all cases and is too big for certain VMs.
The new formula dynamically adjusts the VMM overhead, taking into account the number of
vCPUs and the size of the MMIO regions mapped for PCI devices. Some devices have a very
large aperture size; e.g., the NVIDIA A40 has a 64 Gb aperture. If a VM has many GPUs
assigned, the overhead can be significant.

The analysis showed that QEMU uses 0.6-0.85% of the MMIO region size for internal
management. We allocate 1% to be on the safe side.

QEMU itself consumes ~20 Mb (constant) for code and shared libraries,
and ~300 Mb is still unaccounted for. Each vCPU requires ~3 Mb. So the formula is

(No. vCPUs * 3 Mb) + 1% total MMIO + 0.25% RAM + 20 Mb (QEMU) + 300 Mb (unknown)

For a VM with 2 vCPUs and 1 GB of RAM, the overhead is now ~354 Mb instead of 626 Mb.

Signed-off-by: Mikhail Malyshev <mikem@zededa.com>
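As a rough illustration, here is a minimal Go sketch of the formula above. The function name is hypothetical and the constants are taken verbatim from this commit message rather than from the shipped code; note that the quoted ~354 Mb for 2 vCPUs / 1 GB suggests the RAM coefficient in the actual code may be closer to 2.5% than the literal 0.25%.

```go
package main

import "fmt"

// estimateVMMOverheadMB is a hypothetical helper applying the formula from
// the commit message:
//   (No. vCPUs * 3 Mb) + 1% total MMIO + 0.25% RAM + 20 Mb (QEMU) + 300 Mb (unknown)
func estimateVMMOverheadMB(vCPUs int, ramMB, mmioMB float64) float64 {
	const (
		perVCPUMB = 3.0    // per-vCPU stacks and state
		qemuMB    = 20.0   // QEMU code and shared libraries
		unknownMB = 300.0  // still-unaccounted allocations
		mmioFrac  = 0.01   // 1% of mapped MMIO (0.6-0.85% observed)
		ramFrac   = 0.0025 // 0.25% of guest RAM, as literally quoted
	)
	return float64(vCPUs)*perVCPUMB + mmioMB*mmioFrac + ramMB*ramFrac +
		qemuMB + unknownMB
}

func main() {
	// The 2 vCPU / 1 GB example from the message, no PCI passthrough.
	fmt.Printf("estimated overhead: ~%.0f Mb\n", estimateVMMOverheadMB(2, 1024, 0))
}
```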
Commit d88091b ("New formula to estimate VMM memory overhead (x86)")
breaks the container case and does not account for vRAM, which can be set to a large
value, causing the QEMU process to be killed by the OOM killer.

This patch removes the `IsOCIContainer()` check, so the default estimation
path is taken and the following overhead limit is eventually set for all
cases (VMs and containers):

    350 Mb (undefined mappings) +
    20 Mb (QEMU binary mappings) +
    3 Mb * nvCPUs (vCPU stacks) +
    0 (MMIO regions, presumably 0 with no direct attach) +
    vRAM * 0.25

So for a minimal reasonable container configuration with 4 vCPUs and
500 Mb of vRAM, without PCI device direct attach, the overhead limit
will be set to ~500 Mb, which should be safe.

What is the motivation, besides vRAM accounting, to increase the value from
100 Mb/300 Mb to 500 Mb in the container case? We have noticed an increased
number of customer issues on the latest EVE versions where the QEMU process
was killed by the OOM killer.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
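Reading `vRAM * 0.25` above as 25% of vRAM (the interpretation that reproduces the quoted ~500 Mb), a quick back-of-the-envelope check of the container example:

```go
package main

import "fmt"

func main() {
	// 4 vCPUs, 500 Mb vRAM, no PCI device passthrough:
	// 350 (undefined) + 20 (QEMU) + 4*3 (vCPU stacks) + 0 (MMIO) + 500*0.25 (vRAM)
	overheadMB := 350 + 20 + 4*3 + 0 + 500*25/100
	fmt.Println(overheadMB, "Mb") // 507 Mb, i.e. the quoted ~500 Mb
}
```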
This commit significantly enhances the memory management strategy for new
domain creation in ZEDManager by incorporating memory overhead estimation prior
to domain creation.

Key changes include:
- Implemented memory overhead calculation in `updatestatus.go` using the
  `CountMemOverhead` function from the hypervisor package.
- Updated `getRemainingMemory` in `memorysizemgmt.go` to account for memory
  overhead in the total used memory size.
- Extended `pillar/hypervisor` package files to support memory overhead
  estimation across different hypervisors.
- Notably, added assignable adapters information to the
  `zedmanagerContext`. This involved creating a new subscription for assignable
  adapters in `zedmanager.go` and handling their updates; this addition is
  crucial for accurately computing the memory overhead.

This update ensures that domains are only started when there is enough memory,
considering both the domain's requirements and the overhead, thereby enhancing
system stability and performance.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
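A hedged sketch of the accounting change described above; apart from the `getRemainingMemory` name and the idea of adding the overhead estimate to the used-memory total, all types and fields here are hypothetical:

```go
package zedmanager

import "fmt"

// Hypothetical, trimmed-down status type for illustration only.
type appInstanceStatus struct {
	allocatedRAMMB uint64 // guest RAM reserved for the app instance
	memOverheadMB  uint64 // estimated VMM overhead (CountMemOverhead result)
}

// getRemainingMemory, per the commit message, now adds the estimated
// overhead of every configured domain to the used-memory total before
// deciding whether a new domain fits. Sketch, not the actual code.
func getRemainingMemory(deviceMemoryMB uint64, apps []appInstanceStatus) (uint64, error) {
	var usedMB uint64
	for _, app := range apps {
		usedMB += app.allocatedRAMMB + app.memOverheadMB
	}
	if usedMB > deviceMemoryMB {
		return 0, fmt.Errorf("memory overcommitted: %d Mb used of %d Mb total",
			usedMB, deviceMemoryMB)
	}
	return deviceMemoryMB - usedMB, nil
}
```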
…of DomainConfig.

In this commit, the CountMemOverhead function across various hypervisors
(containerd, kvm, null) is refactored. The change involves passing individual
parameters (like domain UUID, memory size, CPU counts, etc.) directly to the
function, rather than encapsulating them within a DomainConfig object. This
approach eliminates the need for creating a mock DomainConfig just to calculate
memory overhead, leading to cleaner and more explicit code. The changes are
applied consistently across the relevant files in the zedmanager and hypervisor
packages.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
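A sketch of the signature change being described; the parameter list is illustrative (the message names only "domain UUID, memory size, CPU counts, etc."), not the exact signature in the tree:

```go
package hypervisor

// Before (sketch): callers had to assemble a mock DomainConfig just to
// ask for an overhead estimate.
type domainConfig struct {
	UUID    string
	RAMKB   int64
	VCPUs   int
	MaxCPUs int
}

type overheadEstimatorBefore interface {
	CountMemOverhead(config domainConfig) (uint64, error)
}

// After (sketch): the same inputs are passed explicitly, so no mock
// object is needed at the call sites in zedmanager.
type overheadEstimatorAfter interface {
	CountMemOverhead(domainUUID string, ramKB int64,
		vCPUs, maxCPUs int) (uint64, error)
}
```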
@OhmSpectator force-pushed the bugfix/take-overhead-into-account-10.4 branch from a41e382 to d5899ae on November 20, 2023 16:17
OhmSpectator (Member, Author) commented Nov 20, 2023:

I've added the commits related to the overhead estimation that were not backported before.
Need to retest.

eriknordmark (Contributor) left a comment:
LGTM

@eriknordmark merged commit 55ab7aa into lf-edge:10.4-stable on Nov 21, 2023
9 of 11 checks passed