app/virtio-ha: add PF reset before DMA table clean up #105

Merged
merged 3 commits into from
Jul 9, 2024

@Ch3n60x Ch3n60x commented Jul 1, 2024

When vfe-vhostd crashes, an adminQ command may still be in flight, for example one sent just before the crash. Before this commit, the DMA mappings of the global container were cleaned up when vfe-vhostd quit. For PFs, the response to an adminQ command could therefore arrive after the DMA mappings had been removed, resulting in an IO_PAGE_FAULT in the kernel.

This commit fixes the issue by resetting the PFs before the DMA clean-up.

RM: 3957706

Signed-off-by: Chenbo Xia <chenbox@nvidia.com>
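
The ordering described in this commit can be illustrated with a small model. This is only a sketch under stated assumptions: `struct pf_dev`, `pf_reset`, `dma_unmap`, `device_complete` and `cleanup_on_quit` are hypothetical names invented for illustration, not the actual vfe-vhostd or virtio-ha code. The model only shows why quiescing the device first means a late adminQ response can no longer hit an unmapped address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PF; not the real vfe-vhostd structures. */
struct pf_dev {
    bool admin_cmd_inflight; /* device may still DMA a command response */
    bool dma_mapped;         /* DMA mapping of the container is present */
    bool io_page_fault;      /* set if the device DMAs after unmap */
};

/* Model of a PF reset: the device drops all in-flight adminQ commands. */
static void pf_reset(struct pf_dev *pf)
{
    pf->admin_cmd_inflight = false;
}

/* Model of the DMA mapping clean-up. */
static void dma_unmap(struct pf_dev *pf)
{
    pf->dma_mapped = false;
}

/* Model of the device writing a late adminQ response to host memory. */
static void device_complete(struct pf_dev *pf)
{
    if (!pf->admin_cmd_inflight)
        return;                   /* nothing pending, no DMA happens */
    if (!pf->dma_mapped)
        pf->io_page_fault = true; /* DMA to an unmapped address */
    pf->admin_cmd_inflight = false;
}

/* Ordering from the commit message: reset every PF, then unmap. */
static void cleanup_on_quit(struct pf_dev *pfs, size_t n)
{
    assert(pfs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        pf_reset(&pfs[i]);
    for (size_t i = 0; i < n; i++)
        dma_unmap(&pfs[i]);
}
```

In this model, calling `dma_unmap` alone and then `device_complete` sets `io_page_fault`, while `cleanup_on_quit` followed by `device_complete` does not.
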
During hot-upgrade, we need to know whether vfe-vhostd and
vfe-vhostd-ha have finished initializing. This commit adds the
related log and the corresponding HA IPC message so that vfe-vhostd
can notify vfe-vhostd-ha when its init finishes.

Signed-off-by: Chenbo Xia <chenbox@nvidia.com>
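
A notification of this kind could look roughly like the following. This is a minimal sketch, not the actual HA IPC protocol: the message layout, the `HA_MSG_INIT_FINISH` value, the function names, and the use of a Unix stream socket are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message type; the real HA IPC enum is not shown in the PR. */
enum ha_msg_type {
    HA_MSG_INIT_FINISH = 1,
};

/* Hypothetical fixed-size header; this message carries no payload. */
struct ha_msg_hdr {
    uint32_t type;
    uint32_t size; /* payload size in bytes */
};

/* vfe-vhostd side: tell the HA daemon that init has finished. */
static int ha_send_init_finish(int fd)
{
    struct ha_msg_hdr hdr = { .type = HA_MSG_INIT_FINISH, .size = 0 };
    ssize_t n = send(fd, &hdr, sizeof(hdr), 0);

    return n == (ssize_t)sizeof(hdr) ? 0 : -1;
}

/* vfe-vhostd-ha side: read one header and record the init state. */
static int ha_recv_init_finish(int fd, bool *init_finished)
{
    struct ha_msg_hdr hdr;
    ssize_t n = recv(fd, &hdr, sizeof(hdr), MSG_WAITALL);

    if (n != (ssize_t)sizeof(hdr))
        return -1;
    if (hdr.type == HA_MSG_INIT_FINISH && hdr.size == 0)
        *init_finished = true;
    return 0;
}
```

A hot-upgrade script could then wait for the corresponding log line, or query the HA daemon's recorded state, before proceeding.
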
Before this commit, we used the HPA to check whether the old and new
memory regions are the same. This covers a corner case: when vhostd
restarts and QEMU also restarts, QEMU could send a different memory
region with the same info (QEMU_VA, GPA, SIZE). Using the HPA handles
this case, but as a side effect the mmap call needs the MAP_POPULATE
flag, which makes mmap take longer. In a real environment, mapping
hundreds of GB of memory could take several seconds.

This commit removes the use of the HPA and the MAP_POPULATE flag, and
uses the QEMU process ID to handle the corner case instead.

Signed-off-by: Chenbo Xia <chenbox@nvidia.com>
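
The comparison described in this commit can be sketched as follows. The struct and function names are hypothetical and chosen for illustration; how the real code obtains and stores the QEMU process ID is not stated in the PR.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical record of a memory region as remembered across a restart. */
struct mem_region_id {
    uint64_t qemu_va;  /* QEMU virtual address */
    uint64_t gpa;      /* guest physical address */
    uint64_t size;     /* region size in bytes */
    pid_t    qemu_pid; /* process ID of the QEMU that sent the region */
};

/*
 * Two regions are treated as the same only if the (QEMU_VA, GPA, SIZE)
 * triple matches AND they come from the same QEMU process. A restarted
 * QEMU has a new PID, so an identical triple no longer matches, and no
 * HPA lookup (and therefore no MAP_POPULATE) is needed.
 */
static bool mem_region_is_same(const struct mem_region_id *old_r,
                               const struct mem_region_id *new_r)
{
    return old_r->qemu_pid == new_r->qemu_pid &&
           old_r->qemu_va == new_r->qemu_va &&
           old_r->gpa == new_r->gpa &&
           old_r->size == new_r->size;
}
```

Comparing PIDs is cheap and does not require touching the mapped pages, which is what allows the mapping to be created lazily.
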
@kailiangz1 kailiangz1 merged commit 814e8b0 into Mellanox:main Jul 9, 2024