Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Vfio multifunction devices #4394

Merged
merged 3 commits into from
Nov 14, 2024

Conversation

christoph-zededa
Copy link
Contributor

@christoph-zededa christoph-zededa commented Oct 22, 2024

see also #3369

@rene
Copy link
Contributor

rene commented Oct 23, 2024

@christoph-zededa , FYI, Lenovo SE70 (based on Jetson Xavier NX) is a good candidate to test these changes on arm64... it has some peripherals on PCIe bus and it implements SMMU....

@@ -387,12 +387,19 @@ const qemuRootPortPciPassthruTemplate = `
addr = "{{printf "0x%x" .PCIId}}"
`

const qemuPCIPassthruBridgeTemplate = `
[device "pcie-bridge.{{.PCIId}}"]
driver = "pcie-pci-bridge"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pcir-pci and no pcie-pcie? Shown PCIe device ans PCI has consequences and usually not a good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, because there is no need to translate. You either create a dedicated port to control BDF assignments or plug device into a PCIe switch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this:

[device "pci.7"]
  driver = "pcie-root-port"
  port = "17"
  chassis = "7"
  bus = "pcie.0"
  multifunction = "on"
  addr = "0x7"
  slot = "7"



[device "upstream_port.1"]
  driver = "x3130-upstream"
  bus = "pci.7"

[device "downstream_port1"]
  driver = "xio3130-downstream"
  bus = "upstream_port.1"
  slot = "71"

[device "downstream_port2"]
  driver = "xio3130-downstream"
  bus = "upstream_port.1"
  slot = "72"



#[device "pcie-bridge.7"]
#  driver = "pcie-pci-bridge"
#  bus = "pci.7"
#  addr = "0x0"

[device]
  driver = "vfio-pci"
  host = "00:0d.0"
#  bus = "pcie-bridge.7"
  bus = "downstream_port1"
  addr = "0x1"
[device]
  driver = "vfio-pci"
  host = "00:0d.2"
#  bus = "pcie-bridge.7"
  bus = "downstream_port2"
  addr = "0x2"

but I get the same error message:

qemu-system-x86_64:xen1.cfg:264: vfio 0000:00:0d.2: group 7 used in multiple address spaces

@github-actions github-actions bot requested a review from rucoder November 1, 2024 16:31
@christoph-zededa christoph-zededa changed the title WIP: Vfio multifunction Enable Vfio multifunction devices Nov 11, 2024
@christoph-zededa christoph-zededa marked this pull request as ready for review November 11, 2024 11:35
pkg/pillar/hypervisor/kvm.go Outdated Show resolved Hide resolved
pkg/pillar/hypervisor/kvm.go Show resolved Hide resolved
Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only real concern I put to the comment on line 1099 (#4394 (comment)); the rest are mostly recommendations and questions.

pkg/pillar/hypervisor/kvm.go Outdated Show resolved Hide resolved

for i, dev := range pciDevices.devs {
if p.ioType == dev.ioType && p.pciLong == dev.pciLong {
return i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We depend on the slice remaining unmodified and unsorted after it is created. If we remove an element from it, the indices can change. Currently, we don't have this behaviour, but it would be beneficial to explicitly state the requirement that the slice remains unmodified. A comment added to the slice would be a good solution.

pkg/pillar/hypervisor/kvm.go Show resolved Hide resolved
pkg/pillar/hypervisor/kvm.go Outdated Show resolved Hide resolved
pkg/pillar/hypervisor/kvm.go Show resolved Hide resolved
pkg/pillar/hypervisor/kvm.go Show resolved Hide resolved
pkg/pillar/hypervisor/kvm.go Outdated Show resolved Hide resolved
pkg/pillar/hypervisor/kvm.go Outdated Show resolved Hide resolved
Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it looks fine to me.
I hope @christoph-zededa retest it with the latest changes, and we can run the automated tests as well.

@OhmSpectator
Copy link
Member

@christoph-zededa could you please rebase onto the latest master? It should fix the Eden tests run. =)

@OhmSpectator
Copy link
Member

It would also be nice to document the trick with the bridge and explain why it's necessary... But I don't see any document where we have our passthrough approach explained. So I don't know where to add it =(

template for future restructuring

Signed-off-by: Christoph Ostarek <christoph@zededa.com>
into separate function to make it easier to test it

Signed-off-by: Christoph Ostarek <christoph@zededa.com>
add support for multifunction pci devices

Signed-off-by: Christoph Ostarek <christoph@zededa.com>
Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release the Kraken!
(run the Eden tests!)

@OhmSpectator
Copy link
Member

"You have reached your pull rate limit." >=(

@OhmSpectator
Copy link
Member

OhmSpectator commented Nov 14, 2024

Another test fails with

The self-hosted runner: buildjet.com_e72335e4-67c2-4a64-8a95-ae86c04c78c0 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

UPD. Three of them failed with this.

@shjala
Copy link
Member

shjala commented Nov 14, 2024

at least Eden onboard issue is gone :

        # eden onboard (0.000s)
        > [!exec:bash] stop
        > [!exec:jq] stop
        # enable templates check if we run with TPM enabled (0.039s)
        > exec -t 1m bash set_template_check_enforce.sh true
        [stdout]
        time="2024-11-14T11:18:35Z" level=info msg="Options loaded"
        # Onboarding. (161.052s)
        > eden eve onboard
        [stdout]
        time="2024-11-14T11:18:35Z" level=info msg="Adam waiting for EVE registration (0) of (20)"
        time="2024-11-14T11:18:55Z" level=info msg="Adam waiting for EVE registration (1) of (20)"
        time="2024-11-14T11:19:15Z" level=info msg="Adam waiting for EVE registration (2) of (20)"
        time="2024-11-14T11:19:35Z" level=info msg="Adam waiting for EVE registration (3) of (20)"
        time="2024-11-14T11:19:55Z" level=info msg="Adam waiting for EVE registration (4) of (20)"
        time="2024-11-14T11:20:15Z" level=info msg="Adam waiting for EVE registration (5) of (20)"
        time="2024-11-14T11:20:35Z" level=info msg="Adam waiting for EVE registration (6) of (20)"
        time="2024-11-14T11:20:55Z" level=info msg="Adam waiting for EVE registration (7) of (20)"
        time="2024-11-14T11:21:15Z" level=info msg="Device uuid: 65eb4fe1-5290-41e5-9c0e-3fc99484f1a5"
        config changed, to see config run 'eden controller edge-node get-config'
        time="2024-11-14T11:21:16Z" level=info msg=onboarded
        time="2024-11-14T11:21:16Z" level=info msg="device UUID: 65eb4fe1-5290-41e5-9c0e-3fc99484f1a5"
        > stdout 'onboarded'
        PASS

@OhmSpectator
Copy link
Member

@christoph-zededa, all green. Merging. But if you have any thoughts regarding the failure, would be nice to figure out the reason.

@OhmSpectator OhmSpectator merged commit 905baf7 into lf-edge:master Nov 14, 2024
45 checks passed
@OhmSpectator
Copy link
Member

@christoph-zededa, don't we want to backport it to 13.4-stable? As far as it's mentioned in the discussion regarding the 13.4 LTS testing.

@christoph-zededa
Copy link
Contributor Author

@christoph-zededa, don't we want to backport it to 13.4-stable? As far as it's mentioned in the discussion regarding the 13.4 LTS testing.

done: #4457

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants