
Support for allocating all VFs from a single PF (bin packing) #255

Open
sseetharaman6 opened this issue Jul 21, 2020 · 15 comments · May be fixed by #443

Comments

@sseetharaman6

What would you like to be added?

If I have multiple PFs configured for SR-IOV and advertised as the same resource pool (sriov_foo), is it possible to enforce allocation of all VFs from a single PF before VFs from other PFs are allocated? It seems like pluginapi.AllocateRequest picks device IDs at random, so I am not sure whether this is possible / can be supported.

What is the use case for this feature / enhancement?

@zshi-redhat
Collaborator

@sseetharaman6 You're right that the kubelet randomly chooses a healthy device from the advertised pool (sriov_foo), so if the VFs from all PFs are grouped into one pool, there is no guarantee which PF an allocated VF comes from. You might want to group the VFs from a single PF as one pool and request devices directly from that pool.

@sseetharaman6
Author

Yea, but say I have 2 VFs per PF and request 3 VFs in the pod spec; advertising each PF as its own resource will make this pod unschedulable.
In order to allocate all VFs from one PF before moving on to the next, the DP has to support some kind of resource ordering or preferential allocation (could something like kubernetes/enhancements#1121 be used?)

@zshi-redhat
Collaborator

zshi-redhat commented Jul 22, 2020

> Yea, but say I have 2 VFs per PF and request 3 VFs in the pod spec; advertising each PF as its own resource will make this pod unschedulable.

In this case, you will need to put two resource requests in the pod spec: the first requesting 2 VFs, the second requesting 1 VF. I understand this may not be exactly what you asked for.
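A minimal sketch of the two-request pod spec described above; the resource pool names (`intel.com/sriov_pf0`, `intel.com/sriov_pf1`) and the image are illustrative assumptions, not names from this thread:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriov-test-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
    resources:
      requests:
        intel.com/sriov_pf0: "2"   # both VFs of the first PF
        intel.com/sriov_pf1: "1"   # one VF from the second PF
      limits:
        intel.com/sriov_pf0: "2"
        intel.com/sriov_pf1: "1"
```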

> In order to allocate all VFs from one PF before moving on to the next, the DP has to support some kind of resource ordering or preferential allocation (could something like kubernetes/enhancements#1121 be used?)

Thanks for linking the reference!
First of all, I think we should update the device plugin to support the new GetPreferredAllocation interface.
As for how the device plugin should decide the preferred allocation, my understanding is that it may differ per use case.
For example, sometimes users may want to distribute workloads across different PFs to balance the load on each interface;
in other cases, like the one you mentioned, it may be preferable to consume all resources from a single PF before using the next one.
It looks to me that we may not have a unified solution for how the device plugin should decide the preferred allocation,
but maybe it is possible to define several preferred-allocation policies and let the user choose which one to apply when launching the device plugin.

@RahulG115

Facing the same issue.
+1

@killianmuldoon
Collaborator

@zshi-redhat we should be able to implement this on a per-pool level, with some device pools marked as "packers" and others as "spreaders". Is there anything else the preferred allocation could be used for that might fit in, or be more relevant even?

@zshi-redhat
Collaborator

> @zshi-redhat we should be able to implement this on a per-pool level, with some device pools marked as "packers" and others as "spreaders". Is there anything else the preferred allocation could be used for that might fit in, or be more relevant even?

@killianmuldoon I think we could have two, as you already mentioned: one for allocating VFs evenly across multiple PFs (in the same pool), the other for allocating all VFs from one PF until it's exhausted, then moving to the next.

@sseetharaman6
Author

@zshi-redhat - this approach makes sense to me. Is there work underway to add an interface for GetPreferredAllocation?

@martinkennelly
Member

martinkennelly commented Aug 11, 2020

> @zshi-redhat - this approach makes sense to me. Is there work underway to add an interface for GetPreferredAllocation?

I do not think anyone is working on this. It will be discussed at the next network and resource management meeting.

@zshi-redhat
Collaborator

> > @zshi-redhat - this approach makes sense to me. Is there work underway to add an interface for GetPreferredAllocation?
>
> I do not think anyone is working on this. It will be discussed at the next network and resource management meeting.

Update: this was discussed at Monday's meeting; we agreed to support this new API in the SR-IOV device plugin. However, the work is not currently assigned to anyone, so please feel free to take it if you are interested in working on it.

@zshi-redhat
Collaborator

@sseetharaman6 FYI, this feature was added in PR #267, in case you'd like to do some testing or have any suggestions.

@qingshanyinyin

First scenario: I have two PFs (PF-A, PF-B) and I define two resources (R-A, R-B). Then I create a pod requesting both resources (R-A: 1, R-B: 1).
Second scenario: I have two PFs (PF-A, PF-B) and I define one resource (R). Then I create a pod requesting the resource (R: 2), and the kubelet allocates the two VFs from a single PF (A or B).
I would like to know whether there is any difference between these two scenarios for pod networking. For example, which one is best for deep learning (TensorFlow, PyTorch, and so on)?
Thanks!

@adrianchiris
Contributor

> I would like to know whether there is any difference between these two scenarios for pod networking.

If you need two additional network interfaces for the pod, configured by a supporting CNI plugin, then IIRC only the second scenario will work.

If you just want two VFs allocated to the pod (with no CNI config required), then sending traffic from different PFs (different uplinks) would probably be faster.

There is also another consideration that affects performance: NUMA alignment of memory, CPU, and PCI.
In this case you would want all of them aligned.

@martinkennelly
Member

> If you need two additional network interfaces for the pod, configured by a supporting CNI plugin, then IIRC only the second scenario will work.

For the first scenario, couldn't you just define two NADs (net-a, net-b) with associated DP selectors (pfNames), each selecting an individual PF? Then put net-a and net-b in your network request annotation, and you get a VF from each PF. What am I missing?
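A sketch of that two-NAD setup (resource names, PF names, and subnets are illustrative assumptions; the device plugin side would select each PF with a `pfNames` selector in its resource pool config):

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: net-a
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_pf0   # pool selecting pfNames: ["ens1f0"]
spec:
  config: '{ "cniVersion": "0.3.1", "type": "sriov", "ipam": { "type": "host-local", "subnet": "10.56.0.0/24" } }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: net-b
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_pf1   # pool selecting pfNames: ["ens1f1"]
spec:
  config: '{ "cniVersion": "0.3.1", "type": "sriov", "ipam": { "type": "host-local", "subnet": "10.56.1.0/24" } }'
```

The pod would then request both attachments via the annotation `k8s.v1.cni.cncf.io/networks: net-a, net-b`.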

@adrianchiris
Contributor

adrianchiris commented Sep 2, 2021

> If you need two additional network interfaces for the pod, configured by a supporting CNI plugin, then IIRC only the second scenario will work.

Correction: I meant the first scenario. Having two network-attachment-definitions, each associated with a different resource, will work.
Having both network-attachment-definitions associated with the same resource will (I think) not work,

since Multus would need to provide each attachment with a different DeviceID from the same resource on the CmdAdd call
(i.e. pass the first device ID to the delegate CNI on the first call and the second device ID on the second call).

@qingshanyinyin

qingshanyinyin commented Sep 9, 2021

I have solved the first scenario! Thanks! @adrianchiris @martinkennelly
Now I need to do another task.
I will define only one resource for different PFs (8 or more), and I want the kubelet to allocate VFs from each PF. For example:
request: sriov-resource: 1
allocation: 8 VFs (if there are 8 PFs on the node, the 8 VFs come from different PFs!)
I would like to know whether this will work if I only modify the sriov-device-plugin and do not modify Multus.

@wattmto wattmto linked a pull request Aug 30, 2022 that will close this issue