[FEA] Trigger host memory spilling when more host memory is needed #8881

revans2 · 2023-07-31T18:38:44Z

Is your feature request related to a problem? Please describe.
The HostMemoryStore has ways to spill memory to disk, but right now that only happens when a spill from the GPU needs more host memory to complete. The goal here is to expose an API so that the new APIs from #8879 can call into it when memory is needed.

At the same time we need to tie the two APIs together. This gets a little complicated, though: the spill storage would then need to decide whether it should spill pinned or pageable memory to satisfy the request. We should favor spilling pageable memory over pinned unless we have no other choice. In fact, spilling pinned memory can be a follow-on issue if we need it.

The host memory spill storage would be updated to use the new allocation APIs, and the new allocation APIs would be updated to call into the host spill storage spill API if an allocation would violate a limit but there is enough spillable memory to cover it. Ideally reservations will not trigger a spill, but because memory can be made non-spillable dynamically, it might be simplest to just spill as much as is needed for the reservation when it is allocated. If we do this, we should have a follow-on issue to understand what it would take to not do this. To avoid any deadlocks, we will have the spill storage use the maxPriority flag when doing host memory allocations.
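A rough sketch of the interaction described above, written in Python only for brevity (the plugin itself is not Python). Every name here (`HostSpillStore`, `HostAllocator`, `spill`, `alloc`) is hypothetical and does not reflect the actual HostMemoryStore or the #8879 APIs. It assumes spillable buffers count against the host limit, and it leaves out reservations and the maxPriority flag.

```python
from dataclasses import dataclass, field


@dataclass
class SpillableBuffer:
    size: int
    pinned: bool


@dataclass
class HostSpillStore:
    """Hypothetical stand-in for the host spill storage."""
    buffers: list = field(default_factory=list)
    spilled_to_disk: list = field(default_factory=list)

    def spillable_bytes(self) -> int:
        return sum(b.size for b in self.buffers)

    def spill(self, needed: int) -> int:
        """Spill until at least `needed` bytes are freed; return bytes freed."""
        freed = 0
        # False sorts before True, so pageable buffers are spilled before pinned.
        for buf in sorted(self.buffers, key=lambda b: b.pinned):
            if freed >= needed:
                break
            self.buffers.remove(buf)
            self.spilled_to_disk.append(buf)  # a real store would write to disk
            freed += buf.size
        return freed


@dataclass
class HostAllocator:
    """Hypothetical allocator that enforces a host memory limit."""
    limit: int
    store: HostSpillStore
    used: int = 0

    def alloc(self, size: int) -> bool:
        shortfall = self.used + size - self.limit
        if shortfall > 0:
            if shortfall > self.store.spillable_bytes():
                return False  # spilling alone cannot cover the request
            self.used -= self.store.spill(shortfall)
        self.used += size
        return True
```

With this ordering, a pinned buffer is only spilled once every pageable buffer is gone, which matches the "unless we have no other choice" preference.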

In addition to this, if the spill storage code is informed that an allocation is too large to ever fit in the pool, then instead of allocating it anyway, which is what happens today, we will need to spill the data to disk from GPU memory using bounce buffers (possibly on-heap buffers) and bypass a single large CPU allocation altogether.
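The bounce-buffer idea can be illustrated with a chunked copy that never holds more than a small, fixed amount of host memory. This is only a sketch: `copy_through_bounce_buffer` is a hypothetical name, and the `src` stream stands in for GPU memory, which in the real plugin would require a device-to-host copy into the bounce buffer.

```python
import io


def copy_through_bounce_buffer(src, dst, bounce_size: int) -> int:
    """Copy all of `src` to `dst` using one reusable buffer of `bounce_size` bytes."""
    bounce = bytearray(bounce_size)
    view = memoryview(bounce)
    total = 0
    while True:
        n = src.readinto(bounce)  # fill the bounce buffer with the next chunk
        if not n:
            break  # source exhausted
        dst.write(view[:n])  # write only the bytes that were actually read
        total += n
    return total


# Usage: move 2560 bytes through a 64-byte bounce buffer.
data = bytes(range(256)) * 10
disk = io.BytesIO()
copied = copy_through_bounce_buffer(io.BytesIO(data), disk, bounce_size=64)
```

Peak host usage is bounded by `bounce_size` regardless of how large the spilled buffer is, which is the property that lets the oversized host allocation be skipped.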

We should also update the code that gets data out of the disk spill storage and puts it on the GPU. It should use the new allocation APIs, and if an allocation would never work, it will also need to move the data back to the GPU in a way that is guaranteed to work. We still need this code to operate at super-priority levels.

Ideally this should expose callbacks to the allocation APIs so that, as more memory is made spillable, blocked threads can be woken up. If this needs to wait for #8882, a follow-on issue needs to be filed so we don't drop this.
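One possible shape for that wake-up path, sketched with a condition variable rather than explicit callback registration. `SpillableTracker` and its methods are hypothetical names, and this shows only the notification mechanism, not how the allocation APIs would actually retry.

```python
import threading


class SpillableTracker:
    """Hypothetical tracker that wakes waiters when spillable memory grows."""

    def __init__(self):
        self._cond = threading.Condition()
        self._spillable = 0

    def add_spillable(self, size: int) -> None:
        # Called by the spill store whenever a buffer becomes spillable.
        with self._cond:
            self._spillable += size
            self._cond.notify_all()  # wake every blocked allocation thread

    def wait_for_spillable(self, needed: int, timeout=None) -> bool:
        # Called by a blocked allocation; returns False if the wait times out.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._spillable >= needed, timeout)
```

A blocked thread re-checks its condition each time it is notified, so a notification that does not free enough memory simply puts it back to sleep.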

revans2 added the ? - Needs Triage, task, and reliability labels Jul 31, 2023

revans2 mentioned this issue Jul 31, 2023

[FEA] Limit Host Memory Usage #8874

Open

30 tasks

mattahrens removed the ? - Needs Triage label Aug 8, 2023

mattahrens assigned abellina and revans2 Sep 1, 2023

This was referenced Sep 5, 2023

Expose host store spill #9189

Merged

Allow skipping host spill for a direct device->disk spill #9211

Merged

revans2 mentioned this issue Sep 19, 2023

Have host spill use the new HostAlloc API #9257

Merged

revans2 closed this as completed in #9257 Sep 21, 2023
