
get_max_memory() returns allocated memory for XPU instead of total device memory #2929

Closed
dvrogozh opened this issue Jul 12, 2024 · 5 comments · Fixed by #3275
Labels
wip Work in progress

Comments


dvrogozh commented Jul 12, 2024

Here:

max_memory[i] = torch.xpu.max_memory_allocated(i)

XPU is queried for the peak allocated memory, while other devices, for example NPU, are queried for the free device memory:

max_memory[i] = torch.npu.mem_get_info(i)[0]

This seems to be a bug. However, I believe that mem_get_info() is not currently supported by the XPU backend in PyTorch (as of pytorch/pytorch@3477ee3) and needs to be requested.

I would also like to note that pytorch/pytorch#129919 will provide an implementation of torch.xpu.max_memory_allocated(). For me, on an idle device it returned 512 bytes, which caused an issue running HF models with pipeline(device_map="auto"): the model was dispatched to CPU instead of XPU with this printout (see huggingface/transformers#31922 for details):

/home/gta/git/huggingface/accelerate/src/accelerate/utils/modeling.py:1399: UserWarning: Current model requires 4096 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
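The failure mode behind that warning can be illustrated with plain arithmetic. This is a hypothetical sketch, not Accelerate's actual dispatch code: if the per-device memory reported to the dispatcher comes from max_memory_allocated() (about 512 bytes on an idle device) instead of the free memory from mem_get_info(), no GPU appears able to hold even the 4096-byte buffer, so the model falls back to CPU. The function name and numbers below are illustrative only.

```python
# Toy stand-in for the device-selection logic described in the report.
# Numbers mirror the thread: an idle XPU where max_memory_allocated()
# returns 512 bytes, while the device actually has gigabytes free.

REQUIRED_BUFFER = 4096  # bytes needed for offloaded layers (from the warning)

def pick_device(reported_device_memory: int, required: int = REQUIRED_BUFFER) -> str:
    """Use the GPU only if the reported memory can hold the required
    buffer; otherwise fall back to CPU, as the dispatcher did here."""
    return "xpu" if reported_device_memory >= required else "cpu"

# Buggy path: memory reported via max_memory_allocated() on an idle device.
print(pick_device(512))            # -> "cpu"  (model wrongly offloaded)

# Fixed path: memory reported via mem_get_info()[0], i.e. free bytes.
print(pick_device(8 * 1024**3))    # -> "xpu"
```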

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @sywangyi @yao-matrix
CC: @muellerzr @SunMarc

@dvrogozh

However, I believe that mem_get_info() is not currently supported by XPU backend in pytorch and needs to be requested.

Filed request in pytorch/pytorch#130599


SunMarc commented Jul 19, 2024

Indeed, thanks for the report! Keep us updated when this is fixed @dvrogozh! cc @abhilash1910


abhilash1910 commented Jul 19, 2024

Thanks @SunMarc for the ping. I believe that when an XPU is present, it should report the 0th device's memory parameters, but I think this may be due to this commit (this was seen before): 30cb7ec
@faaany could you take a look at this?
I agree with @dvrogozh that the mem_get_info() API is needed.
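For reference, on CUDA the mem_get_info(device) call the thread is asking for returns a (free, total) tuple of bytes, whereas max_memory_allocated() returns the peak bytes allocated by the caching allocator. The sketch below uses a fake backend (so it runs without any XPU hardware or the new PyTorch API) purely to illustrate why the two numbers are not interchangeable; all names and sizes here are hypothetical.

```python
# Illustrate the difference between a (free, total) memory query and a
# peak-allocated counter, using a stand-in object instead of torch.xpu.
from types import SimpleNamespace

TOTAL = 16 * 1024**3   # hypothetical 16 GiB device
ALLOCATED_PEAK = 512   # what an idle device may report as "max allocated"

fake_xpu = SimpleNamespace(
    # mem_get_info-style query: (free bytes, total bytes)
    mem_get_info=lambda i=0: (TOTAL - ALLOCATED_PEAK, TOTAL),
    # max_memory_allocated-style counter: peak allocator usage only
    max_memory_allocated=lambda i=0: ALLOCATED_PEAK,
)

free, total = fake_xpu.mem_get_info(0)
print(free)                               # bytes actually available
print(fake_xpu.max_memory_allocated(0))   # 512 -- useless as a capacity estimate
```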


faaany commented Jul 29, 2024

Hi @abhilash1910, the issue mentioned by @dvrogozh is a known issue and is not related to my commit.

@muellerzr muellerzr added the wip Work in progress label Aug 21, 2024
dvrogozh added a commit to dvrogozh/accelerate that referenced this issue Dec 6, 2024
The torch.xpu.mem_get_info API is available starting from PyTorch 2.6 (and
in nightly 2.6.0.dev20241206+xpu or later). To work properly, this method
requires PyTorch built with a SYCL runtime that supports the API to query
device memory stats; if it is not available, an exception will be raised.

Requires: pytorch/pytorch#141230
Fixes: huggingface#2929
Fixes: huggingface/transformers#31922
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
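Since the commit message above notes that the API only exists from PyTorch 2.6 and can raise when the SYCL runtime lacks memory-stat support, a caller needs to guard for both cases. The following is a hedged sketch of that guard pattern, not the actual Accelerate patch; it takes the backend module as a parameter (standing in for torch.xpu) so it runs without XPU hardware, and the helper name is invented for illustration.

```python
# Guard pattern for an optionally-available, possibly-raising backend API.
from types import SimpleNamespace

def xpu_total_free_memory(xpu_module, device: int = 0):
    """Return free bytes for `device`, or None when the API is unavailable.
    `xpu_module` stands in for torch.xpu so the sketch runs anywhere."""
    if not hasattr(xpu_module, "mem_get_info"):
        return None                      # older PyTorch: API not present
    try:
        return xpu_module.mem_get_info(device)[0]  # (free, total) -> free
    except RuntimeError:
        return None                      # SYCL runtime without memory stats

# Demo with stand-in backends:
new_backend = SimpleNamespace(mem_get_info=lambda i=0: (1024, 2048))
old_backend = SimpleNamespace()          # no mem_get_info attribute at all
print(xpu_total_free_memory(new_backend))  # 1024
print(xpu_total_free_memory(old_backend))  # None
```

Returning None (rather than letting the exception escape) lets a caller fall back to another memory estimate or to CPU dispatch.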

dvrogozh commented Dec 6, 2024

The torch.xpu.mem_get_info() API landed in PyTorch this week (through pytorch/pytorch#141230), making it into the upcoming PyTorch 2.6 release. Here is a corresponding fix on the Accelerate side which addresses the issue:

For IPEX, the API became available earlier, and Accelerate was already adjusted to cover this case in 4b4c036.

dvrogozh added a commit to dvrogozh/accelerate that referenced this issue Dec 9, 2024