get_max_memory() returns allocated memory for XPU instead of total device memory #2929
Comments
Filed request in pytorch/pytorch#130599
Indeed, thanks for the report! Keep us updated when this is fixed @dvrogozh! cc @abhilash1910
Hi @abhilash1910, the issue mentioned by @dvrogozh is a known issue, and it is not related to my commit.
The torch.xpu.mem_get_info API is available starting from PyTorch 2.6 (and in nightly 2.6.0.dev20241206+xpu or later). To work properly, this method requires PyTorch built with a SYCL runtime which supports an API to query device memory stats. If not available, an exception will be raised.

Requires: pytorch/pytorch#141230
Fixes: huggingface#2929
Fixes: huggingface/transformers#31922
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
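A minimal sketch of the guarded call this commit message describes, assuming torch.xpu.mem_get_info mirrors the torch.cuda.mem_get_info signature (returning a (free_bytes, total_bytes) tuple). The helper name is hypothetical; on machines without torch or without an XPU build, it simply returns None instead of raising:

```python
# Sketch only: torch.xpu.mem_get_info returns (free_bytes, total_bytes)
# on PyTorch >= 2.6 built with a SYCL runtime exposing memory stats;
# otherwise it raises. Without torch/XPU, this helper returns None.
def xpu_total_memory_or_none():
    try:
        import torch
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            _free_bytes, total_bytes = torch.xpu.mem_get_info()
            return total_bytes
    except (ImportError, RuntimeError):
        # Raised when torch is missing or the SYCL runtime lacks the
        # memory-stats API described in the commit message above.
        pass
    return None
```

A caller such as get_max_memory() could then fall back to another strategy only when this returns None, rather than silently reporting allocated memory.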
For IPEX, the API became available earlier, and Accelerate was already adjusted to cover this case in 4b4c036.
Here (accelerate/src/accelerate/utils/modeling.py, line 843 in 12a007d):
XPU is queried for the max allocated memory, while other devices, for example CUDA, are queried for total free memory:
(accelerate/src/accelerate/utils/modeling.py, line 819 in 12a007d)
This seems to be a bug. However, I believe that mem_get_info() is not currently supported by the XPU backend in PyTorch (as of pytorch/pytorch@3477ee3) and needs to be requested. I would also like to note that pytorch/pytorch#129919 will provide an implementation for torch.xpu.max_memory_allocated(). For me, on an idle device it returned 512 bytes, which caused an issue running HF models with pipeline(device_map="auto") - the model was dispatched to CPU instead of XPU with this printout (see huggingface/transformers#31922 for details):

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @sywangyi @yao-matrix
CC: @muellerzr @SunMarc
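The failure mode described above can be illustrated with a small, self-contained sketch. The two injected callables are hypothetical stand-ins for torch.xpu.mem_get_info and torch.xpu.max_memory_allocated (not the real PyTorch calls), so the example runs without an XPU: preferring mem_get_info yields the true device capacity, while falling back to allocated memory on an idle device yields a tiny number (512 bytes in the report above), which misleads device_map="auto" into dispatching the model to CPU:

```python
def query_device_memory(mem_get_info, max_memory_allocated):
    """Prefer mem_get_info() -> (free, total); fall back to allocated bytes.

    Both arguments are injected stand-ins for the torch.xpu.* calls
    discussed in this issue, so the sketch runs anywhere.
    """
    try:
        _free_bytes, total_bytes = mem_get_info()
        return total_bytes  # what get_max_memory() should report
    except RuntimeError:
        # The buggy XPU code path: max allocated memory, which is ~0 on
        # an idle device and misleads device_map="auto" toward CPU.
        return max_memory_allocated()


GIB = 1024 ** 3

# Healthy path: an 8 GiB device reports its real capacity.
ok = query_device_memory(lambda: (7 * GIB, 8 * GIB), lambda: 512)

# Broken path: mem_get_info unsupported, so only 512 bytes are reported.
def unsupported():
    raise RuntimeError("mem_get_info not supported by this SYCL runtime")

bad = query_device_memory(unsupported, lambda: 512)
print(ok, bad)  # 8589934592 512
```

The 512-byte result in the broken path matches the idle-device value reported in the issue, showing why the device appeared to have almost no memory available.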