[Vulkan] Unexpected creation of buffer larger than 4GB failing at runtime. #13196
Comments
@antiagainst Please help to assign priority here.

@antiagainst / @benvanik any thoughts on how we should handle this? We run into this when we go from Stable Diffusion 512x512 to the SD 768 base model, since the weights are larger (@pashu123?). Running the base SD 512x512 model at a 768x768 resolution works OK. We will soon have to support 1024x1024 models (https://stable-diffusion-art.com/sdxl-beta/), so any guidance on this issue is appreciated.

This is actually a different issue from the one we discussed internally about the 4GB storage-buffer allocation limit. This reads like we are allocating a larger-than-allowed device-local + host-visible buffer, against a 257,949,696-byte heap limit. I recall some previous-generation NVIDIA cards had such a 256MB limit. For context, note that 4GB is a specification limit on how large a storage buffer can be. For the storage buffer in this particular case, I checked the IR generated at the stream level:
```mlir
#splitResourceConstantsConfig = #stream.resource_config<{
  max_allocation_size = 16,
  min_buffer_offset_alignment = 16,
  max_buffer_range = 1073741824,
  min_buffer_range_alignment = 16,
  index_bits = 32
}>
```

You can set this as a compiler flag.

In this case we shouldn't be allocating either transients or constants as host-visible; that sounds like a bug if we are. Only staging buffers and external buffers should be host-visible (today).
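For illustration, the limits in that config amount to a per-allocation validity check. A minimal Python sketch of that idea (hypothetical, not IREE's actual implementation; the field values are taken verbatim from the config above):

```python
# Hypothetical sketch of a stream-level resource limit check. The field
# names and values mirror the #stream.resource_config attribute above;
# the check itself is an illustration, not IREE's actual code.

RESOURCE_CONFIG = {
    "max_allocation_size": 16,
    "min_buffer_offset_alignment": 16,
    "max_buffer_range": 1073741824,   # 1 GiB addressable range per binding
    "min_buffer_range_alignment": 16,
    "index_bits": 32,                 # 32-bit offsets cap ranges at 4 GiB
}

def allocation_fits(size_bytes: int, cfg: dict = RESOURCE_CONFIG) -> bool:
    """Return True if a single allocation of size_bytes respects the limits."""
    if size_bytes > cfg["max_allocation_size"]:
        return False
    # A 32-bit index space cannot address past 4 GiB regardless of heap size.
    if size_bytes >= 2 ** cfg["index_bits"]:
        return False
    return True

# The multi-gigabyte transient from this issue would be rejected outright:
print(allocation_fits(3_300_000_000))  # False
```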
@benvanik: IIUC
(Also, it would be good to look into the model: needing a 3.3GB transient tensor is weird unless this is training.)

Ah yeah, it's mostly used for constants today; doing it for allocations is harder as they're dynamic. I think the imminent fix here is to make sure this memory is not host-visible (it shouldn't be).
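The fix described above amounts to not requesting HOST_VISIBLE when picking a memory type for transients and constants. A toy Python sketch of Vulkan-style memory-type selection (the flags and heap layout are illustrative, loosely modeled on the NVIDIA heap configuration discussed in this thread; this is not IREE's allocator code):

```python
# Illustrative Vulkan-style memory-type table: a large device-only heap
# and the small ~246 MiB device-local + host-visible (BAR) heap from the
# validation error in this issue. Sizes and flags are assumptions.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2

MEMORY_TYPES = [
    {"flags": DEVICE_LOCAL,                "heap_size": 24 * 1024**3},
    {"flags": DEVICE_LOCAL | HOST_VISIBLE, "heap_size": 257_949_696},
]

def pick_memory_type(required_flags: int) -> dict:
    """Return the first memory type whose flags cover required_flags."""
    for mt in MEMORY_TYPES:
        if mt["flags"] & required_flags == required_flags:
            return mt
    raise LookupError("no matching memory type")

# Requesting HOST_VISIBLE forces the tiny BAR heap; a ~1.6 GB transient
# cannot fit there, matching the vkAllocateMemory failure in this issue.
bar = pick_memory_type(DEVICE_LOCAL | HOST_VISIBLE)
print(bar["heap_size"] >= 1_698_693_120)  # False

# Dropping HOST_VISIBLE selects the big device-local heap instead.
dev = pick_memory_type(DEVICE_LOCAL)
print(dev["heap_size"] >= 1_698_693_120)  # True
```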
Yeah, there are actually two issues mixed together. This particular issue has a title about the 4GB limit, but the validation error was not for that. We were discussing another issue internally that is about the 4GB limit, with the following validation error:

So it's confusing here.

For the validation error originally reported in this issue, I cannot find allocations with a size of

Sorry for the confusion; I also see the same error on an A100.

I was running the above problem on an RTX 3090, hence the different validation errors.

@antiagainst Let me know if you need more info.
Can we use https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/enabling_buffer_device_address.html? There were some references to it from KhronosGroup/Vulkan-Docs#1016.
What happened?
```
[VULKAN] ! Validation Error: [ VUID-vkAllocateMemory-pAllocateInfo-01713 ] Object 0: handle = 0x55ba3e384b30, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xe9a2b96f | vkAllocateMemory: attempting to allocate 1698693120 bytes from heap 2,but size of that heap is only 257949696 bytes. The Vulkan spec states: pAllocateInfo->allocationSize must be less than or equal to VkPhysicalDeviceMemoryProperties::memoryHeaps[memindex].size where memindex = VkPhysicalDeviceMemoryProperties::memoryTypes[pAllocateInfo->memoryTypeIndex].heapIndex as returned by vkGetPhysicalDeviceMemoryProperties for the VkPhysicalDevice that device was created from (https://vulkan.lunarg.com/doc/view/1.3.239.0/linux/1.3-extensions/vkspec.html#VUID-vkAllocateMemory-pAllocateInfo-01713)
```
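To put the numbers from the validation error into perspective, the requested allocation is several times larger than the entire heap; the heap size is consistent with the ~256MB device-local + host-visible heap mentioned earlier in the thread. A small Python sketch (byte counts taken verbatim from the error message above):

```python
# Byte counts copied from the vkAllocateMemory validation error above.
requested = 1_698_693_120   # bytes the runtime tried to allocate
heap_size = 257_949_696     # size of heap 2 reported by the driver

MIB = 1024 ** 2
print(f"requested: {requested / MIB:.0f} MiB")     # 1620 MiB (~1.58 GiB)
print(f"heap size: {heap_size / MIB:.0f} MiB")     # 246 MiB
print(f"ratio:     {requested / heap_size:.1f}x")  # 6.6x
```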
Steps to reproduce your issue
Model IR: https://storage.googleapis.com/shark-public/prashant/unet_upcast/unet.mlir

Compile command:

```shell
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan -iree-vulkan-target-triple=ampere-rtx3090-linux --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' unet_check.mlir -o out.vmfb
```

Run command:

```shell
iree-run-module --device=vulkan --function=forward --input=2x4x96x96xf16=0.5 --input=1xf16=1.0 --input=2x77x1024xf16=0.5 --module=out.vmfb --vulkan_debug_utils=true --vulkan_debug_verbosity=4 --vulkan_validation_layers=true
```
What component(s) does this issue relate to?
Runtime
Version information
No response
Additional context
No response