
[Vulkan] Unexpected creation of buffer larger than 4GB failing at runtime. #13196

Open
pashu123 opened this issue Apr 20, 2023 · 13 comments
Labels
bug 🐞 Something isn't working codegen/spirv SPIR-V code generation compiler backend hal/vulkan Runtime Vulkan GPU HAL backend

Comments

@pashu123
Contributor

What happened?

[VULKAN] ! Validation Error: [ VUID-vkAllocateMemory-pAllocateInfo-01713 ] Object 0: handle = 0x55ba3e384b30, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xe9a2b96f | vkAllocateMemory: attempting to allocate 1698693120 bytes from heap 2,but size of that heap is only 257949696 bytes. The Vulkan spec states: pAllocateInfo->allocationSize must be less than or equal to VkPhysicalDeviceMemoryProperties::memoryHeaps[memindex].size where memindex = VkPhysicalDeviceMemoryProperties::memoryTypes[pAllocateInfo->memoryTypeIndex].heapIndex as returned by vkGetPhysicalDeviceMemoryProperties for the VkPhysicalDevice that device was created from (https://vulkan.lunarg.com/doc/view/1.3.239.0/linux/1.3-extensions/vkspec.html#VUID-vkAllocateMemory-pAllocateInfo-01713)

Steps to reproduce your issue

Model IR: https://storage.googleapis.com/shark-public/prashant/unet_upcast/unet.mlir

Compile command:
iree-compile --iree-input-type=none --iree-hal-target-backends=vulkan -iree-vulkan-target-triple=ampere-rtx3090-linux --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 --iree-preprocessing-pass-pipeline='builtin.module(func.func(iree-flow-detach-elementwise-from-named-ops,iree-flow-convert-1x1-filter-conv2d-to-matmul,iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))' unet_check.mlir -o out.vmfb

Run command:
iree-run-module --device=vulkan --function=forward --input=2x4x96x96xf16=0.5 --input=1xf16=1.0 --input=2x77x1024xf16=0.5 --module=out.vmfb --vulkan_debug_utils=true --vulkan_debug_verbosity=4 --vulkan_validation_layers=true

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

@pashu123 pashu123 added the bug 🐞 Something isn't working label Apr 20, 2023
@ScottTodd ScottTodd added hal/vulkan Runtime Vulkan GPU HAL backend codegen/spirv SPIR-V code generation compiler backend labels Apr 20, 2023
@powderluv powderluv added this to the Collab: Nod.ai milestone Apr 20, 2023
@allieculp

@antiagainst Please help assign a priority here.

@allieculp allieculp moved this from Inbox to Needs Scheduling in (Deprecated) IREE Apr 20, 2023
@powderluv
Collaborator

@antiagainst / @benvanik any thoughts on how we should handle this? We run into this when we go from the Stable Diffusion 512x512 model to the SD 768 base model, since the weights are larger (@pashu123?). Running the base SD 512x512 model at 768x768 resolution works OK.

We will soon have to support 1024x1024 models (https://stable-diffusion-art.com/sdxl-beta/) so any guidance appreciated on this issue.

@antiagainst
Contributor

This is actually a different issue from the one we discussed internally about the 4GB storage buffer allocation limit. This reads like we are allocating a larger-than-allowed device-local + host-visible buffer, whose heap is limited to 257,949,696 bytes. I recall some previous-generation NVIDIA cards had such a 256MB limit.
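
For reference, the rule in the originally reported error (VUID-vkAllocateMemory-pAllocateInfo-01713) is a single comparison: allocationSize must not exceed the size of the heap that backs the chosen memory type. The Python sketch below mirrors that check with the two numbers from the log. The `MemoryHeap`/`MemoryType` classes and `allocation_fits` are hypothetical stand-ins for VkPhysicalDeviceMemoryProperties, not a real binding, and every heap size except heap 2 is made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryHeap:
    size: int  # bytes, like VkMemoryHeap::size


@dataclass
class MemoryType:
    heap_index: int  # like VkMemoryType::heapIndex


def allocation_fits(allocation_size: int, memory_type_index: int,
                    types: list[MemoryType], heaps: list[MemoryHeap]) -> bool:
    """allocationSize must be <= the size of the heap behind the memory type."""
    heap = heaps[types[memory_type_index].heap_index]
    return allocation_size <= heap.size


# Hypothetical layout: only heap 2's size (257,949,696 bytes = 246 MiB) is from the log.
heaps = [MemoryHeap(24 * 2**30), MemoryHeap(64 * 2**30), MemoryHeap(257_949_696)]
types = [MemoryType(0), MemoryType(1), MemoryType(2)]

print(allocation_fits(1_698_693_120, 2, types, heaps))  # False: request exceeds heap 2
print(allocation_fits(1_698_693_120, 0, types, heaps))  # True on the made-up 24 GiB heap
```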

For context, note that 4GB is a specification limit on how large a storage buffer can be. Although VkMemoryAllocateInfo uses VkDeviceSize (uint64_t) for allocationSize, specifying a storage buffer descriptor requires a VkDescriptorBufferInfo, whose range field must be less than or equal to maxStorageBufferRange. maxStorageBufferRange is a member of VkPhysicalDeviceLimits with type uint32_t, which caps it at 4GB (2^32 - 1 bytes). To really bind allocations larger than 4GB as storage buffers, we'd need to push for a spec change.
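
The 4GB figure falls straight out of the field width: the largest value a uint32_t can hold is 2^32 - 1 = 4,294,967,295, which is exactly the maxStorageBufferRange the validation layer reports later in this thread. A minimal check against that ceiling, using sizes quoted in this issue (the helper name is illustrative only):

```python
# maxStorageBufferRange is a uint32_t in VkPhysicalDeviceLimits, so no device
# can report more than 2**32 - 1 bytes.
UINT32_MAX = 2**32 - 1  # 4_294_967_295


def fits_storage_buffer_range(range_bytes: int,
                              max_storage_buffer_range: int = UINT32_MAX) -> bool:
    """A storage buffer descriptor range must be <= maxStorageBufferRange."""
    return range_bytes <= max_storage_buffer_range


# Sizes mentioned in this issue:
print(fits_storage_buffer_range(6_926_017_728))  # False: the packed transient allocation
print(fits_storage_buffer_range(3_397_386_240))  # True: the largest single transient slice
```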

For storage buffers in this particular case, I checked the IR generated at the stream level. ScheduleAllocation packs all transient buffers into one allocation (https://github.com/openxla/iree/blob/a88bfe9167da4832725f2efc26efbabc75138588/compiler/src/iree/compiler/Dialect/Stream/Transforms/ScheduleAllocation.cpp#L1007), which is why we see a large 6,926,017,728-byte buffer: https://gist.github.com/antiagainst/07e3bffc314ace011f9175fc3182dab6. Sorting the transient buffers packed together, we can see the largest one is 3,397,386,240 bytes, which is below the 4GB threshold (albeit not by much). So for this case we should still be fine with respect to transient storage buffers, assuming the slices used for descriptors stay within the 4GB range.
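
To illustrate why the packed allocation can overflow the limit while each slice stays under it, here is a naive linear packing sketch in Python. It assumes a 64-byte slice alignment (a stand-in for something like minStorageBufferOffsetAlignment) and does not model any lifetime-based reuse the real ScheduleAllocation pass may do. Only the first slice size comes from this thread; the others are hypothetical.

```python
from __future__ import annotations

UINT32_MAX = 2**32 - 1  # upper bound on maxStorageBufferRange


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def pack_linear(sizes: list[int], alignment: int = 64) -> tuple[list[int], int]:
    """Place each slice right after the previous one; return (offsets, total size)."""
    offsets = []
    cursor = 0
    for size in sizes:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# First size is the largest transient from this issue; the rest are made up.
slices = [3_397_386_240, 1_800_000_000, 1_200_000_000, 500_000_000]
offsets, total = pack_linear(slices)
print(total > UINT32_MAX)                    # True: the whole allocation cannot be bound at once
print(all(s <= UINT32_MAX for s in slices))  # True: each slice could be bound on its own
```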

@benvanik
Collaborator

benvanik commented May 3, 2023

#stream.resource_config can be used to control the packing
e.g.

#splitResourceConstantsConfig = #stream.resource_config<{
  max_allocation_size = 16,
  min_buffer_offset_alignment = 16,
  max_buffer_range = 1073741824,
  min_buffer_range_alignment = 16,
  index_bits = 32
}>

You can also set this with a compiler flag: --iree-stream-resource-max-allocation-size=
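
Conceptually, a max_allocation_size bound turns one large packing into several smaller allocations. A rough first-fit sketch of that idea follows; it ignores alignment, and both `split_into_allocations` and the first-fit policy are illustrative assumptions rather than what IREE's packing passes actually implement.

```python
from __future__ import annotations


def split_into_allocations(sizes: list[int], max_allocation_size: int) -> list[list[int]]:
    """First-fit: put each resource into the first allocation with room for it.

    A resource larger than max_allocation_size gets an allocation of its own,
    since this sketch cannot split a single resource.
    """
    allocations: list[list[int]] = []
    used: list[int] = []
    for size in sizes:
        for i, current in enumerate(used):
            if current + size <= max_allocation_size:
                allocations[i].append(size)
                used[i] += size
                break
        else:  # no existing allocation had room
            allocations.append([size])
            used.append(size)
    return allocations


GiB = 2**30
print(split_into_allocations([3 * GiB, 2 * GiB, 1 * GiB, 512 * 2**20], 4 * GiB))
# [[3221225472, 1073741824], [2147483648, 536870912]]
```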

In this case we shouldn't be allocating either transients or constants as host-visible; if we are, that sounds like a bug. Only staging buffers and external buffers should be host-visible (today).
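
The distinction matters because on some devices a memory type that is both DEVICE_LOCAL and HOST_VISIBLE sits on a small heap (the 257,949,696-byte heap in the original report looks like such a case). Below is a hedged sketch of memory type selection that excludes HOST_VISIBLE for buffers that never need host access. The flag values are the Vulkan spec's, while `find_memory_type` and the three-type device layout are hypothetical and do not describe IREE's allocator.

```python
from __future__ import annotations

from dataclasses import dataclass

# Flag values as defined by VkMemoryPropertyFlagBits in the Vulkan spec.
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x1
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT = 0x2


@dataclass
class MemoryType:
    property_flags: int  # like VkMemoryType::propertyFlags
    heap_index: int      # like VkMemoryType::heapIndex


def find_memory_type(types: list[MemoryType], type_bits: int,
                     required: int, forbidden: int = 0) -> int | None:
    """Return the first type index permitted by type_bits that has every
    `required` flag and no `forbidden` flag, or None if nothing matches."""
    for index, memory_type in enumerate(types):
        if not (type_bits >> index) & 1:
            continue  # excluded by the buffer's memoryTypeBits
        flags = memory_type.property_flags
        if (flags & required) == required and (flags & forbidden) == 0:
            return index
    return None


DEVICE_LOCAL = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
HOST_VISIBLE = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT

# Hypothetical device: type 0 is a small device-local + host-visible window,
# type 1 is plain VRAM, type 2 is host memory.
types = [
    MemoryType(DEVICE_LOCAL | HOST_VISIBLE, heap_index=2),
    MemoryType(DEVICE_LOCAL, heap_index=0),
    MemoryType(HOST_VISIBLE, heap_index=1),
]

print(find_memory_type(types, 0b111, required=DEVICE_LOCAL))  # 0: the small window
print(find_memory_type(types, 0b111, required=DEVICE_LOCAL, forbidden=HOST_VISIBLE))  # 1
```

Asking only for DEVICE_LOCAL can land a large transient in the small host-visible window; also forbidding HOST_VISIBLE steers it to plain VRAM.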

@antiagainst
Contributor

@benvanik: IIUC, #stream.resource_config only controls PackAllocations, not ScheduleAllocation, where the transient buffers are initially packed together? We may need to hook it up to ScheduleAllocation too.

@benvanik
Collaborator

benvanik commented May 3, 2023

(Also, it would be good to look into the model: needing a 3.3GB transient tensor is weird unless this is training.)
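
For scale, a quick conversion of the largest transient slice reported above. The 2-byte element size is an assumption (the model inputs in the run command are f16).

```python
largest_transient_bytes = 3_397_386_240  # largest transient slice reported above
F16_BYTES = 2  # assumption: the tensor holds f16 elements, like the model inputs

print(round(largest_transient_bytes / 10**9, 2))  # 3.4 (decimal GB)
print(largest_transient_bytes / 2**30)            # 3.1640625 (GiB)
print(largest_transient_bytes // F16_BYTES)       # 1698693120 elements if f16
```

The f16 element count, 1,698,693,120, is numerically identical to the allocation size in the first validation error in this issue; the thread does not establish whether the two are related.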

@benvanik
Collaborator

benvanik commented May 3, 2023

Ah yeah, it's mostly used for constants today. Doing it for allocations is harder, as they're dynamic. I think the immediate fix here is to make sure this memory is not host-visible (it shouldn't be).

@benvanik benvanik changed the title [VULKAN] Creation of buffer larger than 4GB. [Vulkan] Unexpected creation of buffer larger than 4GB failing at runtime. May 3, 2023
@antiagainst
Contributor

Yeah. There are actually two issues mixed together. This issue's title is about the 4GB limit, but the validation error reported here is not about that. We were discussing another issue internally that is about the 4GB limit, with the following validation error:

[VULKAN] ! Validation Error: [ VUID-VkWriteDescriptorSet-descriptorType-00333 ] Object 0: handle = 0x980f360000000011, type = VK_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT; | MessageID = 0xf2fc081c | vkCmdPushDescriptorSetKHR() VkWriteDescriptorSet[1] failed update validation: Write update to Push Descriptors defined with VkDescriptorSetLayout 0x980f360000000011[] binding #1 failed with error message: Attempted write update to buffer descriptor failed due to: For buffer VkBuffer 0x8f226e000000025f[] VkDescriptorBufferInfo range is 6926017728 which is greater than this device's maxStorageBufferRange (4294967295). The Vulkan spec states: If descriptorType is VK_DESCRIPTOR_TYPE_STORAGE_BUFFER or VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC, the range member of each element of pBufferInfo, or the effective range if range is VK_WHOLE_SIZE, must be less than or equal to VkPhysicalDeviceLimits::maxStorageBufferRange (https://vulkan.lunarg.com/doc/view/1.3.243.0/linux/1.3-extensions/vkspec.html#VUID-VkWriteDescriptorSet-descriptorType-00333)

So it's confusing here.
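
The second error is about the descriptor, not the allocation: per the quoted spec text, the limit applies to the range member, or to the effective range (buffer size minus offset) when range is VK_WHOLE_SIZE. A Python sketch of that check; VK_WHOLE_SIZE mirrors the header's ~0ULL, and the helper names plus the slice offset and size in the example are hypothetical.

```python
VK_WHOLE_SIZE = (1 << 64) - 1         # ~0ULL in the Vulkan headers
MAX_STORAGE_BUFFER_RANGE = 2**32 - 1  # value reported in the log above


def effective_range(buffer_size: int, offset: int, range_: int) -> int:
    """VK_WHOLE_SIZE means 'from offset to the end of the buffer'."""
    return buffer_size - offset if range_ == VK_WHOLE_SIZE else range_


def descriptor_range_ok(buffer_size: int, offset: int, range_: int) -> bool:
    """Storage buffer descriptors must satisfy range <= maxStorageBufferRange."""
    return effective_range(buffer_size, offset, range_) <= MAX_STORAGE_BUFFER_RANGE


buffer_size = 6_926_017_728  # the packed transient buffer from the log
print(descriptor_range_ok(buffer_size, 0, VK_WHOLE_SIZE))  # False: whole buffer bound
print(descriptor_range_ok(buffer_size, 0, 3_397_386_240))  # True: only one slice bound
```

Binding a suitably aligned sub-range rather than the whole buffer is what keeps each descriptor under the limit, consistent with the earlier observation that the individual transient slices are below 4GB.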

@antiagainst
Contributor

For the validation error originally reported in this issue, I cannot find any allocation with a size of 1698693120 bytes when compiling with --compile-to=hal, so I'm not sure this is still an issue. @pashu123, please double-check whether it's still relevant.

@pashu123
Contributor Author

Sorry for the confusion. I also see the same error on an A100:

[VULKAN] ! Validation Error: [ VUID-VkWriteDescriptorSet-descriptorType-00333 ] Object 0: handle = 0x967dd1000000000e, type = VK_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT; | MessageID = 0xf2fc081c | vkCmdPushDescriptorSetKHR() VkWriteDescriptorSet[1] failed update validation: Write update to Push Descriptors defined with VkDescriptorSetLayout 0x967dd1000000000e[] binding #1 failed with error message: Attempted write update to buffer descriptor failed due to: For buffer VkBuffer 0x891e2c0000000284[] VkDescriptorBufferInfo range is 6937997888 which is greater than this device's maxStorageBufferRange (4294967295). The Vulkan spec states: If descriptorType is VK_DESCRIPTOR_TYPE_STORAGE_BUFFER or VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC, the range member of each element of pBufferInfo, or the effective range if range is VK_WHOLE_SIZE, must be less than or equal to VkPhysicalDeviceLimits::maxStorageBufferRange (https://vulkan.lunarg.com/doc/view/1.3.243.0/linux/1.3-extensions/vkspec.html#VUID-VkWriteDescriptorSet-descriptorType-00333)

@pashu123
Contributor Author

The original report came from running on an RTX 3090, which is why the two runs give different validation errors.

@pashu123
Contributor Author

@antiagainst Let me know if you need more info.

@powderluv
Collaborator

6 participants