[vulkan] Fix command serial ordering & use fewer queue submits #3818
Conversation
✔️ Deploy Preview for jovial-fermat-aa59dc ready!
🔨 Explore the source changes: 63e32c1
🔍 Inspect the deploy log: https://app.netlify.com/sites/jovial-fermat-aa59dc/deploys/61befb642f9d830007d65f52
😎 Browse the preview: https://deploy-preview-3818--jovial-fermat-aa59dc.netlify.app
taichi/backends/vulkan/runtime.cpp
Outdated
if (ti_kernel->get_ctx_buffer_size()) {
  ctx_buffer = device_->allocate_memory_unique(
      {ti_kernel->get_ctx_buffer_size(),
       /*host_write=*/true, /*host_read=*/false,
       /*export_sharing=*/false, AllocUsage::Storage});
  ctx_buffer_host = device_->allocate_memory_unique(
      {ti_kernel->get_ctx_buffer_size(),
       /*host_write=*/false, /*host_read=*/true,
       /*export_sharing=*/false, AllocUsage::Storage});
}
Are we creating new buffers every time we launch a kernel?
yes
.. that can't be good
It is fine, it is actually faster than before
Be that as it may, we should still at least try to reuse some of these buffers. It's fast now because VMA is doing a superb job, but what if we want to unify the runtime code across backends?
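To illustrate what buffer reuse could look like, here is a minimal sketch of a size-bucketed pool that parks released buffers and hands them back on the next request of the same size. All names (`Buffer`, `BufferPool`) are hypothetical and do not match Taichi's actual `Device` API; a real version would hold `VkBuffer`/`VmaAllocation` handles instead of a byte vector.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device allocation; a real pool would wrap a
// VkBuffer / VmaAllocation pair instead of host memory.
struct Buffer {
  explicit Buffer(size_t size) : bytes(size) {}
  std::vector<char> bytes;
};

// Size-bucketed pool: release() parks a buffer instead of freeing it, and
// acquire() reuses a parked buffer of the same size when one exists.
class BufferPool {
 public:
  std::unique_ptr<Buffer> acquire(size_t size) {
    auto it = free_.find(size);
    if (it != free_.end() && !it->second.empty()) {
      auto buf = std::move(it->second.back());  // cache hit: reuse
      it->second.pop_back();
      return buf;
    }
    return std::make_unique<Buffer>(size);  // cache miss: real allocation
  }

  void release(std::unique_ptr<Buffer> buf) {
    free_[buf->bytes.size()].push_back(std::move(buf));
  }

 private:
  std::map<size_t, std::vector<std::unique_ptr<Buffer>>> free_;
};
```

Note that with a GPU backend, `release()` must only run once the device has finished reading the buffer (e.g. after the submit's fence signals), otherwise a later launch could overwrite in-flight data.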
Also, for DX11 and OpenGL this is going to be faster than a ring buffer, because the drivers know for sure there is no overlap and are free to use the buffers asynchronously.
So in the end, only Vulkan / DX12 / Metal would maybe benefit from a ring buffer, and all of these require a robust memory allocator anyway.
What about the overhead of allocating buffers on DX and GL? Are we confident that the performance gain from having things async outweighs these overheads?
If it's buffer reuse, it should be implemented in the allocator instead of the runtime.
Let's do it the naive way for now; we can figure out a better way later.
/format
taichi/backends/vulkan/runtime.cpp
Outdated
}

- void CompiledTaichiKernel::command_list(CommandList *cmdlist) const {
+ void CompiledTaichiKernel::command_list(
Out of curiosity, what does the function name command_list mean here? run_command_list?
This means record command list
nit: Can we just rename it to record_command_list then :-) ?
std::unique_ptr<DeviceAllocationGuard> ctx_buffer_host, ctx_buffer;

if (ti_kernel->get_ctx_buffer_size()) {
  ctx_buffer = device_->allocate_memory_unique(
Would this result in memory fragmentation? Or should VMA take care of it?
It would to some extent, but VMA is generally pretty good. Let's do the naive one for now, until we kick the ext arr out of the context buffer and add a ring of buffers (not a ring buffer, but a ring of buffers).
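The "ring of buffers" idea above (distinct whole buffers cycled round-robin, as opposed to sub-allocating from one ring buffer) can be sketched as follows. This is a hypothetical illustration, not Taichi's runtime; the key invariant is that N must be at least the maximum number of launches in flight so a buffer is never rewritten while the GPU still reads it.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// "Ring of buffers": a fixed set of N independent buffers handed out
// round-robin, so launch i writes buffers_[i % N] while earlier launches
// may still be consuming the other N-1 buffers on the GPU.
// (Hypothetical sketch; host vectors stand in for device allocations.)
template <size_t N>
class BufferRing {
 public:
  explicit BufferRing(size_t size) {
    for (auto &b : buffers_) {
      b.resize(size);
    }
  }

  // Returns the next buffer in the cycle, advancing the cursor.
  std::vector<char> &next() {
    auto &buf = buffers_[cursor_];
    cursor_ = (cursor_ + 1) % N;
    return buf;
  }

 private:
  std::array<std::vector<char>, N> buffers_;
  size_t cursor_ = 0;
};
```

With real device memory, wrapping around to a previously issued buffer would additionally wait on that launch's fence before reuse.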
SG
Co-authored-by: Ye Kuang <k-ye@users.noreply.github.com>
/format
LGTM!
Nice work! Thanks!
Fixes #3791