Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vulkan] Fix command serial ordering & uses less queue submits #3818

Merged
merged 7 commits into from
Dec 20, 2021

Conversation

bobcao3
Copy link
Collaborator

@bobcao3 bobcao3 commented Dec 16, 2021

Fixes #3791

@bobcao3 bobcao3 requested a review from k-ye December 16, 2021 20:33
@netlify
Copy link

netlify bot commented Dec 16, 2021

✔️ Deploy Preview for jovial-fermat-aa59dc ready!

🔨 Explore the source changes: 63e32c1

🔍 Inspect the deploy log: https://app.netlify.com/sites/jovial-fermat-aa59dc/deploys/61befb642f9d830007d65f52

😎 Browse the preview: https://deploy-preview-3818--jovial-fermat-aa59dc.netlify.app

Comment on lines 413 to 422
if (ti_kernel->get_ctx_buffer_size()) {
ctx_buffer = device_->allocate_memory_unique(
{ti_kernel->get_ctx_buffer_size(),
/*host_write=*/true, /*host_read=*/false,
/*export_sharing=*/false, AllocUsage::Storage});
ctx_buffer_host = device_->allocate_memory_unique(
{ti_kernel->get_ctx_buffer_size(),
/*host_write=*/false, /*host_read=*/true,
/*export_sharing=*/false, AllocUsage::Storage});
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r we creating new buffers every time we launch a kernel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. that can't be good

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine, it is actually faster than before

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be that as it may, we should still at least try to reuse some of these buffers. It is fast now because VMA is doing a superb job, but what if we want to unify the runtime code for different backends?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for DX11 and OpenGL this is going to be faster than a ring buffer because the drivers know for sure there isn't overlap and is free to use them async.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the end only maybe Vulkan / DX12 / and Metal will benefit from a ring buffer, and all of these requires a robust memory allocator anyways.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the overhead of allocating buffers on DX and GL? Are we confident that the performance gain from having things async outweighs these overheads?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's buffer reuse it should be implemented in the allocator instead of the runtime

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do the naive way for now, we can figure out a better way later

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Dec 16, 2021

/format

}

void CompiledTaichiKernel::command_list(CommandList *cmdlist) const {
void CompiledTaichiKernel::command_list(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what does the function name command_list mean here, run_command_list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means record command list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we just rename it to record_command_list then :-) ?

taichi/backends/vulkan/runtime.cpp Outdated Show resolved Hide resolved
std::unique_ptr<DeviceAllocationGuard> ctx_buffer_host, ctx_buffer;

if (ti_kernel->get_ctx_buffer_size()) {
ctx_buffer = device_->allocate_memory_unique(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this result in memory fragmentation? Or the VMA should take care of it?

Copy link
Collaborator Author

@bobcao3 bobcao3 Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would to some extent, but VMA is generally pretty good, let's do the naive one for now until we kick ext arr out of context buffer and add a ring of buffer (not ring buffer but ring of buffer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

Co-authored-by: Ye Kuang <k-ye@users.noreply.github.com>
@bobcao3 bobcao3 requested a review from k-ye December 18, 2021 07:04
@bobcao3
Copy link
Collaborator Author

bobcao3 commented Dec 18, 2021

/format

Copy link
Member

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

}

void CompiledTaichiKernel::command_list(CommandList *cmdlist) const {
void CompiledTaichiKernel::command_list(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we just rename it to record_command_list then :-) ?

std::unique_ptr<DeviceAllocationGuard> ctx_buffer_host, ctx_buffer;

if (ti_kernel->get_ctx_buffer_size()) {
ctx_buffer = device_->allocate_memory_unique(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Dec 19, 2021

/format

Copy link
Contributor

@ailzhang ailzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Thanks!

@strongoier strongoier merged commit f7d526f into taichi-dev:master Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Vulkan] ti.sync() changes program behavior on Vulkan
6 participants