[Runtime] Flush L2 cache in time eval #15305

Merged 1 commit on Jul 18, 2023

spectrometerHBH (Contributor) commented Jul 12, 2023

This PR introduces an optional cache flush to `time_evaluator`. It is implemented by allocating two large empty NDArrays on the device so that the L2 cache is flushed. This gives a more accurate measurement of a runtime function's performance.
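
To illustrate the idea in the description, here is a minimal host-side sketch in plain Python. It is not the code from this PR: the function name `timed_runs` and its parameters are hypothetical, and it assumes the flush works by copying one scratch buffer into the other before each run (the description itself only mentions allocating the two arrays). The flush is kept outside the timed region so it does not count toward the measurement.

```python
import time


def timed_runs(func, repeat, flush_bytes):
    """Time `func` `repeat` times, touching two scratch buffers before each run.

    Hypothetical helper for illustration only; not part of TVM.
    """
    # Two scratch buffers, each as large as the cache we want to evict.
    src = bytearray(flush_bytes)
    dst = bytearray(flush_bytes)
    durations = []
    for _ in range(repeat):
        # Copying src into dst reads and writes both buffers, which is
        # assumed to push previously cached data out of the cache.
        dst[:] = src
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations
```

On a real device the buffers would live in device memory; the Python version only shows where the flush sits relative to the timer.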

tvm-bot (Collaborator) commented Jul 12, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

src/runtime/profiling.cc (outdated, resolved review thread)

junrushao (Member) commented

Also CC: @yzh119

yzh119 (Member) commented Jul 13, 2023

I suppose we already have an l2_cache_flush_cuda flag in the time_evaluator function (#13726), and you can reuse the l2_cache_flush_cuda function defined in l2_cache_flush.cc.

spectrometerHBH (Contributor, Author) commented

> I suppose we already have an l2_cache_flush_cuda flag in the time_evaluator function (#13726), and you can reuse the l2_cache_flush_cuda function defined in l2_cache_flush.cc.

I don't want it to be CUDA-only.

yzh119 (Member) commented Jul 13, 2023

Yes, the major concern is that the L2 cache size is device-specific, and later architectures may have an L2 cache larger than 256 MB.

tqchen (Member) commented Jul 13, 2023

To generalize it, how about we instead introduce an l2_cache_flush_bytes parameter, which defaults to 0, and use it to decide how large an array to allocate? This way it would generalize across GPUs as long as we set this argument correctly.

junrushao (Member) commented

The implementation per se is not specific to L2 either. We could call it cache_flush_bytes.

yzh119 (Member) commented Jul 13, 2023

cache_flush_bytes sounds good to me.
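
The agreed semantics (a byte count that defaults to 0, with 0 meaning "do not flush") can be sketched as follows. This is an illustrative plain-Python sketch under stated assumptions, not the merged implementation: `make_cache_flusher` is a hypothetical helper, and the copy between two buffers stands in for whatever the runtime does on the device.

```python
def make_cache_flusher(cache_flush_bytes=0):
    """Build a flush callable from a byte count; 0 disables flushing.

    Hypothetical helper for illustration only; not part of TVM.
    The returned callable reports how many bytes it copied.
    """
    if cache_flush_bytes < 0:
        raise ValueError("cache_flush_bytes must be non-negative")
    if cache_flush_bytes == 0:
        # Default: no buffers are allocated and nothing is copied.
        return lambda: 0

    # The caller picks a size that covers the cache of the target device.
    src = bytearray(cache_flush_bytes)
    dst = bytearray(cache_flush_bytes)

    def flush():
        dst[:] = src
        return cache_flush_bytes

    return flush
```

Because the size is an argument rather than a constant, nothing in the sketch is tied to L2 or to a particular GPU; a caller targeting a device with a larger cache simply passes a larger value.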

yzh119 (Member) left a comment

LGTM

junrushao changed the title from "Flush L2 cache in time eval" to "[Runtime] Flush L2 cache in time eval" on Jul 14, 2023
MasterJH5574 changed the base branch from unity to main on July 16, 2023 17:32
tqchen (Member) commented Jul 17, 2023

@tvm-bot rerun

This PR introduces an optional cache flush to
`time_evaluator`. It is implemented by allocating two large empty
NDArrays on the device so that the L2 cache is flushed. This gives a
more accurate measurement of a runtime function's performance.
MasterJH5574 pushed a commit that referenced this pull request Jul 18, 2023
Follow-up to #15305: this PR creates an API to query the device L2 cache size in bytes.
Currently, the supported devices include CUDA, OpenCL, and ROCm.

Note that OpenCL's API does not return the accurate device L2 cache size.
I cannot find a Vulkan API that returns the L2 texture cache size, but the `vkCmdPipelineBarrier` call flushes the L2 texture cache automatically (https://zeux.io/2020/02/27/writing-an-efficient-vulkan-renderer/), so we return 0 by default.
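
The "return 0 by default" behavior described in this follow-up can be pictured as a lookup with a fallback. The sketch below is purely illustrative: `query_l2_cache_bytes` and the `queries` table are hypothetical names, the 40 MiB figure is a made-up example value, and none of this reflects the actual API added by the follow-up commit.

```python
from typing import Callable, Dict


def query_l2_cache_bytes(device_kind: str,
                         queries: Dict[str, Callable[[], int]]) -> int:
    """Return the L2 cache size in bytes, or 0 if the backend has no query.

    Hypothetical helper for illustration only; not part of TVM.
    """
    query = queries.get(device_kind)
    if query is None:
        # No way to ask this backend (e.g. Vulkan in the note above).
        return 0
    return query()


# Example table with a stand-in query for one backend.
example_queries = {"cuda": lambda: 40 * 1024 * 1024}
```

A caller could pass the result straight through as the flush size, since a value of 0 leaves flushing disabled.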
tqchen merged commit c0946e1 into apache:main on Jul 18, 2023