[Runtime] Flush L2 cache in time eval #15305

spectrometerHBH · 2023-07-12T23:18:38Z

This PR introduces optional cache flush functionality to time_evaluator. It is implemented by allocating two large empty NDArrays on the device so that the L2 cache is flushed. This gives a more accurate evaluation of the performance of a runtime function.

tvm-bot · 2023-07-12T23:18:42Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

cc @areusch (see #10317 for details)

Generated by tvm-bot

src/runtime/profiling.cc

junrushao · 2023-07-13T00:03:41Z

Also CC: @yzh119

yzh119 · 2023-07-13T05:09:01Z

I suppose we already have an l2_cache_flush_cuda flag in the time_evaluator function: #13726

and you can reuse the l2_cache_flush_cuda function defined in l2_cache_flush.cc.

spectrometerHBH · 2023-07-13T05:37:53Z

> I suppose we already have an l2_cache_flush_cuda flag in the time_evaluator function: #13726
>
> and you can reuse the l2_cache_flush_cuda function defined in l2_cache_flush.cc.

I don't want it to be CUDA-only.

yzh119 · 2023-07-13T06:24:53Z

Yes, the major concern is that L2 cache size is device-specific, and later architectures may have an L2 cache larger than 256 MB.

tqchen · 2023-07-13T14:24:54Z

To generalize it, how about we instead introduce an l2_cache_flush_bytes parameter, which defaults to 0, and use it to indicate the size of the array to allocate? This way it would generalize across GPUs as long as we set this argument correctly.

junrushao · 2023-07-13T20:17:01Z

The implementation per se is not specific to L2 either. We could say it’s cache_flush_bytes

yzh119 · 2023-07-13T21:22:04Z

cache_flush_bytes sounds good to me
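For readers following the thread, here is a minimal sketch of the design discussed above, written in plain Python rather than against the TVM runtime. The function name `time_with_cache_flush`, the use of `bytearray` buffers, and the buffer-to-buffer copy as the flush step are all assumptions made for illustration; they are not the code in src/runtime/profiling.cc.

```python
import time


def time_with_cache_flush(func, number=10, cache_flush_bytes=0):
    """Return the mean wall-clock seconds per call of `func`.

    Hypothetical helper: when cache_flush_bytes > 0, two buffers of that
    size are allocated once and one is copied into the other before every
    timed call, so that previously cached data is likely displaced.
    A value of 0 (the default) disables the flush entirely.
    """
    if number <= 0:
        raise ValueError("number must be positive")

    src = dst = None
    if cache_flush_bytes > 0:
        # Allocate the two scratch buffers once, outside the timed region.
        src = bytearray(cache_flush_bytes)
        dst = bytearray(cache_flush_bytes)

    total = 0.0
    for _ in range(number):
        if src is not None:
            # Streaming through both buffers is the assumed flush step.
            dst[:] = src
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / number
```

The flush copy sits outside the timed region, so it costs extra wall-clock time per iteration without being counted in the reported mean.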

yzh119

LGTM

tqchen · 2023-07-17T16:54:44Z

@tvm-bot rerun

Follow-up to #15305: this PR adds an API to query the device L2 cache size in bytes. Currently, the supported devices include CUDA, OpenCL, and ROCm. Note that OpenCL's API does not return an accurate device L2 cache size. I could not find a Vulkan API that returns the L2 texture cache size, but the `vkCmdPipelineBarrier` call flushes the L2 texture cache automatically (https://zeux.io/2020/02/27/writing-an-efficient-vulkan-renderer/), so we return 0 by default.
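One way a queried L2 size could feed the flush parameter is sketched below. The helper `choose_cache_flush_bytes` and its `safety_factor` argument are hypothetical and not part of TVM; the queried size is passed in as a plain integer, and how it is obtained from the device is left out.

```python
def choose_cache_flush_bytes(l2_cache_size_bytes, safety_factor=2):
    """Pick a flush buffer size from a reported L2 cache size.

    Hypothetical helper. A reported size of 0 (the default described above
    for Vulkan) yields 0, which means no flush buffers are allocated.
    safety_factor is an assumed margin for devices whose reported size may
    be inaccurate.
    """
    if l2_cache_size_bytes < 0:
        raise ValueError("l2_cache_size_bytes must be non-negative")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    return l2_cache_size_bytes * safety_factor
```

Sizing the buffer from the queried value avoids hard-coding a constant that a later architecture with a larger cache could outgrow.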

tqchen reviewed Jul 12, 2023

yzh119 approved these changes Jul 14, 2023

junrushao approved these changes Jul 14, 2023

tqchen approved these changes Jul 14, 2023

junrushao changed the title ~~Flush L2 cache in time eval~~ [Runtime] Flush L2 cache in time eval Jul 14, 2023

MasterJH5574 changed the base branch from unity to main July 16, 2023 17:32

yzh119 mentioned this pull request Jul 16, 2023

[Runtime] Device API to query L2 cache size #15332

Merged

MasterJH5574 mentioned this pull request Jul 18, 2023

[Unity][Runtime] Flush L2 cache in time eval #15343

Closed

tqchen merged commit c0946e1 into apache:main Jul 18, 2023

ysh329 mentioned this pull request Oct 18, 2023

[Release] v0.14.0 Release Candidate Notes #15948

Closed
