Optimize the cache fetch for forward split, pt. 1 (#2187) #2218
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs canceled.
This pull request was exported from Phabricator. Differential Revision: D51865590
Force-pushed from 384e08a to 2b0b81b (Compare)
Force-pushed from 2b0b81b to ee553e2 (Compare)
Force-pushed from ee553e2 to 5e3adb6 (Compare)
This pull request has been merged in 7dd0c7f.
Summary:
Rewrite the kernel to use the cache_hit_rate enum as a template argument. We first check if the cache is empty and pass that value as a template argument. Inside the first kernel, we then determine the cache conflict miss rate and use this value as a template parameter when invoking the second kernel, which performs the actual lookup work.
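In rough terms, the dispatch looks like the sketch below. This is only an illustration, not the actual FBGEMM kernels (which split the work across two kernels as described above); `CacheState`, `lookup_kernel`, `launch_lookup`, and the argument layout are hypothetical names chosen for the example.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Compile-time cache state, decided on the host before launch.
enum class CacheState { kEmpty, kNonEmpty };

template <CacheState kState>
__global__ void lookup_kernel(
    const float* weights,                // full embedding rows (UVM/HBM)
    const float* lxu_cache_weights,      // cached rows
    const int32_t* lxu_cache_locations,  // per-row cache slot, -1 if uncached
    const int32_t* uvm_cache_stats,      // run-time stats, resident on the GPU
    float* output,
    int64_t n) {
  const int64_t i =
      static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i >= n) {
    return;
  }
  if constexpr (kState == CacheState::kEmpty) {
    // Empty cache: the cache-lookup path is compiled out entirely.
    output[i] = weights[i];
  } else {
    // Non-empty cache: read the cached row when available, otherwise fall
    // back to the full weights. uvm_cache_stats stays on the GPU and would
    // be consulted here at run time rather than copied back to the host.
    const int32_t loc = lxu_cache_locations[i];
    output[i] = (loc >= 0) ? lxu_cache_weights[loc] : weights[i];
  }
}

// Host-side dispatch: the only compile-time decision is whether the cache is
// empty, which the host already knows without touching uvm_cache_stats.
void launch_lookup(
    bool cache_is_empty,
    const float* weights,
    const float* lxu_cache_weights,
    const int32_t* lxu_cache_locations,
    const int32_t* uvm_cache_stats,
    float* output,
    int64_t n,
    cudaStream_t stream) {
  const int threads = 256;
  const int blocks = static_cast<int>((n + threads - 1) / threads);
  if (cache_is_empty) {
    lookup_kernel<CacheState::kEmpty><<<blocks, threads, 0, stream>>>(
        weights, lxu_cache_weights, lxu_cache_locations, uvm_cache_stats,
        output, n);
  } else {
    lookup_kernel<CacheState::kNonEmpty><<<blocks, threads, 0, stream>>>(
        weights, lxu_cache_weights, lxu_cache_locations, uvm_cache_stats,
        output, n);
  }
}
```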
We pass in uvm_cache_stats as a run-time argument here instead of passing the cache miss rate as a compile-time argument, because the uvm_cache_stats data is only available on the GPU, and invoking a templatized kernel with the cache miss rate as a template argument would require the cache miss information to first be copied back to the host, which is an expensive operation.
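For contrast, a sketch of the avoided alternative (again with hypothetical names): picking the template argument from the measured miss count on the host would force a device-to-host copy and a stream synchronization before every launch.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Hypothetical host-side dispatch on the measured conflict-miss count.
// The copy-back and synchronization below are exactly the cost that the
// run-time-argument design avoids.
void launch_lookup_with_host_dispatch(
    const int32_t* d_uvm_cache_stats,  // resident on the GPU
    cudaStream_t stream) {
  int32_t conflict_misses = 0;
  cudaMemcpyAsync(&conflict_misses, d_uvm_cache_stats, sizeof(int32_t),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // host stalls until the stats arrive
  if (conflict_misses == 0) {
    // lookup_kernel<MissRate::kZero><<<...>>>(...);
  } else {
    // lookup_kernel<MissRate::kNonZero><<<...>>>(...);
  }
}
```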
This is based on the earlier work in stacks D48937380 and D49675672, which were based on very outdated branches of fbcode.
Reviewed By: spcyppt
Differential Revision: D51865590