
Optimize the cache fetch for forward split, pt. 1 (#2187) #2218

Closed
wants to merge 1 commit into from

q10 (Contributor) commented Dec 14, 2023

Summary:

Rewrite the kernel to use a cache_hit_rate enum as a template argument. We first check whether the cache is empty and pass that value as a template argument. Inside the first kernel, we then determine the cache conflict miss rate and use this value as a template argument when invoking the second kernel, which performs the actual lookup work.
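A minimal host-side C++17 sketch of the two-level dispatch described above, assuming plain functions stand in for the two CUDA kernels. This is not FBGEMM's code: the names MissRate, Tables, lookup, dispatch and forward are hypothetical (the PR only mentions a cache_hit_rate enum), and the miss test compares two illustrative counters. The idea it shows is that a run-time value becomes a template argument by branching once, outside the hot loop, so each lookup instantiation compiles away the per-row cache checks it does not need.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the PR's cache_hit_rate enum.
enum class MissRate { kZero, kMixed, kAll };

struct Tables {
  std::vector<float> weights;  // backing rows (UVM-resident in the real setting)
  std::vector<float> cache;    // cached copies of some rows
  std::vector<int32_t> slot;   // slot[row] = index into cache, or -1 if not cached
};

// Stand-in for the second kernel: the lookup loop, specialized at compile time.
template <bool kCacheEmpty, MissRate kRate>
float lookup(const Tables& t, const std::vector<int32_t>& indices) {
  float sum = 0.0f;
  for (const int32_t row : indices) {
    if constexpr (kCacheEmpty || kRate == MissRate::kAll) {
      sum += t.weights[row];  // never probe the cache
    } else if constexpr (kRate == MissRate::kZero) {
      sum += t.cache[t.slot[row]];  // every requested row is known to be cached
    } else {
      const int32_t s = t.slot[row];  // mixed: decide per row
      sum += (s >= 0) ? t.cache[s] : t.weights[row];
    }
  }
  return sum;
}

// Stand-in for the first kernel: reads the miss counters at run time and
// branches once to pick the matching lookup instantiation.
template <bool kCacheEmpty>
float dispatch(const Tables& t, const std::vector<int32_t>& indices,
               int64_t num_requested, int64_t num_conflict_misses) {
  if constexpr (kCacheEmpty) {
    return lookup<true, MissRate::kAll>(t, indices);
  } else {
    if (num_conflict_misses == 0) {
      return lookup<false, MissRate::kZero>(t, indices);
    }
    if (num_conflict_misses == num_requested) {
      return lookup<false, MissRate::kAll>(t, indices);
    }
    return lookup<false, MissRate::kMixed>(t, indices);
  }
}

// Host side: only the "is the cache empty?" decision is made here.
float forward(const Tables& t, const std::vector<int32_t>& indices,
              int64_t num_requested, int64_t num_conflict_misses) {
  assert(t.slot.size() == t.weights.size());
  return t.cache.empty()
             ? dispatch<true>(t, indices, num_requested, num_conflict_misses)
             : dispatch<false>(t, indices, num_requested, num_conflict_misses);
}
```

For example, with weights {1, 2, 3, 4}, cache {10, 30} and slot {0, -1, 1, -1}, forward(t, {0, 2}, 2, 0) takes the zero-miss path and reads only the cache, returning 40. The cached values are deliberately different from the weights here so the path taken is visible.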

We pass uvm_cache_stats in as a run-time argument here instead of passing the cache miss rate as a compile-time argument, because the uvm_cache_stats data is only available on the GPU. Invoking a templatized kernel with the cache miss rate as a template argument would require the cache miss information to first be copied back to the host, which is an expensive operation.
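A sketch of classifying the miss rate from the stats buffer itself, assuming a six-counter int32 layout. The StatsIndex names and their order are hypothetical and may not match FBGEMM's actual uvm_cache_stats layout, and which pair of counters the real kernel compares is not stated in the PR. The classifier takes the buffer as an ordinary run-time pointer, so on the GPU it would read counters already in device memory; choosing the template argument on the host instead would need a device-to-host copy and a synchronization before the lookup kernel could be launched.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of a uvm_cache_stats-style counter buffer.
enum StatsIndex : int {
  kNumCalls = 0,
  kNumRequestedIndices = 1,
  kNumUniqueIndices = 2,
  kNumUniqueMisses = 3,
  kNumConflictUniqueMisses = 4,
  kNumConflictMisses = 5,
  kNumStats = 6,
};

enum class MissRate { kZero, kMixed, kAll };

// Reads the counters where they already live and classifies the conflict-miss
// rate. In CUDA this would be a __device__ function taking a device pointer,
// so no value has to travel back to the host.
inline MissRate classify_conflict_miss_rate(const int32_t* uvm_cache_stats) {
  assert(uvm_cache_stats != nullptr);
  const int32_t requested = uvm_cache_stats[kNumRequestedIndices];
  const int32_t conflict = uvm_cache_stats[kNumConflictMisses];
  if (conflict == 0) {
    return MissRate::kZero;
  }
  if (conflict >= requested) {
    return MissRate::kAll;
  }
  return MissRate::kMixed;
}
```

The first kernel could then branch on the returned value to pick a lookup instantiation. Since the classification is two loads and two comparisons, repeating it on the device is typically cheaper than a host round trip.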

This is based on earlier work in stacks D48937380 and D49675672, which were built on very outdated branches of fbcode.

Differential Revision: D51865590


netlify bot commented Dec 14, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 5e3adb6 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6581e9e6d172fc0008bc910e |

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D51865590

q10 force-pushed the export-D51865590 branch from 384e08a to 2b0b81b on December 18, 2023 19:58
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 18, 2023
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D51865590

q10 added a commit to q10/FBGEMM that referenced this pull request Dec 18, 2023
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 18, 2023
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 18, 2023
q10 force-pushed the export-D51865590 branch from 2b0b81b to ee553e2 on December 18, 2023 20:10
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D51865590

q10 force-pushed the export-D51865590 branch from ee553e2 to 5e3adb6 on December 19, 2023 19:07
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D51865590

q10 added a commit to q10/FBGEMM that referenced this pull request Dec 19, 2023
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 19, 2023
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 19, 2023
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 27, 2023
Reviewed By: spcyppt
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 27, 2023
q10 added a commit to q10/FBGEMM that referenced this pull request Dec 27, 2023
@facebook-github-bot

This pull request has been merged in 7dd0c7f.
