Mps matrix multiplication kernel #1

Merged 8 commits on Oct 31, 2023

ivarflakstad
Owner

Output from running `cargo run --release --example mps-matrix-multiplication --features=mps`:

Correctness:
[=================================================] ✅
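
The PR does not show how the correctness check is implemented. The transpose, alpha and beta parameters in the runs below match the usual GEMM convention, C = alpha * op(A) * op(B) + beta * C, so one plausible approach is to compare the GPU result against a naive CPU reference. The following is a minimal sketch under that assumption; `gemm_reference` and `max_abs_diff` are hypothetical names, not functions from this repository, and row-major storage is assumed.

```rust
/// Naive row-major reference for C = alpha * op(A) * op(B) + beta * C.
/// op(A) is m x k and op(B) is k x n; a transposed input is stored
/// with its dimensions swapped (A as k x m, B as n x k).
/// Hypothetical helper for illustration, not part of the PR.
fn gemm_reference(
    transpose_left: bool,
    transpose_right: bool,
    m: usize,
    n: usize,
    k: usize,
    alpha: f32,
    a: &[f32],
    b: &[f32],
    beta: f32,
    c: &mut [f32],
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                // Pick element (i, p) of op(A) and (p, j) of op(B).
                let a_ip = if transpose_left { a[p * m + i] } else { a[i * k + p] };
                let b_pj = if transpose_right { b[j * k + p] } else { b[p * n + j] };
                acc += a_ip * b_pj;
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}

/// Largest absolute element-wise difference between two equally sized
/// buffers. A check would compare this against a tolerance, since f16
/// inputs make exact equality unrealistic.
fn max_abs_diff(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len());
    x.iter().zip(y).map(|(a, b)| (a - b).abs()).fold(0.0, f32::max)
}
```

A check along these lines would run the GPU kernel and `gemm_reference` on the same small inputs and require `max_abs_diff` to stay below a tolerance chosen for the precision in use.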

Performance:
Generating input matrices: (f32 4096x4096 and f16 4096x4096)
Running with transpose left: false, transpose right: false, alpha: 1, beta: 0
[=================================================] ✅
Avg GFLOPS: 2714.017354808345
Total time: 2.534734497s
Avg time: 50.694689ms

Running with transpose left: true, transpose right: false, alpha: 1, beta: 0
[=================================================] ✅
Avg GFLOPS: 2544.2273256244875
Total time: 2.702094294s
Avg time: 54.041885ms

Running with transpose left: false, transpose right: true, alpha: 1, beta: 0
[=================================================] ✅
Avg GFLOPS: 2696.778482443825
Total time: 2.550065042s
Avg time: 51.0013ms

Running with transpose left: false, transpose right: false, alpha: 0.5, beta: 0
[=================================================] ✅
Avg GFLOPS: 2668.0019304379844
Total time: 2.577410417s
Avg time: 51.548208ms

Running with transpose left: false, transpose right: false, alpha: 1, beta: 0.5
[=================================================] ✅
Avg GFLOPS: 2667.1376704607605
Total time: 2.578379166s
Avg time: 51.567583ms
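
The GFLOPS figures above are consistent with the common convention of counting 2 * M * N * K floating-point operations per matrix multiplication (one multiply and one add per inner-product term) and ignoring the alpha/beta scaling. The sketch below shows that arithmetic; it is an assumption about how the example computes the number, and `gemm_flops` and `gflops` are hypothetical names.

```rust
/// Floating-point operations in an (m x k) * (k x n) product:
/// one multiply and one add per inner-product term.
/// The alpha/beta scaling is ignored, as is conventional.
fn gemm_flops(m: u64, n: u64, k: u64) -> u64 {
    2 * m * n * k
}

/// Throughput in GFLOPS for one multiplication that took `seconds`.
fn gflops(m: u64, n: u64, k: u64, seconds: f64) -> f64 {
    gemm_flops(m, n, k) as f64 / seconds / 1e9
}
```

With the first run's average time, `gflops(4096, 4096, 4096, 0.050694689)` comes out at about 2711, within roughly 0.1% of the reported 2714.02. A gap of that size would be expected if the example averages per-iteration GFLOPS rather than dividing by the average time. Each total time is also 50 times the corresponding average, which suggests 50 timed iterations per configuration.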

@ivarflakstad ivarflakstad self-assigned this Oct 30, 2023
@ivarflakstad ivarflakstad merged commit c130b1c into master Oct 31, 2023