Releases: exo-lang/ExoBLAS
v0.0.2
What's Changed
- Update with exo #637 Extern changes by @yamaguchi1024 in #123
- Update asum reference by @yamaguchi1024 in #124
- Update imports for exo src refactoring by @yamaguchi1024 in #125
- Update documentation by @yamaguchi1024 in #126
- fix graphing script by @yamaguchi1024 in #127
- Update the graphing script and fix copy not being benchmarked by @yamaguchi1024 in #128
- Organize benchmark_results into levels by @yamaguchi1024 in #129
- Add C++ loc script by @yamaguchi1024 in #130
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's Changed
- Restructure tests correctness/bench by @SamirDroubi in #1
- Parse json output to generate graph by @yamaguchi1024 in #3
- Merge correct and okay fast gemm by @yamaguchi1024 in #5
- Revert "testing branch protection" by @SamirDroubi in #6
- Fail-fast check in hoist by @SamirDroubi in #7
- update graph for GEMM by @yamaguchi1024 in #8
- Do instruction selection early in vectorize by @SamirDroubi in #9
- N and M must be lowercase for the graph script by @yamaguchi1024 in #10
- Rewrite gemm microk and improve vectorizer by @SamirDroubi in #11
- Tile_loops and auto_stage_mem operations by @SamirDroubi in #12
- Rewrite Trans algos + retune parameters for NonTrans case by @SamirDroubi in #13
- Add cmath header for fabs on mac by @yamaguchi1024 in #14
- use original algorithm and call_eqv on the scheduled version by @yamaguchi1024 in #17
- First refactor of composed schedules by @SamirDroubi in #18
- Reshift loops after cut_loop by @SamirDroubi in #19
- Change various compile flags by @SamirDroubi in #20
- Composed schedule refactor 2 by @SamirDroubi in #21
- Rewrite asum by @SamirDroubi in #22
- Rewrote copy by @SamirDroubi in #23
- Rewrite rot by @SamirDroubi in #24
- Rewrite rotm by @SamirDroubi in #25
- Rewrite scal by @SamirDroubi in #26
- Rewrite swap by @SamirDroubi in #27
- Rewrite dsdot by @SamirDroubi in #28
- Update remove-if to eliminate-dead-code by @SamirDroubi in #29
- Implement clean Gemmini Gemm and preliminary SAD by @yamaguchi1024 in #30
- Implement sgemm by @SamirDroubi in #32
- Renamed ForSeqCursor to ForCursor by @skeqiqevian in #31
- Update github ci workflow files by @yamaguchi1024 in #33
- Deprecate CSE argument from bind_expr by @SamirDroubi in #34
- Test codegen by @SamirDroubi in #35
- Implement a more robust hoist_stmt by @SamirDroubi in #36
- Cleanup introspection code by @SamirDroubi in #37
- Make names created by ops unique by @SamirDroubi in #38
- Cleanup par red by @SamirDroubi in #39
- Optimize level 1 operation by @SamirDroubi in #40
- Automate reduction parallelism by @SamirDroubi in #41
- Implement higher-order functions by @SamirDroubi in #42
- Implement hoist_from_loop op by @SamirDroubi in #43
- Rewrite gemv and start an optimize_level_2 op by @SamirDroubi in #44
- Simplify vectorize and support various tail strats by @SamirDroubi in #45
- Rewrite interleave_execution by @SamirDroubi in #46
- Implement unroll_buffers and get_inner_loop ops by @SamirDroubi in #47
- Implement stage_compute by @SamirDroubi in #48
- Vectorization stage compute by @SamirDroubi in #49
- Rewrite asum as optimize_level_1 by @SamirDroubi in #50
- Rewrite dsdot via optimize level 1 by @SamirDroubi in #51
- Support rc mode for hoist operations by @SamirDroubi in #53
- Remove unsafe_disable_checks from expand_dim by @yamaguchi1024 in #54
- Restructure vectorization by @SamirDroubi in #55
- Deprecate bound_alloc by @SamirDroubi in #56
- Automate nth_loop in parallelize_all_reductions by @SamirDroubi in #57
- Interleave restructure by @SamirDroubi in #58
- Rewrite optimize_level_2 by @SamirDroubi in #59
- Generate gemv variant by @SamirDroubi in #60
- Rewrite auto-stage-mem to handle general blocks by @SamirDroubi in #61
- Check codegen hash instead of sources by @SamirDroubi in #62
- Add Preliminary CSE support by @SamirDroubi in #63
- Update CI to use the new M1 macOS runner by @yamaguchi1024 in #64
- Fix Github workflows by @SamirDroubi in #65
- Support triangular matrices in optimize_level_2 by @SamirDroubi in #66
- Deprecate parameters class by @SamirDroubi in #68
- Minimize Build/Test time by @SamirDroubi in #69
- Automation of precision-stride variants generation by @SamirDroubi in #70
- Rewrite ger by @SamirDroubi in #71
- Rewrite symv by @SamirDroubi in #72
- Rewrite syr2 by @SamirDroubi in #73
- Rewrite syr by @SamirDroubi in #74
- Rewrite trsv by @SamirDroubi in #75
- New graphing script and start metaprogramming tests by @SamirDroubi in #76
- Restructure lib and deprecate legacy code by @SamirDroubi in #77
- Parameterize Tail Strategies for level 1 and 2 by @SamirDroubi in #78
- Changes to speedup compilation time by @SamirDroubi in #79
- Enable inner tails in symmetric/triangular matrices by @SamirDroubi in #80
- Fix extract_subproc call by @SamirDroubi in #81
- Make workflows reusable by @SamirDroubi in #84
- Generate various performance features with kernels by @SamirDroubi in #83
- Apply yapf fix to workflow by @SamirDroubi in #85
- New scheduling flow for gemm by @SamirDroubi in #87
- Remove Gemmini matmul code from BLAS by @yamaguchi1024 in #86
- Rewrite gemm testing by @SamirDroubi in #88
- Implement filter_cursors op by @SamirDroubi in #89
- Gemm macro micro by @SamirDroubi in #90
- Add AVX512 target by @SamirDroubi in #91
- Bump black from 22.10.0 to 24.3.0 by @dependabot in #92
- Write the uk schedule by hand to manage comp time by @SamirDroubi in #93
- syrk schedule (missing calls to gemm after tiling) by @SamirDroubi in #94
- call gemms from syrk after tiling by @SamirDroubi in #95
- Implement symm algorithms and testing by @SamirDroubi in #96
- Schedule symm by @SamirDroubi in #97
- Generate symm variants by @SamirDroubi in #98
- Implement syr2k algorithm and testing by @SamirDroubi in #99
- Level 3 Testing Fixes by @SamirDroubi in #100
- Support output matrix scaling in level 3 by @SamirDroubi in #101
- Support transposition options in gemm and syrk by @SamirDroubi in #102
- Automate cutting of iteration space for Uplo by @SamirDroubi in #103
- LoC analytics script by @SamirDroubi in #104
- Rewrite trmv, trsv, and ger in the new testing format by @SamirDroubi in #105
- Heatmap summary plots by @SamirDroubi in #106
- Various improvements to gemm by @SamirDroubi in #108
- Profile tool by @SamirDroubi in #109
- Fix exo_gemm hash based on circular buffer PR by @skeqiqevian in https://github.com/exo-lan...