Enhance Loops kernel for XPU device
Optimize the loops backbone:
1. Use the global-range stride kernel instead of the legacy kernel for the no-cast case.
2. Vectorize the broadcast case wherever possible.
3. Reduce the number of loops kernels.
4. Add unit tests (UTs).
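
For readers unfamiliar with the pattern in item 1, here is a minimal CPU-side sketch. It assumes "global range stride" means each work-item starts at its global id and advances by the total global range until the input is exhausted. The names `stride_loop_work_item` and `run_stride_loop` are hypothetical and are not taken from this commit; the real kernel runs on the device rather than in a sequential host loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the commit): the body one work-item would run.
// It handles elements global_id, global_id + global_range, ... below n.
template <typename T, typename Op>
void stride_loop_work_item(const T* in, T* out, std::size_t n,
                           std::size_t global_id, std::size_t global_range,
                           Op op) {
  for (std::size_t i = global_id; i < n; i += global_range) {
    out[i] = op(in[i]);
  }
}

// Hypothetical driver: simulates the launch by running every work-item
// sequentially on the host. On a device these would run in parallel.
template <typename T, typename Op>
std::vector<T> run_stride_loop(const std::vector<T>& in,
                               std::size_t global_range, Op op) {
  assert(global_range > 0);  // a zero-sized launch would process nothing
  std::vector<T> out(in.size());
  for (std::size_t id = 0; id < global_range; ++id) {
    stride_loop_work_item(in.data(), out.data(), in.size(), id, global_range,
                          op);
  }
  return out;
}
```

In this sketch the launch size is independent of the element count, so a single kernel covers any input length. That is one plausible reason such a pattern helps reduce the number of kernel variants.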
xytintel authored Mar 12, 2024
1 parent 842f9c4 commit a6da433
Showing 4 changed files with 220 additions and 99 deletions.