Commit
Optimize the loops backbone:
1. Use the global range stride kernel instead of the legacy kernel for the no-cast case.
2. Vectorize the broadcast case wherever possible.
3. Reduce the number of loops kernels.
4. Add unit tests (UTs).