Enhance loops backbone #32

Merged
merged 6 commits into from
Mar 12, 2024

xytintel
Contributor

@xytintel xytintel commented Mar 11, 2024

Optimizes the loops backbone. Changes:

  1. Use the global range stride kernel instead of the legacy kernel for the no-cast case.
  2. For the broadcast case, vectorize wherever possible.
  3. Reduce the number of loops kernels.
  4. Add unit tests (UTs).
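
For readers unfamiliar with the term in item 1, here is a minimal sketch of what a "global range stride" elementwise loop generally looks like. It is a plain C++ emulation on the CPU, not the kernel from this PR: the function name `global_range_stride_apply` and the `global_range` parameter are hypothetical and stand in for the total number of work-items a device launch would use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a global-range-stride elementwise loop.
// On a device, each value of gid would be a separate work-item running
// concurrently; here the outer loop plays that role sequentially.
template <typename T, typename F>
void global_range_stride_apply(const std::vector<T>& in,
                               std::vector<T>& out,
                               std::size_t global_range,
                               F f) {
  assert(global_range > 0 && "at least one emulated work-item is required");
  out.resize(in.size());
  for (std::size_t gid = 0; gid < global_range; ++gid) {
    // Each work-item starts at its own id and advances by the full
    // launch size, so any input length is covered by a fixed launch.
    for (std::size_t i = gid; i < in.size(); i += global_range) {
      out[i] = f(in[i]);
    }
  }
}
```

With this pattern the launch size can be chosen independently of the tensor size, which is one plausible reason to prefer it over a one-element-per-work-item kernel.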

@xytintel requested a review from @fengyuan14 on March 12, 2024 02:20
@fengyuan14 merged commit a6da433 into main on Mar 12, 2024
1 check passed
@fengyuan14 deleted the xyt/enhance_loops branch on March 12, 2024 03:36
2 participants