[v0.3.3] Release Tracker #3097

Closed
4 of 5 tasks
WoosukKwon opened this issue Feb 28, 2024 · 5 comments · Fixed by #3129
Labels
release Related to new version release

Comments

@WoosukKwon
Collaborator

WoosukKwon commented Feb 28, 2024

ETA: Feb 29th - Mar 1st

Major changes

  • StarCoder2 support
  • Performance optimization and LoRA support for Gemma
  • Performance optimization for MoE kernel
  • 2/3/8-bit GPTQ support
  • [Experimental] AWS Inferentia2 support

PRs to be merged before the release

@WoosukKwon WoosukKwon added the release Related to new version release label Feb 28, 2024
@simon-mo
Collaborator

#2819
#3087
StarCoder2

@njhill
Member

njhill commented Feb 29, 2024

#3099 is a fix for the #3087 regression

@hanzhi713
Contributor

#2760 Fixes for custom all reduce on some platforms

@robertgshaw2-redhat
Collaborator

#2497 Adds support for Marlin INT4 kernels, which are ~3x faster than the current GPTQ kernels

@HyperdriveHustle
Contributor

#3016
Fix: Output text is always truncated in some models
