Add Mamba2 #31204
Comments
Hopefully that conversion script can be re-used to create the …
cc @3outeille, do you want to have a go at this?
@3outeille I would love to help out!
Glad to be @'d when a PR is opened 👀
Not sure if this counts as an ad, but I've created my own HF-compatible version of Mamba2: https://github.com/vasqu/mamba2-torch. Feel free to remove it if this is unwanted.
The hybrid variant will probably be here as well, as it's Jamba-like.
@vasqu Awesome work! Sorry for the late response, I was busy finishing up a paper. I will take a look tomorrow so we can work together and finalize it. Once Mamba2 is done, we can move on to Mamba2Hybrid.
@vasqu Can you make me a collaborator on the repo you shared so I can open up PRs?
@Adibvafa No worries, and good luck with the paper! Let's move the discussion elsewhere: do you have Discord? Add me. I think we should work directly on a fork; I want the repo above to stay a standalone modular thing for now.
Hi everyone, thanks so much for all the effort being put into porting Mamba2 into transformers! I'm really excited about this. Just checking in to see if there's any update or an estimated timeline for when the PR might be opened. Thank you!
@vasqu Thank you for the reply, and yes, I'd be grateful if you could tag me when you open your PR. Thanks again!
Codestral Mamba just got released, and it uses the Mamba2 architecture with untied embeddings, unlike every other Mamba2 model, which all use tied embeddings.
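For anyone porting checkpoints, here is a minimal PyTorch sketch of what tied vs. untied embeddings means in practice; the module names `embed_tokens` and `lm_head` are illustrative, not the actual names in any checkpoint:

```python
import torch.nn as nn

vocab_size, d_model = 32000, 2048  # illustrative sizes, not from any real config

embed_tokens = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size, bias=False)

# Tied embeddings (other Mamba2 checkpoints): the output projection
# shares its weight tensor with the input embedding.
lm_head.weight = embed_tokens.weight

# Untied embeddings (Codestral Mamba): skip the assignment above, so
# lm_head keeps its own independently trained weight matrix.
```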
Feature request
Support for a new version of the Mamba architecture (Mamba2). See the paper at https://arxiv.org/abs/2405.21060. The base code has been released in the authors' repo: https://github.com/state-spaces/mamba.
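For reference, that repo exposes a standalone Mamba2 block; a minimal usage sketch along the lines of the repo's README (assuming `mamba_ssm` is installed with its CUDA kernels, and that the `Mamba2` module and its `d_model`/`d_state`/`d_conv`/`expand` arguments match the released code):

```python
import torch
from mamba_ssm import Mamba2  # from the state-spaces/mamba package

batch, length, dim = 2, 64, 256
x = torch.randn(batch, length, dim).to("cuda")

# A single Mamba2 mixer block; argument names follow the repo's README.
model = Mamba2(
    d_model=dim,   # model dimension
    d_state=64,    # SSM state expansion factor
    d_conv=4,      # local convolution width
    expand=2,      # block expansion factor
).to("cuda")

y = model(x)
assert y.shape == x.shape  # the block is shape-preserving
```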
Motivation
It's still very competitive in NLP (according to the paper's benchmarks), and also a good chunk faster thanks to the new state-space duality (SSD) formulation, which reworks the recurrence into chunked matrix multiplications that parallelize well on GPUs.
Your contribution
Not sure; I could look into it, but it's definitely a bigger undertaking :)