
Add Mamba2 #31204

Closed
vasqu opened this issue Jun 3, 2024 · 16 comments
Labels
Feature request · New model

Comments

@vasqu
Contributor

vasqu commented Jun 3, 2024

Feature request

Support for the new version of the mamba architecture. See the paper at https://arxiv.org/abs/2405.21060. Base code has been released in their repo: https://github.com/state-spaces/mamba.

Motivation

It's still very competitive in NLP (according to the benchmarks) but also a good chunk faster, thanks to the new parallelism mechanisms of the SSD algorithm.
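
For context, here's a naive sequential sketch of the recurrence that SSD computes in parallel (my own simplification, not the reference implementation: single head, no chunking, and the function name and toy shapes are illustrative):

```python
import torch

def ssd_reference(x, dt, A, B, C):
    # Naive sequential reference for the Mamba2 (SSD) recurrence; real kernels
    # compute this chunk-parallel. Shapes: x (T, P) inputs, dt (T,) step sizes,
    # A a negative scalar, B (T, N) and C (T, N) input/output projections.
    T, P = x.shape
    N = B.shape[1]
    h = torch.zeros(N, P)                 # hidden state
    ys = []
    for t in range(T):
        a = torch.exp(dt[t] * A)          # scalar decay exp(dt_t * A)
        h = a * h + dt[t] * torch.outer(B[t], x[t])  # state update
        ys.append(C[t] @ h)               # readout y_t = C_t^T h_t
    return torch.stack(ys)                # (T, P)

# Toy check with random tensors
y = ssd_reference(torch.randn(8, 4), torch.rand(8), torch.tensor(-1.0),
                  torch.randn(8, 16), torch.randn(8, 16))
print(y.shape)  # torch.Size([8, 4])
```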

Your contribution

Not sure; I could look into it, but it's definitely a bigger undertaking :)

@vasqu added the Feature request label on Jun 3, 2024
@amyeroberts
Collaborator

cc @ArthurZucker

@riklopfer
Contributor

#29631

Hopefully that conversion script can be re-used to create the mamba2*-hf models

@ArthurZucker
Collaborator

cc @3outeille do you want to have a go at this?

@Adibvafa
Contributor

Adibvafa commented Jun 5, 2024

@3outeille I would love to help out!

@vasqu
Contributor Author

vasqu commented Jun 5, 2024

Glad to be @-mentioned when a PR is opened 👀

@zzhhjjj mentioned this issue on Jun 8, 2024
@vasqu
Contributor Author

vasqu commented Jun 14, 2024

Not sure if this counts as an ad, but I've created my own HF-compatible version of Mamba2: https://github.com/vasqu/mamba2-torch

Feel free to remove it if this is unwanted.

  • It's still very experimental.
  • It doesn't support the hybrid variant with attention blocks.
  • I haven't tested it much, so it might actually be crap.
  • It could also serve as a reference for an actual transformers implementation (?)

@ArthurZucker
Collaborator

The hybrid variant will probably be here, as it's Jamba-like.

@Adibvafa
Contributor

@vasqu Awesome work! Sorry for the late response, I was busy finishing up a paper. I will take a look tomorrow so we can work together and finalize it. Once Mamba2 is done, we can move on to Mamba2Hybrid.

@Adibvafa
Contributor

@vasqu Can you make me a collaborator on the repo you shared so I can open up PRs?

@vasqu
Contributor Author

vasqu commented Jun 27, 2024

@Adibvafa No worries, and good luck with the paper! Let's move the discussion elsewhere: do you have Discord? Add me at _vasquez, or contact me via email at a.vlasjuk@t-online.de.

I think we should directly work on a fork. I want the repo above to stay as a standalone modular thing for now.

@pglorio
Contributor

pglorio commented Jul 8, 2024

Hi everyone,

Thanks so much for all the effort being put into porting mamba2 to transformers! I'm really excited about this. Just checking in to see if there's any update or estimated timeline for when the PR might be created. Thank you!

@vasqu
Contributor Author

vasqu commented Jul 8, 2024

@pglorio Hey there, there's no specific timeline but @Adibvafa and I will start working on it soon, hopefully in the next couple of days :)

No idea how long this will take, though. I can tag you when we open our PR (if you want).

@pglorio
Contributor

pglorio commented Jul 8, 2024

@vasqu Thank you for the reply, and yes, I'd be grateful if you could tag me when you open your PR. Thanks again!

@theo77186

Codestral Mamba just got released, and it uses the Mamba2 architecture with untied embeddings, unlike the other Mamba2 models, which use tied embeddings.
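
For reference, a minimal PyTorch sketch of the difference (the dimensions below are illustrative, not Codestral's actual config):

```python
import torch.nn as nn

vocab_size, hidden_size = 32768, 4096  # illustrative dimensions

embed = nn.Embedding(vocab_size, hidden_size)             # token ids -> hidden states
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)  # hidden states -> logits

# Tied (the other Mamba2 models): input and output projections share one matrix.
lm_head.weight = embed.weight

# Untied (Codestral Mamba): skip the assignment above, so lm_head keeps its own
# independently trained parameters.
```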

@vasqu mentioned this issue on Jul 17, 2024
@ArthurZucker
Collaborator

We are adding Codestral Mamba with @molbap; it's full mamba layers, so keep #32027 open! 🫂

@vasqu
Contributor Author

vasqu commented Aug 17, 2024

Closing this for now since Mamba2 was recently added in #32080. If wanted, hybrid variants can now be built on top of it, and maybe #32027 can be used as a reference :)
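
For anyone landing here later, a quick usage sketch (assumes a transformers version that includes #32080; the checkpoint id below is just one example of a Mamba2 model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mamba-Codestral-7B-v0.1"  # example Mamba2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0]))
```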

@vasqu closed this as completed on Aug 17, 2024