
Speculative sampling and Llmoe? #27

Open
SabinStargem opened this issue Sep 4, 2023 · 2 comments

Comments

@SabinStargem

I heard about a new feature coming to Llama that speeds up a model's inference. The benefit varies, up to around 2x the speed, but probably closer to 1.5x. It works by having a big model, like a 34b, use a smaller draft model, like a 7b, to propose tokens that the big model then verifies. The GitHub thread has a video showing the performance benefits of the method.
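For what it's worth, the mechanism can be sketched in a few lines of Python. Everything below is a toy: the "models" are trivial next-token functions standing in for real LLMs, and acceptance is greedy rather than probabilistic, so this is an illustration of why the speedup happens, not llama.cpp's actual implementation.

```python
# Minimal sketch of speculative sampling with toy stand-in models.
# A cheap "draft" model proposes k tokens; the expensive "target" model
# then verifies them, keeping the longest accepted prefix, so several
# tokens can be emitted for roughly the cost of one target-model pass.

def draft_next(context):
    # Toy draft model (stand-in for a small model such as a 7b).
    return (context[-1] + 1) % 10 if context else 0

def target_next(context):
    # Toy target model (stand-in for a large model such as a 34b).
    return (context[-1] + 1) % 10 if context else 0

def speculative_step(context, k=4):
    # 1) The draft model proposes k tokens autoregressively.
    ctx = list(context)
    proposal = []
    for _ in range(k):
        token = draft_next(ctx)
        proposal.append(token)
        ctx.append(token)

    # 2) The target model checks each proposed token in order; the first
    #    mismatch ends the accepted prefix (greedy acceptance here for
    #    simplicity; the real method accepts/rejects probabilistically).
    ctx = list(context)
    accepted = []
    for token in proposal:
        if target_next(ctx) == token:
            accepted.append(token)
            ctx.append(token)
        else:
            break

    # 3) On a rejection, emit the target model's own token instead, so
    #    every step makes progress even when the draft is always wrong.
    if len(accepted) < k:
        accepted.append(target_next(ctx))
    return accepted

# With identical toy models every proposal is accepted:
print(speculative_step([0], k=4))  # -> [1, 2, 3, 4]
```

The output is guaranteed to match what the target model would have produced on its own; only the speed changes, which depends on how often the draft's guesses are accepted.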

My thoughts immediately jumped to Airoboros's Llmoe. Would it be possible to integrate an "Inference" Llmoe into vanilla Airoboros to benefit from speculative sampling?

speculative : PoC for speeding-up inference via speculative sampling

@jondurbin
Owner

jondurbin commented Sep 4, 2023

I've been looking at that very closely, it's a great idea! Perhaps fine-tuning the new 1.1b or something would be a start.

@SabinStargem
Author

SabinStargem commented Sep 5, 2023

One of the people posting in the Llama Github mentioned that chaining draft models might have potential. Something like a 3b->7b->13b->34b->70b. My gut says that, much like b-parameters, there would probably be a sweet spot in the number of draft models and their respective sizes in that configuration.

Fortunately, I believe it would be relatively easy to test multiple permutations objectively - the metric is speed, which is easy to record. Provided that speculative sampling doesn't impact output quality, it should be a painless concept to test.
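A sketch of how such a comparison could be scored, assuming each candidate configuration is wrapped as a callable that returns new tokens per step (the harness and `tokens_per_second` name are hypothetical, not part of Airoboros or llama.cpp):

```python
import time

def tokens_per_second(generate_fn, context, n_tokens=1000):
    # Hypothetical timing harness: measures throughput of any token
    # generator callable, so different draft-model chains could be
    # compared on the same prompt with the same metric.
    # generate_fn takes the current token list and returns new tokens.
    ctx = list(context)
    produced = 0
    start = time.perf_counter()
    while produced < n_tokens:
        new_tokens = generate_fn(ctx)
        ctx.extend(new_tokens)
        produced += len(new_tokens)
    elapsed = time.perf_counter() - start
    return produced / elapsed

# Example with a stand-in generator that emits one dummy token per call:
rate = tokens_per_second(lambda ctx: [0], context=[1, 2, 3], n_tokens=100)
print(f"{rate:.0f} tokens/sec")
```

Running each permutation through the same harness on the same prompts would give directly comparable numbers, and since greedy speculative sampling is output-identical to the target model alone, quality checks could be limited to confirming the outputs match.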
