speculative decoding in llama.cpp : PoC for speeding-up inference via speculative sampling by ggerganov · Pull Request #2926 · ggerganov/llama.cpp #492

Open
1 task
irthomasthomas opened this issue Feb 1, 2024 · 0 comments
Labels
Algorithms: Sorting, Learning or Classifying. All algorithms go here.
llm-experiments: experiments with large language models
llm-serving-optimisations: Tips, tricks and tools to speedup inference of large language models
prompt-engineering: Developing and optimizing prompts to efficiently use language models for various applications and re
TIL: Short notes or tips on coding, linux, llms, ml, etc

Comments

@irthomasthomas
Owner

Title: speculative : PoC for speeding-up inference via speculative sampling #292

Suggested labels

{ "label-name": "LLM-speed-optimization", "description": "Optimizing LLama model inference speed", "confidence": 80.85 }

@irthomasthomas added the Algorithms, llm-experiments, llm-serving-optimisations, New-Label, prompt-engineering, and TIL labels Feb 1, 2024
@irthomasthomas changed the title from "speculative : PoC for speeding-up inference via speculative sampling by ggerganov · Pull Request #2926 · ggerganov/llama.cpp" to "speculative decoding in llama.cpp : PoC for speeding-up inference via speculative sampling by ggerganov · Pull Request #2926 · ggerganov/llama.cpp" Feb 1, 2024
@irthomasthomas removed the New-Label label Feb 1, 2024