Merge pull request #242 from EricLBuehler/speculative
Implement Speculative Decoding
EricLBuehler authored May 11, 2024
2 parents 7ed6157 + d630c4a commit ce8028e
Showing 37 changed files with 982 additions and 1,189 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -48,6 +48,7 @@ Mistral.rs is a fast LLM inference platform supporting inference on a variety of
**Powerful**:
- Fast LoRA support with weight merging.
- First X-LoRA inference platform with first class support.
+ - Speculative Decoding: Mix supported models as the draft model or the target model


This is a demo of interactive mode with streaming running Mistral GGUF:
@@ -121,9 +122,7 @@ OpenAI API compatible API server

**Llama Index integration**

- - [Source](integrations/llama_index_integration.py).
- - [Example](examples/llama_index/xlora_gguf.py)
- - [Cookbook](examples/llama_index/cookbook.ipynb)
+ - Docs: https://docs.llamaindex.ai/en/stable/examples/llm/mistral_rs/

---
