Honeycomb

Fast LLM inference built on Elixir, Bumblebee, and EXLA.

Usage

Honeycomb can be used as a standalone inference service or as a dependency in an existing Elixir project.

As a separate service

To use Honeycomb as a separate service, clone the project and run:

mix honeycomb.serve <config>

The following arguments are required:

  • --model - HuggingFace model repo to use

  • --chat-template - Chat template to use

The following arguments are optional:

  • --max-sequence-length - Text generation max sequence length. Total sequence length accounts for both input and output tokens.

  • --hf-auth-token - HuggingFace auth token for accessing private or gated repos.
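For example, a full invocation might look like this (the model, template, and sequence length are illustrative values; they match the configuration shown later in this README):

mix honeycomb.serve --model microsoft/Phi-3-mini-4k-instruct \
  --chat-template phi3 \
  --max-sequence-length 512 \
  --hf-auth-token $HF_TOKEN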

The Honeycomb server is compatible with the OpenAI API, so you can use it as a drop-in replacement by pointing the api_url in your OpenAI client at the Honeycomb server.
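For example, with curl (the host and port here are placeholders for wherever your Honeycomb server is listening, and the standard OpenAI chat completions path is assumed):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3-mini-4k-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'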

As a dependency

To use Honeycomb as a dependency, first add it to your deps:

defp deps do
  [{:honeycomb, github: "seanmor5/honeycomb"}]
end
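Then fetch the dependency:

mix deps.get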

Next, you'll need to configure the serving options:

config :honeycomb, Honeycomb.Serving,
  # HuggingFace model repo to serve
  model: "microsoft/Phi-3-mini-4k-instruct",
  # Chat template matching the model
  chat_template: "phi3",
  # HuggingFace auth token, needed for private or gated repos
  auth_token: System.fetch_env!("HF_TOKEN")

Then you can call Honeycomb directly:

messages = [%{role: "user", content: "Hello!"}]
Honeycomb.chat_completion(messages: messages)

Benchmarks

Honeycomb ships with some basic benchmarks and profiling utilities. You can benchmark and/or profile your inference configuration by running:

mix honeycomb.benchmark <config>
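For example, assuming the benchmark task accepts the same <config> flags as mix honeycomb.serve (the README does not state this explicitly):

mix honeycomb.benchmark --model microsoft/Phi-3-mini-4k-instruct --chat-template phi3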
