This repository has been archived by the owner on Nov 18, 2023. It is now read-only.

llama2.rs.wasm 🦀

A quick-and-dirty, minimal port of @Gaxler's llama2.rs to WebAssembly.

Cute Llama

How to run?

  1. Clone the repo:
git clone https://github.com/mtb0x1/llama2.rs.wasm
cd llama2.rs.wasm/port1/
  2. Download @Karpathy's baby Llama2 models (Orig instructions), pretrained on the TinyStories dataset, and place them in the www folder:
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin

stories42M is used by default (for now, @todo); you can change this in index.html.

  3. Build (requires wasm-pack):
    wasm-pack build --release --target web --out-dir www/pkg/
  4. Serve the www folder with a minimal webserver:
    1. Run (requires Python 3; any other webserver works too):
    cd www && python3 -m http.server 8080
    2. Go to http://localhost:8080/
    3. Open the browser console to see the benchmark output (@todo)
  5. (Optional) If you make changes, reload the browser and clear the cache afterwards:
    1. Changing lib.rs content (rebuild afterwards):
      wasm-pack build --release --target web --out-dir www/pkg/
    2. Changing the frontend (index.html)
    3. Changing the model/tokenizer:
      • Follow @Karpathy's instructions in llama2.c
      • Place the new files in the www folder and edit index.html if needed
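
The webserver step above allows any server in place of `python3 -m http.server`. As an illustration only, here is a minimal sketch of a static file server using nothing but the Rust standard library. It is not part of this repository; the function names (`content_type_for`, `resolve`, `serve`) are hypothetical, and the sketch assumes plain GET requests for files under www with no percent-encoded paths. The one detail that matters for this project is serving `.wasm` files as `application/wasm`, which browsers expect for streaming compilation.

```rust
use std::fs;
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::path::{Path, PathBuf};

/// Map a file extension to a Content-Type value (hypothetical helper).
pub fn content_type_for(path: &Path) -> &'static str {
    match path.extension().and_then(|e| e.to_str()) {
        Some("html") => "text/html; charset=utf-8",
        Some("js") => "text/javascript",
        Some("wasm") => "application/wasm",
        // Model checkpoints (*.bin) and anything else are sent as raw bytes.
        _ => "application/octet-stream",
    }
}

/// Resolve a request target such as "/pkg/x.wasm" against `root`.
/// Returns None for any path containing "..", so requests cannot escape `root`.
pub fn resolve(root: &Path, target: &str) -> Option<PathBuf> {
    let path = target.split('?').next().unwrap_or("/");
    if path.split('/').any(|part| part == "..") {
        return None;
    }
    let rel = path.trim_start_matches('/');
    let rel = if rel.is_empty() { "index.html" } else { rel };
    Some(root.join(rel))
}

/// Answer one request: read the request line, skip the headers, send the file.
fn handle(mut stream: TcpStream, root: &Path) -> std::io::Result<()> {
    let mut reader = BufReader::new(&stream);
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    // Drain the remaining header lines up to the blank line.
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 || line == "\r\n" || line == "\n" {
            break;
        }
    }
    // The request line looks like "GET /index.html HTTP/1.1".
    let target = request_line.split_whitespace().nth(1).unwrap_or("/");
    match resolve(root, target).and_then(|p| fs::read(&p).ok().map(|b| (p, b))) {
        Some((path, body)) => {
            write!(
                stream,
                "HTTP/1.1 200 OK\r\nContent-Type: {}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
                content_type_for(&path),
                body.len()
            )?;
            stream.write_all(&body)
        }
        None => stream.write_all(
            b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n",
        ),
    }
}

/// Serve files under `root` on `addr`, one connection at a time.
pub fn serve(root: &Path, addr: &str) -> std::io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    for stream in listener.incoming() {
        if let Err(err) = handle(stream?, root) {
            eprintln!("request failed: {}", err);
        }
    }
    Ok(())
}
```

Calling `serve(Path::new("www"), "127.0.0.1:8080")` from a `main` function would then stand in for the Python command. The sketch is single-threaded and has no caching headers, which is enough for local testing but not for anything public.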

Performance

  • Temperature: 0.9
  • Sequence length: 20

| tok/s   | 15M | 42M | 110M | 7B |
|---------|-----|-----|------|----|
| wasm v1 | ~50 | ~20 | ~7   | ?  |

Not really sure about these results (yet!); they have not been verified.
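
To make the benchmark parameters concrete, here is a minimal sketch of how a temperature setting and a tok/s figure are commonly computed. It is not taken from lib.rs: the function names are hypothetical, and it assumes the usual scheme of dividing logits by the temperature before the softmax, then sampling from the resulting distribution. The elapsed time is passed in as milliseconds, on the assumption that in the browser it would come from something like `performance.now()`.

```rust
/// Softmax over `logits / temperature`.
/// Assumes `temperature > 0` and a non-empty slice; lower values sharpen the distribution.
pub fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the maximum first for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Pick an index from `probs` given a uniform random number `r` in [0, 1).
/// The caller supplies `r`, so the sketch needs no random number crate.
pub fn sample(probs: &[f32], r: f32) -> usize {
    let mut cdf = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cdf += p;
        if r < cdf {
            return i;
        }
    }
    // Fall back to the last index if rounding left the total just under `r`.
    probs.len().saturating_sub(1)
}

/// Tokens per second for `tokens` generated in `elapsed_ms` milliseconds.
pub fn tokens_per_second(tokens: usize, elapsed_ms: f64) -> f64 {
    if elapsed_ms <= 0.0 {
        return 0.0;
    }
    tokens as f64 * 1000.0 / elapsed_ms
}
```

With a sequence length of 20, a run that takes 400 ms works out to `tokens_per_second(20, 400.0)`, that is 50 tok/s. Short sequences like this make the figure sensitive to timer resolution and one-off startup costs, which is one reason to treat the numbers above as rough.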

todo/Next ?

  • Tests
  • Display bench results in the webpage instead of the browser console (WIP: needs cleanup and removal of the console.info hack)
  • Inference based on user input (done)
  • Optimization: SIMD, rayon (WIP), etc.

License

MIT