llama2.rs.wasm 🦀

A quick and dirty, minimal port of @Gaxler's llama2.rs to WebAssembly.

How to run?

Clone repo

git clone https://github.com/mtb0x1/llama2.rs.wasm
cd llama2.rs.wasm/port1/

Download @Karpathy's baby Llama2 models (see the original instructions), pretrained on the TinyStories dataset, and place them in the www folder.

wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin

stories42M is used by default (for now, @todo); you can change this in index.html.

Build (requires wasm-pack)

wasm-pack build --release --target web --out-dir www/pkg/

Serve the www folder with a minimal web server:
1. Start the server (requires Python 3); any other web server works too:
```
cd www && python3 -m http.server 8080
```
2. Go to http://localhost:8080/
3. Open the browser console (@todo)
(Optional) If you want to make changes (reload the browser and clear its cache after each change):
1. Changing lib.rs content (rebuild afterwards):
```
wasm-pack build --release --target web --out-dir www/pkg/
```
2. Changing the frontend index.html
3. Changing the model/tokenizer:
  - Follow @Karpathy's instructions in llama2.c
  - Place the new files in the www folder and edit index.html if needed
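When swapping in a different model file, it can help to check its header before loading it in the browser. The sketch below assumes the legacy llama2.c checkpoint layout, where the file starts with seven little-endian 32-bit integers describing the model. The `Config` struct and the `parse_config` helper are illustrative names and are not taken from this repo's lib.rs.

```rust
/// Hyperparameters assumed to sit at the start of a llama2.c checkpoint
/// (legacy layout: seven little-endian i32 values, in this order).
#[derive(Debug, PartialEq)]
pub struct Config {
    pub dim: i32,
    pub hidden_dim: i32,
    pub n_layers: i32,
    pub n_heads: i32,
    pub n_kv_heads: i32,
    pub vocab_size: i32,
    pub seq_len: i32,
}

/// Header size in bytes: 7 fields of 4 bytes each.
pub const HEADER_BYTES: usize = 7 * 4;

/// Parse the header from the first bytes of a checkpoint.
/// Returns `None` if the buffer is shorter than the header.
pub fn parse_config(bytes: &[u8]) -> Option<Config> {
    if bytes.len() < HEADER_BYTES {
        return None;
    }
    // Read the i-th little-endian i32 from the buffer.
    let field = |i: usize| -> i32 {
        let s = i * 4;
        i32::from_le_bytes([bytes[s], bytes[s + 1], bytes[s + 2], bytes[s + 3]])
    };
    Some(Config {
        dim: field(0),
        hidden_dim: field(1),
        n_layers: field(2),
        n_heads: field(3),
        n_kv_heads: field(4),
        vocab_size: field(5),
        seq_len: field(6),
    })
}
```

If the parsed values look implausible (for example a zero or huge `dim`), the file is probably not in the layout this sketch assumes.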

Performance

Temperature : 0.9
Sequence length: 20

| tok/s | 15M | 42M | 110M | 7B |
|---|---|---|---|---|
| wasm v1 | ~50 | ~20 | ~7 | ? |

Not really sure about these results (yet!).
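For reference, a tok/s figure like the ones above can be computed as tokens generated divided by elapsed wall-clock time. The helper below is a minimal sketch, not the benchmark code used here. `tokens_per_second` is a hypothetical name, and it assumes the elapsed time is measured in milliseconds, as a browser timer such as `performance.now()` would report it.

```rust
/// Tokens generated per second, given the elapsed time in milliseconds.
/// Returns 0.0 when the elapsed time is not positive, to avoid dividing by zero.
pub fn tokens_per_second(tokens: usize, elapsed_ms: f64) -> f64 {
    if elapsed_ms > 0.0 {
        tokens as f64 / (elapsed_ms / 1000.0)
    } else {
        0.0
    }
}
```

Under these assumptions, generating 20 tokens in about 400 ms would give roughly 50 tok/s.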

todo/Next ?

Tests
Display bench results in the web page instead of the browser console (wip: needs cleanup and removal of the console.info hack)
Inference based on user inputs (done)
Optimization: simd, rayon (wip), etc.
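As context for the optimization item: in llama2.c the dominant cost is the matrix-vector product, and the sketch below assumes the same holds for this port. It is a plain scalar baseline written for illustration, not the code in lib.rs. A SIMD version would vectorize the inner dot product, and a parallel version would distribute the output rows across threads.

```rust
/// Matrix-vector product: out[i] = sum over j of w[i * n + j] * x[j].
/// `w` is a row-major (d x n) matrix, `x` has length n, `out` has length d.
pub fn matmul(out: &mut [f32], x: &[f32], w: &[f32]) {
    let n = x.len();
    assert_eq!(w.len(), out.len() * n, "w must hold out.len() rows of x.len() values");
    if n == 0 {
        // Empty input vector: every dot product is an empty sum.
        out.fill(0.0);
        return;
    }
    // Each output element is the dot product of one row of `w` with `x`.
    for (o, row) in out.iter_mut().zip(w.chunks_exact(n)) {
        *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
}
```

Each output row is independent of the others, which is what makes this loop a natural target for splitting across threads.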

License

MIT
