Support llama_encode (WIP) #91

Merged · 3 commits merged into master on Jul 10, 2024

Conversation

@ngxson (Owner) commented on Jul 8, 2024

await wllama.loadModelFromUrl("https://huggingface.co/Felladrin/gguf-flan-t5-large/resolve/main/flan-t5-large.Q2_K.gguf", {
  n_ctx: 1024,
});

output = await wllama.createCompletion("translate English to French: How old are you?", {
  nPredict: 20,
  sampling: { temp: 0 },
});

// output:  Les âges de vous êtes-vous?
// expected: Vous avez quel âge ?

ngxson linked an issue on Jul 8, 2024 that may be closed by this pull request
@ngxson (Owner, Author) commented on Jul 9, 2024

The quantized model is not usable (it seems Flan-T5 requires a lot of precision).

FP16 (the answer is a bit more correct, but in French we never use "être" when asking someone's age):

[screenshot: FP16 output]

INT8 (the answer is wrong):

[screenshot: INT8 output]

@ngxson (Owner, Author) commented on Jul 10, 2024

@felladrin I still can't get reliable results, but it seems the problem comes from llama.cpp rather than from wllama.

This PR will be merged now.

ngxson marked this pull request as ready for review on Jul 10, 2024, 09:01.
ngxson merged commit 97db6f5 into master on Jul 10, 2024 (2 checks passed).
@felladrin (Contributor) commented
Thank you for implementing it, @ngxson!
I tested it with https://huggingface.co/Felladrin/gguf-MaxMini-Instruct-248M and it worked great!
Inference was considerably slower than with a 248M decoder-only model, but encoder-decoder models still have their uses!
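
For context, a minimal sketch of such a test, reusing the loadModelFromUrl / createCompletion API from the snippet in the PR description. The GGUF filename, n_ctx, prompt, and nPredict values here are placeholders and assumptions, not the exact settings used in the test above:

// Hypothetical quick check of an encoder-decoder GGUF with wllama.
// The exact .gguf filename inside the repo is not given in this thread, so it is a placeholder.
await wllama.loadModelFromUrl(
  "https://huggingface.co/Felladrin/gguf-MaxMini-Instruct-248M/resolve/main/<model-file>.gguf",
  { n_ctx: 1024 }, // assumed context size, mirroring the Flan-T5 example above
);

const output = await wllama.createCompletion("Summarize: wllama now supports encoder-decoder models via llama_encode.", {
  nPredict: 64,          // assumed output length
  sampling: { temp: 0 }, // greedy sampling, as in the PR description
});
console.log(output);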

@ngxson (Owner, Author) commented on Jul 15, 2024

@felladrin Thanks for the info. I'm not sure why it's significantly slower; it's probably something to be optimized upstream.

And yeah, I agree that encoder-decoder models are still useful. Personally, I've found that for more deterministic tasks like translation, they hallucinate less than decoder-only models.

Successfully merging this pull request may close these issues.

T5 and Flan-T5 models support (llama_encode)