Anyone keeping tabs on Vicuna, a new LLaMA-based model? #643
-
Dep is pissed that they stole his name.
-
There are ggml weights on 🤗 uploaded just yesterday. Haven't had the chance to try them yet.
On Tue, 4 Apr 2023 at 02:23, edmundronald wrote:
> So what's the news on this? Are the quantized weights available?
--
Regards,
Jesse Jojo Johnson
http://www.jessejojojohnson.com/
-
Tried the one on huggingface hub,
-
First tries with Vicuna 13B 4-bit here. A zero-shot example is below. The answer is not that bad and written in the style of its big bro. Looks quite interesting.
-
Here is a prompt for Vicuna for llama.cpp. I run it like so:

Btw, any idea how I can redirect Vicuna output to some text-to-speech program?
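A minimal sketch of one way to wire that up, assuming llama.cpp's ./main binary and the espeak CLI are installed; the model path, prompt, and token count below are placeholders, not anything from this thread:

```python
import subprocess

# Hypothetical model path and prompt; adjust to your own setup.
llama = subprocess.Popen(
    ["./main", "-m", "./models/ggml-vicuna-13b-q4_0.bin",
     "-p", "### Human: Tell me a short story.\n### Assistant:", "-n", "256"],
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,  # keep llama.cpp's loading/perf logs out of the way
    text=True,
)

# Read the generated text line by line and speak each non-empty line with espeak.
for line in llama.stdout:
    text = line.strip()
    if text:
        subprocess.run(["espeak", text])
```

Any other TTS command that accepts text as an argument (e.g. `say` on macOS) could be swapped in for espeak.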
-
Yes, I keep tabs on all LLaMA descendants, under "models": https://github.com/underlines/awesome-marketing-datascience/blob/master/awesome-ai.md
-
Sorry if this is obvious, but is there a way currently to run the quantized Vicuna model in Python interactively on CPU (any bindings)? Or a stable way to call the executable from Python interactively?
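One option is the llama-cpp-python bindings, which load ggml models on CPU. A minimal interactive sketch, assuming that package is installed and that the model path (a placeholder here) points at a quantized Vicuna file; the "### Human / ### Assistant" prompt format is also an assumption:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load the quantized model on CPU; path and thread count are placeholders.
llm = Llama(model_path="./models/ggml-vicuna-13b-q4_0.bin", n_ctx=2048, n_threads=8)

while True:
    question = input("You: ").strip()
    if not question:
        break
    out = llm(
        f"### Human: {question}\n### Assistant:",
        max_tokens=256,
        stop=["### Human:"],  # stop before the model starts the next turn
    )
    print("Vicuna:", out["choices"][0]["text"].strip())
```

Calling the executable from Python via subprocess also works, but the bindings keep the model loaded between prompts, which is much faster for interactive use.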
-
I found this vicuna-7b-4bit on HF: https://huggingface.co/chharlesonfire/ggml-vicuna-7b-4bit and here is my result:
-
I just use an additional
-
Anyone getting very slow performance with llama.cpp on an M1 Pro with 16 GB RAM? Running
-
Seems to work with StableVicuna as well: https://stability.ai/blog/stablevicuna-open-source-rlhf-chatbot

Here are the steps if someone finds it useful:

```
git clone https://huggingface.co/CarperAI/stable-vicuna-13b-delta
cd stable-vicuna-13b-delta
pip install torch tqdm transformers sentencepiece
python3 apply_delta.py --base-model-path <path-to-llama-weights-in-transformers-format> --target-model-path stable-vicuna-13b --delta-path .
cd llama.cpp
python convert-pth-to-ggml.py ./models/stable-vicuna-13b 1
./quantize ./models/stable-vicuna-13b/ggml-model-f16.bin ./my-models/stable-vicuna-13b/ggml-model-q4_0.bin q4_0
```

To convert LLaMA weights to the transformers format, I used this guide:

```
git clone git@github.com:huggingface/transformers.git
cd transformers
pip install accelerate protobuf==3.20 sentencepiece tokenizers
python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir <path_to_llama_weights> --model_size 13B --output_dir <path-to-llama-weights-in-transformers-format>
```

(this requires ~30 GB of RAM)
-
How are folks running these models w/ reasonable latency? I've tested
-
Slightly off topic, but on the subject of mlock: is there any credence to the idea that pushing the limits of your RAM and resorting to swap space on a modern SSD could burn it out relatively quickly? I was considering getting an external Thunderbolt SSD just for swap, because with 32 GB RAM these 30B-parameter models really do seem to swap a lot, even though I have mlock on and they say they fit within my available RAM (I guess the swapping is happening in other apps such as Chrome, if I don't kill it).
-
> is there any credence to the idea that pushing the limits of your RAM and resorting to swap space on a modern SSD could burn it out relatively quickly?

This isn't true. However, this is not a solution either. Swapping even on a fast modern NVMe will increase token generation time by orders of magnitude.

BTW, you should be able to fully fit a 30B model in 32 GB of RAM after swapping out some of the unnecessary programs.

Regards,
Serhii.
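For a rough sense of why a 30B model can fit in 32 GB, here is a back-of-the-envelope sketch. It assumes the current q4_0 layout (18 bytes per block of 32 weights) and roughly 32.5 billion parameters for LLaMA-30B; real files come out somewhat larger since some tensors are kept at higher precision, and the context buffer adds more on top:

```python
# Back-of-the-envelope memory estimate for a q4_0-quantized model.
# q4_0 stores each block of 32 weights as a 2-byte fp16 scale plus 16 bytes of 4-bit values.
BYTES_PER_BLOCK = 18
WEIGHTS_PER_BLOCK = 32

def q4_0_size_gib(n_params: float) -> float:
    """Approximate size in GiB for n_params weights quantized to q4_0."""
    return n_params / WEIGHTS_PER_BLOCK * BYTES_PER_BLOCK / 2**30

print(f"LLaMA-30B (~32.5e9 params): {q4_0_size_gib(32.5e9):.1f} GiB")  # ~17 GiB
print(f"LLaMA-13B (~13.0e9 params): {q4_0_size_gib(13.0e9):.1f} GiB")  # ~7 GiB
```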
-
If we merge a ton of LoRAs, will the 13B beat the 65B version like LLaMA beat GPT-3?
-
I've been using Vicuna for question answering. I'm using the py-bindings (

My prompt template is:

I'm initializing the model:

I notice it will answer questions in a rhetorical style with

Have others seen this and/or should I be using an alternative prompt?
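One common mitigation is to add explicit stop sequences so generation halts before the model starts a new rhetorical turn. An illustrative sketch using the llama-cpp-python bindings; the model path, template wording, and stop strings are assumptions, not the poster's actual setup:

```python
from llama_cpp import Llama

# Placeholder model path; point it at your own quantized Vicuna file.
llm = Llama(model_path="./models/ggml-vicuna-13b-q4_0.bin", n_ctx=2048)

# A Vicuna-style chat template (assumed wording).
TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed answers to the user's questions.\n"
    "USER: {question}\nASSISTANT:"
)

out = llm(
    TEMPLATE.format(question="What year did Apollo 11 land on the Moon?"),
    max_tokens=128,
    stop=["USER:", "\nQ:"],  # cut the answer off before a new turn begins
)
print(out["choices"][0]["text"].strip())
```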
-
Just notice,
-
It is going to be necessary to provide an automatic version conversion script. People who quantise a model aren't going to redo them, and users don't have the ability to figure out compatibility issues.
On Wed, Jul 26, 2023 at 9:58 AM MaratZakirov wrote:
> I guess the ggml version might be the issue. Thank you for the link, will try it.
-
The software should output a diagnostic explaining that it is dealing with
a prior version.
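A rough sketch of what such a diagnostic could look like: peek at the 4-byte magic at the start of the model file and report which ggml file generation it appears to be. The magic constants below are the ones used in llama.cpp's loaders at the time; the descriptions are approximate:

```python
import struct
import sys

# Known ggml file magics (little-endian uint32 at file offset 0).
KNOWN_MAGICS = {
    0x67676D6C: "ggml - original unversioned format (needs re-conversion)",
    0x67676D66: "ggmf - early versioned format, superseded by ggjt",
    0x67676A74: "ggjt - mmap-able format used by newer llama.cpp builds",
}

with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))

print(KNOWN_MAGICS.get(magic, f"unknown magic 0x{magic:08x} - probably not a ggml model file"))
```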
-
Link to blog post, demo and GH: https://vicuna.lmsys.org/, https://chat.lmsys.org/, https://github.com/lm-sys/FastChat
This looks like the most capable LLaMA right now. They're yet to release the weights :)