Replies: 4 comments 31 replies
-
ggml already supports gpt-j, you should just be able to convert and quantize them.
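The convert-and-quantize flow suggested above can be sketched roughly as follows. This is only a sketch: the script path, the `gpt-j-quantize` binary name, the numeric arguments, and the `./dolly-gpt-j-6b` model directory are assumptions based on the ggml repo's `examples/gpt-j`; adjust them to your checkout.

```shell
# Sketch of converting and quantizing a gpt-j checkpoint with ggml.
# Paths, binary names, and arguments below are assumptions, not verified
# against a specific ggml revision.
convert_gptj() {
    model_dir="$1"
    if [ ! -d "$model_dir" ]; then
        echo "model directory not found: $model_dir"
        return 0
    fi
    # Convert the Hugging Face checkpoint to a ggml f16 file
    python3 ggml/examples/gpt-j/convert-h5-to-ggml.py "$model_dir" 1
    # Quantize the f16 model down to 4 bits
    ./ggml/build/bin/gpt-j-quantize \
        "$model_dir/ggml-model-f16.bin" \
        "$model_dir/ggml-model-q4_0.bin" 2
}

# Hypothetical local path to the downloaded Dolly GPT-J weights:
convert_gptj ./dolly-gpt-j-6b
```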
-
I did not succeed with the convert-h5-to-ggml.py script and the Dolly_GPT-J-6b model. The script runs for a while and then exits with the status "Killed".
-
Probably the out-of-memory killer played its part? Check journalctl / syslog, and try to monitor system resources while running it.
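A quick way to follow that advice on a Linux box is sketched below. The `journalctl` grep in the comment is one common way to spot OOM kills (it may need root); the helper just reads `MemAvailable` from `/proc/meminfo`, which is an assumption about the platform rather than anything from the thread.

```shell
# To see whether the kernel's OOM killer terminated the script (may need root):
#   journalctl -k | grep -iE 'out of memory|killed process'

# Helper: currently available RAM in KiB (Linux-only, reads /proc/meminfo)
mem_available_kib() {
    awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

mem_available_kib

# To watch memory while the conversion runs, e.g.:
#   while true; do echo "$(mem_available_kib) KiB available"; sleep 5; done
```

If the "Killed" status coincides with available memory dropping toward zero, the OOM killer is almost certainly the cause.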
-
I will investigate this as I would like to see foundation models and their variants outside of llama be blessed with the gift of ggml.
-
I just watched the latest video from my favorite youtuber - https://www.youtube.com/watch?v=AWAo4iyNWGc&t=14s - and was wondering if someone has already quantized and converted one of these to be compatible with llama.cpp?
The beauty of Dolly-like models is that they're based on the open-source gpt-j-6B from EleutherAI, so no one will come after us for using them without asking.