-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Together Computer's Red Pajama 3B Base/Chat model #1337
Comments
Following #1333 (comment) maybe it's a good idea to start at ggml first |
+1 on this |
this should be another gptneox model. did someone look into running it? using https://github.com/ggerganov/ggml/tree/master/examples/stablelm |
Something is not right here. edit:
|
since they explicitly state pythia architecture, i tested pythia-70m-deduped, and that worked just fine.
|
The problem is that stableLM uses the gptneoX with use_parallel_residual=True (so that each block is x + mlp(x) + attn(x)) and RedPajama uses use_parallel_residual=False. I implemented it in https://github.com/amirza1/ggml and will submit a PR |
Together Computer already did it here: https://github.com/togethercomputer/redpajama.cpp/tree/support_redpajama |
I've also quantized for q4_0 -> q5_1 ggml the 3B red pajamas chat model https://huggingface.co/keldenl/RedPajama-INCITE-Chat-3B-v1-GGML, doing the rest and uploading them too soon |
Thanks for sharing the GGML models. I tried to deploy the q4_0 and q4_2 to AWS lambda, both models hitting the Edited: my bad, just got the following answer on the readme:
|
@limcheekin yup, your only options currently are this PR gets merged or you use the fork in the ggml repo or if u use my hacky gpt-llama.cpp (which leans on ggml repo, will make it less tacky tonight) anyways, here's RedPajamas 3B instruct model: https://huggingface.co/keldenl/RedPajama-INCITE-Instruct-3B-v1-GGML/ (i'll be quantizing the 7B models tonight) |
ggml for instruct 7b uploaded: https://huggingface.co/keldenl/RedPajama-INCITE-Instruct-7B-v0.1-GGML |
It looks like the most up to date stuff is here: https://github.com/togethercomputer/redpajama.cpp/tree/support_redpajama/examples/redpajama. However it needs to pull in the upstream quantization changes, so it's a bit of a mess right now. @ggerganov @slaren @Green-Sky based on other discussions I've seen scattered across other issues/PRs, what might be best is to consolidate ggml and llama.cpp into a common low level llm repo that can support all these different models. My C/C++ is so bad that I'm not gonna be much help there but happy to help in any other way to make this happen/share info. |
somewhat yea. but i feel like either more llm stuff is going to land in the ggml repo, or a separate ggml repo will be needed.
if you use the gpt-neox code example from the ggml repo, not yet since @ggerganov did not yet backport the ggml changes from llama.cpp. |
actually @ekryski you're right, i need to recompile with gpt-neox – i just started the work there I've uploaded 3B Chat for q4_0 & q5_1 https://huggingface.co/keldenl/RedPajama-INCITE-Chat-3B-v1-GGML (deleted the old ones) Uploading 3B Instruct right now and I'll do 7B tmr |
Here's 3B Instruct: https://huggingface.co/keldenl/RedPajama-INCITE-Instruct-3B-v1-GGML Going to just post q4_0 and q5_1, unless people are really really eager for the other quantized methods |
Just backported the Also, updated the https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox If using a quantized model, make sure it is quantized using the latest |
gpt-neox and llama are different architectures. you need to look here https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox |
7B now available https://www.together.xyz/blog/redpajama-7b |
I moved my original comment from another PR because it was out of scope and I only posted it there because I was exhausted at that moment. I'm posting here because it's in scope of this thread. @strutive07 was kind enough to point out my mistake and point me in the right direction. Below is a modified comment to align it with this thread. I get 21:46:57 | ~/Valerie/llama.cpp
(.venv) git:(HEAD | Δ) λ python convert-gptneox-hf-to-gguf.py mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1 1
gguf: loading model RedPajama-INCITE-Chat-3B-v1
This gguf file is for Little Endian only
gguf: get model metadata
gguf: get tokenizer metadata
gguf: get gpt2 tokenizer vocab
gguf: Adding 50009 merge(s).
gguf: Setting special token type bos to 0
gguf: Setting special token type eos to 0
gguf: Setting special token type unk to 0
gguf: get tensor metadata
gguf: loading model part 'pytorch_model.bin'
token_embd.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.0.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_qkv.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_qkv.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_output.bias, n_dims = 1, torch.float16 --> float32
blk.0.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_up.bias, n_dims = 1, torch.float16 --> float32
blk.0.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_down.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.1.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_qkv.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_qkv.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_output.bias, n_dims = 1, torch.float16 --> float32
blk.1.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_up.bias, n_dims = 1, torch.float16 --> float32
blk.1.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_down.bias, n_dims = 1, torch.float16 --> float32
#
# omitted for brevity...
#
blk.31.ffn_down.bias, n_dims = 1, torch.float16 --> float32
output_norm.weight, n_dims = 1, torch.float16 --> float32
output_norm.bias, n_dims = 1, torch.float16 --> float32
output.weight, n_dims = 2, torch.float16 --> float16
gguf: write header
gguf: write metadata
gguf: write tensors
gguf: model successfully exported to 'mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-f16.gguf' I can convert it because import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM, TextStreamer
tok = AutoTokenizer.from_pretrained(
"mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1",
local_files_only=True,
)
model = GPTNeoXForCausalLM.from_pretrained(
"mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1",
torch_dtype=torch.bfloat16,
local_files_only=True,
)
model.to("cpu")
inputs = tok(
[
"My name is Red and I am a helpful assistant.\n"
"<human>: What is your name?\n"
"<bot>: My name is Red and I am a helpful assistant.\n"
"<human>: What can you do?\n"
"<bot>: I can assist you with various tasks, including providing helpful responses for certain queries.\n"
"<human>: How can you assist me?\n"
"<bot>: As a helpful assistant, I can assist you in your programming projects by:\n\n1. Providing suggestions and ideas for your project\n2. Helping you brainstorm and problem-solve\n3. Offering language syntax corrections and code optimization tips\n4. Assisting with debugging and troubleshooting\n5. Generating code snippets and examples based on your requirements\n6. Answering questions about programming concepts and best practices\n7. Providing information on various programming languages and frameworks\n8. Helping you stay up-to-date with the latest programming trends and technologies\n9. Offering tips and resources for improving your coding skills and productivity\n10. Any other way I can assist you in your programming projects, feel free to ask!\n\nPlease let me know if there's anything specific you need help with.\n"
"<human>: What else can you do?\n"
"<bot>:"
],
return_tensors="pt",
)
streamer = TextStreamer(tok)
# Configure additional options for generation
_ = model.generate(
**inputs,
streamer=streamer,
max_new_tokens=512,
repetition_penalty=1.8,
no_repeat_ngram_size=3,
temperature=0.7,
do_sample=True,
) At this point, it's simple to quantize from here. 21:47:22 | ~/Valerie/llama.cpp
(.venv) git:(HEAD | Δ) λ ./quantize mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-f16.gguf mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-q4_0.gguf q4_0 16
main: build = 1441 (ff3bad8)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: quantizing 'mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-f16.gguf' to 'mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-q4_0.gguf' as Q4_0 using 16 threads
llama_model_loader: loaded meta data with 17 key-value pairs and 388 tensors from mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight f16 [ 2560, 50432, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_qkv.weight f16 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_output.weight f16 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_up.weight f16 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 11: blk.0.ffn_down.weight f16 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 12: blk.0.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
#
# omitted for brevity
#
llama_model_loader: - tensor 373: blk.31.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 374: blk.31.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 375: blk.31.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 376: blk.31.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 377: blk.31.attn_qkv.weight f16 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 378: blk.31.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 379: blk.31.attn_output.weight f16 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 380: blk.31.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 381: blk.31.ffn_up.weight f16 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 382: blk.31.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 383: blk.31.ffn_down.weight f16 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 384: blk.31.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 385: output_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 386: output_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 387: output.weight f16 [ 2560, 50432, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: gptneox.context_length u32
llama_model_loader: - kv 3: gptneox.embedding_length u32
llama_model_loader: - kv 4: gptneox.block_count u32
llama_model_loader: - kv 5: gptneox.feed_forward_length u32
llama_model_loader: - kv 6: gptneox.rope.dimension_count u32
llama_model_loader: - kv 7: gptneox.attention.head_count u32
llama_model_loader: - kv 8: gptneox.use_parallel_residual bool
llama_model_loader: - kv 9: gptneox.attention.layer_norm_epsilon f32
llama_model_loader: - kv 10: tokenizer.ggml.model str
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr
llama_model_loader: - kv 12: tokenizer.ggml.token_type arr
llama_model_loader: - kv 13: tokenizer.ggml.merges arr
llama_model_loader: - kv 14: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 15: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - type f32: 258 tensors
llama_model_loader: - type f16: 130 tensors
llama_model_quantize_internal: meta size = 1793120 bytes
[ 1/ 388] token_embd.weight - [ 2560, 50432, 1, 1], type = f16, quantizing to q4_0 .. size = 246.25 MB -> 69.26 MB | hist: 0.037 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.116 0.111 0.096 0.077 0.057 0.039 0.026 0.021
[ 2/ 388] blk.0.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 3/ 388] blk.0.attn_norm.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 4/ 388] blk.0.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 5/ 388] blk.0.ffn_norm.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 6/ 388] blk.0.attn_qkv.weight - [ 2560, 7680, 1, 1], type = f16, quantizing to q4_0 .. size = 37.50 MB -> 10.55 MB | hist: 0.036 0.015 0.024 0.038 0.055 0.076 0.097 0.114 0.121 0.114 0.097 0.076 0.055 0.038 0.024 0.020
[ 7/ 388] blk.0.attn_qkv.bias - [ 7680, 1, 1, 1], type = f32, size = 0.029 MB
[ 8/ 388] blk.0.attn_output.weight - [ 2560, 2560, 1, 1], type = f16, quantizing to q4_0 .. size = 12.50 MB -> 3.52 MB | hist: 0.036 0.013 0.021 0.033 0.051 0.074 0.099 0.122 0.132 0.122 0.099 0.074 0.051 0.033 0.021 0.017
[ 9/ 388] blk.0.attn_output.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 10/ 388] blk.0.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, quantizing to q4_0 .. size = 50.00 MB -> 14.06 MB | hist: 0.037 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.116 0.111 0.096 0.077 0.057 0.039 0.025 0.021
[ 11/ 388] blk.0.ffn_up.bias - [10240, 1, 1, 1], type = f32, size = 0.039 MB
[ 12/ 388] blk.0.ffn_down.weight - [10240, 2560, 1, 1], type = f16, quantizing to q4_0 .. size = 50.00 MB -> 14.06 MB | hist: 0.036 0.016 0.025 0.039 0.056 0.077 0.097 0.111 0.117 0.111 0.097 0.077 0.056 0.039 0.025 0.021
[ 13/ 388] blk.0.ffn_down.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
#
# omitted for brevity
#
[ 373/ 388] blk.30.ffn_down.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 374/ 388] blk.31.attn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 375/ 388] blk.31.attn_norm.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 376/ 388] blk.31.ffn_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 377/ 388] blk.31.ffn_norm.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 378/ 388] blk.31.attn_qkv.weight - [ 2560, 7680, 1, 1], type = f16, quantizing to q4_0 .. size = 37.50 MB -> 10.55 MB | hist: 0.036 0.015 0.024 0.038 0.055 0.076 0.097 0.113 0.120 0.113 0.097 0.076 0.055 0.038 0.025 0.020
[ 379/ 388] blk.31.attn_qkv.bias - [ 7680, 1, 1, 1], type = f32, size = 0.029 MB
[ 380/ 388] blk.31.attn_output.weight - [ 2560, 2560, 1, 1], type = f16, quantizing to q4_0 .. size = 12.50 MB -> 3.52 MB | hist: 0.036 0.015 0.025 0.038 0.056 0.076 0.097 0.113 0.119 0.112 0.097 0.076 0.056 0.038 0.025 0.021
[ 381/ 388] blk.31.attn_output.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 382/ 388] blk.31.ffn_up.weight - [ 2560, 10240, 1, 1], type = f16, quantizing to q4_0 .. size = 50.00 MB -> 14.06 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.097 0.111 0.117 0.111 0.097 0.077 0.056 0.039 0.025 0.021
[ 383/ 388] blk.31.ffn_up.bias - [10240, 1, 1, 1], type = f32, size = 0.039 MB
[ 384/ 388] blk.31.ffn_down.weight - [10240, 2560, 1, 1], type = f16, quantizing to q4_0 .. size = 50.00 MB -> 14.06 MB | hist: 0.036 0.014 0.022 0.035 0.052 0.074 0.099 0.120 0.129 0.120 0.099 0.074 0.052 0.035 0.022 0.018
[ 385/ 388] blk.31.ffn_down.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 386/ 388] output_norm.weight - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 387/ 388] output_norm.bias - [ 2560, 1, 1, 1], type = f32, size = 0.010 MB
[ 388/ 388] output.weight - [ 2560, 50432, 1, 1], type = f16, quantizing to q6_K .. size = 246.25 MB -> 101.00 MB | hist:
llama_model_quantize_internal: model size = 5296.58 MB
llama_model_quantize_internal: quant size = 1524.34 MB
llama_model_quantize_internal: hist: 0.036 0.015 0.025 0.038 0.056 0.077 0.097 0.112 0.118 0.112 0.097 0.077 0.056 0.038 0.025 0.021
main: quantize time = 4940.46 ms
main: total time = 4940.46 ms Then I attempt to run the model at this point. 21:54:48 | ~/Valerie/llama.cpp
(.venv) git:(HEAD | Δ) λ ./main -f prompts/redpajama.txt -m mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-q4_0.gguf --color -e -i --multiline-input -s 1337 --in-prefix "<human>:" --in-suffix "<bot>:" --verbose-prompt
Log start
main: build = 1441 (ff3bad8)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed = 1337
llama_model_loader: loaded meta data with 19 key-value pairs and 388 tensors from mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight q4_0 [ 2560, 50432, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 11: blk.0.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 12: blk.0.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 19: blk.1.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 20: blk.1.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 21: blk.1.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 22: blk.1.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 23: blk.1.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 24: blk.1.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.2.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 29: blk.2.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 30: blk.2.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 31: blk.2.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 32: blk.2.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 33: blk.2.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 34: blk.2.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 35: blk.2.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 36: blk.2.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 37: blk.3.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 38: blk.3.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 39: blk.3.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 40: blk.3.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 41: blk.3.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 42: blk.3.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 43: blk.3.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 44: blk.3.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 45: blk.3.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 46: blk.3.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 47: blk.3.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 48: blk.3.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 49: blk.4.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 50: blk.4.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 51: blk.4.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 52: blk.4.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 53: blk.4.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 54: blk.4.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 55: blk.4.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 56: blk.4.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 57: blk.4.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 58: blk.4.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 59: blk.4.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 60: blk.4.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 61: blk.5.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 62: blk.5.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 63: blk.5.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.5.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 65: blk.5.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 66: blk.5.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 67: blk.5.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 68: blk.5.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 69: blk.5.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 70: blk.5.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 71: blk.5.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 72: blk.5.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 73: blk.6.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 74: blk.6.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 75: blk.6.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 76: blk.6.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 77: blk.6.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 78: blk.6.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 79: blk.6.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 80: blk.6.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 81: blk.6.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 82: blk.6.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 83: blk.6.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 84: blk.6.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 85: blk.7.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 86: blk.7.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 87: blk.7.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 88: blk.7.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 89: blk.7.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 90: blk.7.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 91: blk.7.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 92: blk.7.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 93: blk.7.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 94: blk.7.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 95: blk.7.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 96: blk.7.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 97: blk.8.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 98: blk.8.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 99: blk.8.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.8.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 101: blk.8.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 102: blk.8.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 103: blk.8.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 104: blk.8.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 105: blk.8.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 106: blk.8.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 107: blk.8.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 108: blk.8.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 109: blk.9.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 110: blk.9.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 111: blk.9.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 112: blk.9.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 113: blk.9.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 114: blk.9.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 115: blk.9.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 116: blk.9.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 117: blk.9.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 118: blk.9.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 119: blk.9.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 120: blk.9.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 121: blk.10.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 122: blk.10.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 123: blk.10.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 124: blk.10.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 125: blk.10.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 126: blk.10.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 127: blk.10.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 128: blk.10.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 129: blk.10.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 130: blk.10.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 131: blk.10.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 132: blk.10.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 133: blk.11.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 134: blk.11.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 135: blk.11.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 136: blk.11.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 137: blk.11.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 138: blk.11.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 139: blk.11.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 140: blk.11.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 141: blk.11.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 142: blk.11.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 143: blk.11.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 144: blk.11.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 145: blk.12.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.12.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 147: blk.12.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 148: blk.12.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 149: blk.12.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 150: blk.12.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 151: blk.12.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 152: blk.12.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 153: blk.12.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 154: blk.12.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 155: blk.12.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 156: blk.12.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 157: blk.13.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 158: blk.13.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 159: blk.13.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 160: blk.13.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 161: blk.13.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 162: blk.13.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 163: blk.13.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 164: blk.13.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 165: blk.13.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 166: blk.13.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 167: blk.13.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 168: blk.13.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 169: blk.14.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 170: blk.14.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 171: blk.14.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.14.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 173: blk.14.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 174: blk.14.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 175: blk.14.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 176: blk.14.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 177: blk.14.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 178: blk.14.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 179: blk.14.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 180: blk.14.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 181: blk.15.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 182: blk.15.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 183: blk.15.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 184: blk.15.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 185: blk.15.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 186: blk.15.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 187: blk.15.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 188: blk.15.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 189: blk.15.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 190: blk.15.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 191: blk.15.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 192: blk.15.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 193: blk.16.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 194: blk.16.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 195: blk.16.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 196: blk.16.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 197: blk.16.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 198: blk.16.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 199: blk.16.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 200: blk.16.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 201: blk.16.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 202: blk.16.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 203: blk.16.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 204: blk.16.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 205: blk.17.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 206: blk.17.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 207: blk.17.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 208: blk.17.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 209: blk.17.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 210: blk.17.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 211: blk.17.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 212: blk.17.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 213: blk.17.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 214: blk.17.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 215: blk.17.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 216: blk.17.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 217: blk.18.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.18.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 219: blk.18.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 220: blk.18.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 221: blk.18.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 222: blk.18.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 223: blk.18.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 224: blk.18.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 225: blk.18.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 226: blk.18.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 227: blk.18.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 228: blk.18.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 229: blk.19.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 230: blk.19.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 231: blk.19.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 232: blk.19.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 233: blk.19.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 234: blk.19.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 235: blk.19.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 236: blk.19.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 237: blk.19.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 238: blk.19.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 239: blk.19.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 240: blk.19.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 241: blk.20.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 242: blk.20.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 243: blk.20.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 244: blk.20.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 245: blk.20.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 246: blk.20.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 247: blk.20.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 248: blk.20.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 249: blk.20.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 250: blk.20.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 251: blk.20.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 252: blk.20.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 253: blk.21.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 254: blk.21.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 255: blk.21.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 256: blk.21.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 257: blk.21.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 258: blk.21.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 259: blk.21.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 260: blk.21.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 261: blk.21.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 262: blk.21.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 263: blk.21.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 264: blk.21.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 265: blk.22.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 266: blk.22.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 267: blk.22.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 268: blk.22.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 269: blk.22.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 270: blk.22.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 271: blk.22.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 272: blk.22.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 273: blk.22.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 274: blk.22.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 275: blk.22.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 276: blk.22.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 277: blk.23.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 278: blk.23.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 279: blk.23.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 280: blk.23.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 281: blk.23.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 282: blk.23.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 283: blk.23.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 284: blk.23.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 285: blk.23.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 286: blk.23.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 287: blk.23.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 288: blk.23.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 289: blk.24.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 290: blk.24.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 291: blk.24.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 292: blk.24.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 293: blk.24.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 294: blk.24.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 295: blk.24.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 296: blk.24.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 297: blk.24.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 298: blk.24.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 299: blk.24.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 300: blk.24.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 301: blk.25.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 302: blk.25.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 303: blk.25.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 304: blk.25.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 305: blk.25.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 306: blk.25.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 307: blk.25.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 308: blk.25.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 309: blk.25.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 310: blk.25.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 311: blk.25.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 312: blk.25.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 313: blk.26.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 314: blk.26.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 315: blk.26.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 316: blk.26.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 317: blk.26.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 318: blk.26.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 319: blk.26.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 320: blk.26.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 321: blk.26.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 322: blk.26.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 323: blk.26.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 324: blk.26.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 325: blk.27.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 326: blk.27.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 327: blk.27.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 328: blk.27.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 329: blk.27.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 330: blk.27.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 331: blk.27.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 332: blk.27.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 333: blk.27.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 334: blk.27.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 335: blk.27.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 336: blk.27.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 337: blk.28.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 338: blk.28.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 339: blk.28.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 340: blk.28.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 341: blk.28.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 342: blk.28.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 343: blk.28.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 344: blk.28.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 345: blk.28.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 346: blk.28.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 347: blk.28.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 348: blk.28.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 349: blk.29.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 350: blk.29.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 351: blk.29.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 352: blk.29.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 353: blk.29.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 354: blk.29.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 355: blk.29.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 356: blk.29.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 357: blk.29.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 358: blk.29.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 359: blk.29.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 360: blk.29.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 361: blk.30.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 362: blk.30.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 363: blk.30.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 364: blk.30.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 365: blk.30.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 366: blk.30.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 367: blk.30.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 368: blk.30.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 369: blk.30.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 370: blk.30.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 371: blk.30.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 372: blk.30.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 373: blk.31.attn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 374: blk.31.attn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 375: blk.31.ffn_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 376: blk.31.ffn_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 377: blk.31.attn_qkv.weight q4_0 [ 2560, 7680, 1, 1 ]
llama_model_loader: - tensor 378: blk.31.attn_qkv.bias f32 [ 7680, 1, 1, 1 ]
llama_model_loader: - tensor 379: blk.31.attn_output.weight q4_0 [ 2560, 2560, 1, 1 ]
llama_model_loader: - tensor 380: blk.31.attn_output.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 381: blk.31.ffn_up.weight q4_0 [ 2560, 10240, 1, 1 ]
llama_model_loader: - tensor 382: blk.31.ffn_up.bias f32 [ 10240, 1, 1, 1 ]
llama_model_loader: - tensor 383: blk.31.ffn_down.weight q4_0 [ 10240, 2560, 1, 1 ]
llama_model_loader: - tensor 384: blk.31.ffn_down.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 385: output_norm.weight f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 386: output_norm.bias f32 [ 2560, 1, 1, 1 ]
llama_model_loader: - tensor 387: output.weight q6_K [ 2560, 50432, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: gptneox.context_length u32
llama_model_loader: - kv 3: gptneox.embedding_length u32
llama_model_loader: - kv 4: gptneox.block_count u32
llama_model_loader: - kv 5: gptneox.feed_forward_length u32
llama_model_loader: - kv 6: gptneox.rope.dimension_count u32
llama_model_loader: - kv 7: gptneox.attention.head_count u32
llama_model_loader: - kv 8: gptneox.use_parallel_residual bool
llama_model_loader: - kv 9: gptneox.attention.layer_norm_epsilon f32
llama_model_loader: - kv 10: tokenizer.ggml.model str
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr
llama_model_loader: - kv 12: tokenizer.ggml.token_type arr
llama_model_loader: - kv 13: tokenizer.ggml.merges arr
llama_model_loader: - kv 14: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 15: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 17: general.quantization_version u32
llama_model_loader: - kv 18: general.file_type u32
llama_model_loader: - type f32: 258 tensors
llama_model_loader: - type q4_0: 129 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: mismatch in special tokens definition ( 159/50432 vs 180/50432 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = gptneox
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 50432
llm_load_print_meta: n_merges = 50009
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = mostly Q4_0
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.49 GiB (4.61 BPW)
llm_load_print_meta: general.name = RedPajama-INCITE-Chat-3B-v1
llm_load_print_meta: BOS token = 0 '<|endoftext|>'
llm_load_print_meta: EOS token = 0 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.13 MB
error loading model: unknown architecture
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'mods/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-q4_0.gguf'
main: error: unable to load model Do note the It seems like most of the pieces are in place and an implementation is needed to run inference at this point. I, admittedly, haven't had time to look into this. This is a low priority for me and is something I've been working on little by little as a side project. I'll take a look at the link @Green-Sky shared because that's exactly what's needed to get this working. |
Just confirming it does build and work with 12:11:03 | ~/Valerie/ggml
(.venv) git:(master | Δ) λ ./build/bin/gpt-neox -m models/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-q4_0.bin -f prompts/redpajama.txt -s 1337
main: seed = 1337
gpt_neox_model_load: loading model from 'models/togethercomputer/RedPajama-INCITE-Chat-3B-v1/ggml-model-q4_0.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50432
gpt_neox_model_load: n_ctx = 2048
gpt_neox_model_load: n_embd = 2560
gpt_neox_model_load: n_head = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot = 80
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype = 2002
gpt_neox_model_load: qntvr = 2
gpt_neox_model_load: ggml ctx size = 3572.79 MB
gpt_neox_model_load: memory_size = 640.00 MB, n_mem = 65536
gpt_neox_model_load: ................................................ done
gpt_neox_model_load: model size = 1492.60 MB / num tensors = 388
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: number of tokens in prompt = 35
main: token[0] = 3220, My
main: token[1] = 1416, name
main: token[2] = 310, is
main: token[3] = 4410, Red
main: token[4] = 9387, Pa
main: token[5] = 25402, jam
main: token[6] = 66, a
main: token[7] = 285, and
main: token[8] = 309, I
main: token[9] = 717, am
main: token[10] = 247, a
main: token[11] = 9371, helpful
main: token[12] = 13372, assistant
main: token[13] = 964, .
main: token[14] = 187,
main: token[15] = 187,
main: token[16] = 29, <
main: token[17] = 13961, human
main: token[18] = 32056, >:
main: token[19] = 24387, Hello
main: token[20] = 2195, !
main: token[21] = 2752, My
main: token[22] = 1416, name
main: token[23] = 310, is
main: token[24] = 16916, Austin
main: token[25] = 964, .
main: token[26] = 1737, What
main: token[27] = 310, is
main: token[28] = 634, your
main: token[29] = 1416, name
main: token[30] = 3736, ?
main: token[31] = 187,
main: token[32] = 29, <
main: token[33] = 12042, bot
main: token[34] = 32056, >:
My name is RedPajama and I am a helpful assistant.
<human>: Hello! My name is Austin. What is your name?
<bot>: Hi Austin, my name is RedPajamas.
<human>: What are the most common types of ice cream?
<bot>: The most common types of ice cream are soft serve, hard serve, frozen custard, frozen yogurt, gelato and sherbet.
<human>: Classify the following animals as mammals or birds: bear, eagle, koala, monkey, tiger, seal, wolf, human
<bot>: Mammals: bear, seal, wolf, human
Birds: eagle, koala, monkey, tiger
<human>: Tell me which of these things are cars, trucks, vans, or SUVs: Toyota, Ford, Dodge, Jeep, Tesla, Ram, Honda, Buick, Chevy, Chrysler, Kia, Acura, Mazda, Toyota
<bot>: SUVs: Toyota, Ford, Dodge, Jeep, Ram, Honda, Buick, Chevy, Chrysler, Kia,
main: mem per token = 373080 bytes
main: load time = 271.03 ms
main: sample time = 47.57 ms
main: predict time = 8804.00 ms / 37.62 ms per token
main: total time = 9230.72 ms There are some issues though. I think they're mostly scoped to the example program. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Hopefully this can be blazingly fast!
The text was updated successfully, but these errors were encountered: