Model doesn't want to load #1128

Open
musikowskipawel opened this issue Sep 18, 2024 · 2 comments

Comments

@musikowskipawel

I have an i5-6400, 16 GB RAM, and an RTX 3060 Ti 8 GB.

I'm trying to load the model:
LLaMA2-13B-Tiefighter.Q4_K_S.gguf

I launch koboldcpp.exe (exactly this version) and set GPU Layers to 41 as recommended (with -1 the issue is exactly the same, though).

Then I click Launch and select my model. Some text appears, but no change in VRAM, memory, or CPU usage is observed.

The log:


***
Welcome to KoboldCpp - Version 1.74
For command line arguments, please refer to --help
***
Auto Selected CUDA Backend...

Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
Initializing dynamic library: koboldcpp_cublas.dll
==========
Namespace(benchmark=None, blasbatchsize=512, blasthreads=2, chatcompletionsadapter=None, config=None, contextsize=4096, debugmode=0, flashattention=False, forceversion=0, foreground=False, gpulayers=41, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, launch=True, lora=None, mmproj=None, model='', model_param='C:/Users/Paul/Desktop/text-generation-webui-main/models/LLaMA2-13B-Tiefighter.Q4_K_S.gguf', multiuser=1, noavx2=False, noblas=False, nocertify=False, nommap=False, nomodel=False, noshift=False, onready='', password=None, port=5001, port_param=5001, preloadstory=None, prompt='', promptlimit=100, quantkv=0, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], sdclamped=0, sdconfig=None, sdlora='', sdloramult=1.0, sdmodel='', sdquant=False, sdthreads=2, sdvae='', sdvaeauto=False, skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=2, unpack='', useclblast=None, usecublas=['normal', '0', 'mmq'], usemlock=False, usevulkan=None, whispermodel='')
==========
Loading model: C:\Users\Paul\Desktop\text-generation-webui-main\models\LLaMA2-13B-Tiefighter.Q4_K_S.gguf

The reported GGUF Arch is: llama
Arch Category: 0

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!
It means that the RoPE values written above will be replaced by the RoPE values indicated after loading.
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
llama_model_loader: loaded meta data with 20 key-value pairs and 363 tensors from C:\Users\Paul\Desktop\text-generation-webui-main\models\LLaMA2-13B-Tiefighter.Q4_K_S.gguf
llm_load_vocab: special tokens cache size = 3
llm_load_vocab: token to piece cache size = 0.1684 MB
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 5120
llm_load_print_meta: n_layer          = 40
llm_load_print_meta: n_head           = 40
llm_load_print_meta: n_head_kv        = 40
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 5120
llm_load_print_meta: n_embd_v_gqa     = 5120
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 13824
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 13B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 13.02 B
llm_load_print_meta: model size       = 6.90 GiB (4.56 BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_print_meta: max token length = 48
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060 Ti, compute capability 8.6, VMM: yes
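
For reference, the GUI settings described above correspond roughly to the command line below. The flag names and values are read off the Namespace dump in the log (model_param, gpulayers=41, usecublas=['normal', '0', 'mmq'], contextsize=4096, threads=2, launch=True); treat it as a sketch of an equivalent invocation, not necessarily the exact call the launcher makes:

koboldcpp.exe --model "C:\Users\Paul\Desktop\text-generation-webui-main\models\LLaMA2-13B-Tiefighter.Q4_K_S.gguf" --gpulayers 41 --usecublas normal 0 mmq --contextsize 4096 --threads 2 --launch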

@LostRuins
Owner

Hi, the terminal could have been accidentally paused. When this happens, please try closing and re-opening KoboldCpp.

@henk717

henk717 commented Sep 21, 2024

When this happens, your driver is compiling the support required to run KoboldCpp; this is normal, especially on the CUDA 11 version of KoboldCpp. koboldcpp_cu12.exe is faster at this, but contrary to what people advise, the best solution is actually to leave it open and let it do its thing. It will get unstuck once the driver has compiled everything it needs to run the CUDA 11 version on top of CUDA 12.
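
If you leave it running, one way to confirm that the model did eventually load is to query the local API on the port shown in the log (5001). The /api/v1/model endpoint is assumed here from the standard KoboldAI API that KoboldCpp exposes; it should return the loaded model name once loading has finished:

curl http://localhost:5001/api/v1/model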
