This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

Can't run example on llama-2-13b-chat q4_0 #116

Open
gioragutt opened this issue Aug 26, 2023 · 2 comments

Comments


gioragutt commented Aug 26, 2023

I apologize in advance if I omit any useful details; I'm just a simple dev with no knowledge or understanding of DS, so I'm in trial-and-error land.

I followed the instructions from llama.cpp on the llama-2-13b-chat model, and I now have the q4_0 file: llama-2-13b-chat/ggml-model-q4_0.gguf.
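For context, the steps I ran were roughly the ones from the llama.cpp README at the time (exact script names, filenames, and flags may differ by version):

python3 convert.py models/llama-2-13b-chat/
./quantize models/llama-2-13b-chat/ggml-model-f16.gguf models/llama-2-13b-chat/ggml-model-q4_0.gguf q4_0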

I use the example code from this repo and of course have changed it to point to the model file, but loading fails:

The code:

import { LLM } from 'llama-node';
import { LLamaCpp } from 'llama-node/dist/llm/llama-cpp.js';
import path from 'path';

const model = path.resolve(
	process.cwd(),
	'../llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf',
);

console.log(model);

const llama = new LLM(LLamaCpp);
/** @type {import('llama-node/dist/llm/llama-cpp').LoadConfig} */
const config = {
	modelPath: model,
	enableLogging: true,
	nCtx: 1024,
	seed: 0,
	f16Kv: false,
	logitsAll: false,
	vocabOnly: false,
	useMlock: false,
	embedding: false,
	useMmap: true,
	nGpuLayers: 128,
};

const template = `How are you?`;
const prompt = `A chat between a user and an assistant.
USER: ${template}
ASSISTANT:`;

const params = {
	nThreads: 4,
	nTokPredict: 2048,
	topK: 40,
	topP: 0.1,
	temp: 0.2,
	repeatPenalty: 1,
	prompt,
};

const run = async () => {
	await llama.load(config);

	await llama.createCompletion(params, response => {
		process.stdout.write(response.token);
	});
};

run();

The error:

Debugger listening on ws://127.0.0.1:59899/c72280cb-a098-4c15-859f-54025e513896
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
/Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf
llama.cpp: loading model from /Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf
error loading model: unknown (magic, version) combination: 46554747, 00000001; is this really a GGML file?
llama_init_from_file: failed to load model
Waiting for the debugger to disconnect...
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to initialize LLama context from file: /Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf] {
  code: 'GenericFailure'
}

Node.js v18.17.1

I can see that the error refers to some constants it doesn't expect in the file (error loading model: unknown (magic, version) combination: 46554747, 00000001; is this really a GGML file?), and I notice that my file is a gguf file and not a ggml one.
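That magic number is actually the ASCII bytes "GGUF" read as a little-endian u32, which a tiny Node.js sketch can confirm (reusing my model path; the path is just where my file happens to live):

import { openSync, readSync, closeSync } from 'fs';

const modelPath = '../llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf';

// Read only the first four bytes of the file, which hold the format magic.
const fd = openSync(modelPath, 'r');
const magic = Buffer.alloc(4);
readSync(fd, magic, 0, 4, 0);
closeSync(fd);

console.log(magic.toString('ascii'));            // prints "GGUF" for the new format
console.log(magic.readUInt32LE(0).toString(16)); // prints "46554747", the value from the error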

From a quick Google search, I got to this post on r/LocalLLaMA, which states that GGUF is sort of a successor to GGML.

I have literally zero understanding of what I'm doing, and would appreciate it if someone could point me in some direction on how to deal with this. Even just pointing out keywords I might have missed that could have led me to a better answer in the first place 😅

Thanks in advance for your time!

@NoodleBug

Exact same issue here. Did you manage to find a workaround? I might be wrong, but it doesn't look like this library's llama-cpp has been updated in ~4 months. I wonder if that's the issue.


dseeker commented Sep 22, 2023

Is there a way to overlay a newer version of llama.cpp?
