Support new version of llama.cpp #9
Hi Saifeddine, yes, I am in the process of syncing the bindings with the latest llama.cpp. Meanwhile, could you please share some links to the models that do not work, so I can test them? Thank you!
Hi, thanks, here are some. TheBloke has up-to-date models, for example:
Yeah, those models are converted to the newest version of the ggml format. I think I will push one release up to that change, and then another release after it, so that one can choose which version to use based on the models they have. What do you think?
It can be done like this. For now, I use the official llama.cpp bindings for the new model format and yours for the previous one. But maybe having two releases is a good thing.
But yeah, feel free to use either one. The official bindings are great as well.
I used these steps to update to the new model format:
...but indeed, the pyllamacpp bindings are now broken. I'll have a look and see if I can switch to the abetlen/llama-cpp-python bindings in the meantime and get it to work. But yeah, a version upgrade is a real time waster, which is why developers should take note and either 1. make it easy to update, or 2. make the app/framework/etc. backward compatible with older versions.
It doesn't work for me. It hallucinates 100% of the time, often in random languages, and doesn't respond coherently to any of my prompts.
OK, I have a workaround for now. It seems that prompt_context, prompt_suffix and prompt_prefix are broken, so I have to add them manually into the prompt for now. These Python bindings are the only ones working with the new update of llama.cpp, so well done! UPDATE: It seems to repeat a lot of answers, which I don't think used to happen before (or maybe I missed it?).
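For illustration only (not code from this thread): a minimal sketch of the manual-prompt workaround described above, assuming a pyllamacpp 2.x-style API. The model path, the example strings, and the keyword arguments (model_path, n_predict) are placeholders and may differ between versions.

```python
# Sketch of the workaround: instead of relying on the prompt_prefix /
# prompt_context / prompt_suffix arguments (reported broken above), build the
# full prompt string by hand before calling generate().
# NOTE: constructor and generate() keyword names vary between pyllamacpp
# versions; adjust them to whatever your installed version exposes.
from pyllamacpp.model import Model

prompt_context = "A conversation between a user and an assistant."
prompt_prefix = "\nUser: "
prompt_suffix = "\nAssistant: "

model = Model(model_path="./models/ggml-model-q4_0.bin")  # illustrative path

question = "What is the capital of France?"
full_prompt = prompt_context + prompt_prefix + question + prompt_suffix

for token in model.generate(full_prompt, n_predict=128):
    print(token, end="", flush=True)
```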
How do I upgrade any model to the new ggjt v2 format?
@twinlizzie, where did you get the steps you described here? Yeah, unfortunately,
What do you mean by "it repeats answers"? Do you mean you get the same answer every time you run the generation?
@Naugustogi, AFAIK you will need to get the PyTorch models and re-quantize them to a supported format.
Yep. convert-pth-to-ggml works now to convert larger models to ggjt v2, and I figured out the steps on my own.
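As a rough illustration of that re-conversion flow (the exact steps were not posted in this thread): the sketch below assumes a llama.cpp checkout that ships convert-pth-to-ggml.py and a compiled quantize binary, run against the original PyTorch weights. The paths and the quantization argument are placeholders and vary between llama.cpp revisions.

```python
# Rough sketch of re-converting original PyTorch weights to the new format.
# Run from a llama.cpp checkout that matches the binding version you use.
import subprocess

model_dir = "models/7B"  # directory containing the original consolidated.*.pth files

# 1. Convert the PyTorch weights to an f16 ggml/ggjt file.
#    The trailing "1" selects f16 output in convert-pth-to-ggml.py.
subprocess.run(["python3", "convert-pth-to-ggml.py", model_dir, "1"], check=True)

# 2. Re-quantize the f16 file to 4-bit. Older quantize builds take a numeric
#    type id (e.g. 2 for q4_0); newer ones take the type name instead.
subprocess.run(
    ["./quantize",
     f"{model_dir}/ggml-model-f16.bin",
     f"{model_dir}/ggml-model-q4_0.bin",
     "q4_0"],
    check=True,
)
```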
Actually, I'm not entirely sure. I set repeat_penalty to 1.2 and it seems to have fixed it, for now. It would sometimes get stuck in a loop where you get the same type of answer no matter what you ask. On the llama-cpp-python repo it seems to be even worse, because you always get the same answer to the same question. Or maybe I'm missing something about how to properly run the API... llama.cpp itself does not have this problem and works perfectly, even with my DIY-upgraded model; I get a different answer to the same question, which is what I want.
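A hedged sketch of the repeat_penalty tweak mentioned above, assuming the installed pyllamacpp version exposes repeat_penalty as a keyword on generate(); the keyword name, model path, and the other arguments here are illustrative, not confirmed by this thread.

```python
# Raising repeat_penalty above 1.0 penalizes recently generated tokens and can
# reduce the repetitive-answer loops described above.
# NOTE: check your pyllamacpp version's generate() signature; the sampling
# keywords may live elsewhere or be named differently.
from pyllamacpp.model import Model

model = Model(model_path="./models/ggml-model-q4_0.bin")  # illustrative path

prompt = "User: Why is the sky blue?\nAssistant: "
for token in model.generate(prompt, n_predict=128, repeat_penalty=1.2):
    print(token, end="", flush=True)
```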
I just tested it again on my end with the models above and I don't have this problem. Every time I run it, I get a different answer.
Hi Abdeladim, there are many new models that can't run on the pyllamacpp bindings because they use version 2 of ggml.
If you have some time, could you please try to add support for this?