Support for connection to llamacpp server #14
Conversation
This PR will allow connecting to a llama.cpp server, and fixes #13. This way llama does not have to be started for each user prompt. It includes the completions patch from the issue mentioned above, provided by gsuuon.

I'm new to this, so feel free to fix any issues in this PR.
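For reference, the llama.cpp server exposes an HTTP completion endpoint. Here is a minimal sketch of building a request body in Lua (field names are taken from the example payload that appears commented out in the diff below; the host, port, and endpoint path are assumptions based on llama.cpp's server example):

```lua
-- Minimal sketch of a llama.cpp server completion request body.
-- `vim.json.encode` is Neovim's built-in JSON encoder.
local body = vim.json.encode({
  prompt = "Building a website can be done in 10 simple steps:",
  n_predict = 128,
})
-- POST `body` to e.g. http://localhost:8080/completion on the running server.
```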
Almost there, just need to clean up some of the stuff pulled from the huggingface example. We should replace the llamacpp provider with this (so accept the changes and rename llamacpp_server.lua to llamacpp.lua)
Do you want to add some basic setup steps to the README for this? No worries if not!
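If setup steps do land in the README, a sketch could look like the following (the server invocation follows llama.cpp's server example; the require path and prompt shape are guesses based on the `llamacpp.lua` rename above and the `options`/`params` tables in this diff, not a verified plugin API):

```lua
-- 1. Start the llama.cpp server in a shell (flags per llama.cpp's server example):
--      ./server -m path/to/model --port 8080
-- 2. Reference the provider from a prompt. The require path below is a
--    guess based on the llamacpp_server.lua -> llamacpp.lua rename above.
local llamacpp = require('llm.providers.llamacpp')

local prompt = {
  provider = llamacpp,
  options = {}, -- provider options, e.g. where the server is listening
  params = {},  -- request params forwarded to the completion endpoint
}
```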
```lua
headers = {
  -- Authorization = 'Bearer ' .. util.env_memo('HUGGINGFACE_API_KEY'),
  ["Content-Type"] = "application/json",
  -- ['data'] = '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}',
},
```
Headers aren't necessary here
Suggested change:
```diff
- headers = {
-   -- Authorization = 'Bearer ' .. util.env_memo('HUGGINGFACE_API_KEY'),
-   ["Content-Type"] = "application/json",
-   -- ['data'] = '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}',
- },
```
ok done
```lua
---@param params? any Additional params for request
---@param options? { model?: string }
function M.request_completion(handlers, params, options)
  local model = (options or {}).model or "bigscience/bloom"
```
We can get rid of all the huggingface stuff here
```lua
local model = (options or {}).model or "bigscience/bloom"
```
done
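For context, a minimal sketch of what the cleaned-up function head might look like once the huggingface default is dropped (an assumed form, not the committed code; handler names other than `on_finish` are also assumptions):

```lua
local M = {}

-- Assumed cleaned-up form: the llama.cpp server serves whatever model it
-- was started with, so no per-request model name or "bigscience/bloom"
-- default is needed.
---@param handlers table Stream handlers (on_finish appears in this diff)
---@param params? any Additional params for the request body
function M.request_completion(handlers, params)
  -- build the request body from params and stream it to the server
end

return M
```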
```lua
-- TODO handle non-streaming calls
return curl.stream({
  -- url = 'https://api-inference.huggingface.co/models/', --.. model,
```
```lua
-- url = 'https://api-inference.huggingface.co/models/', --.. model,
```
done
```lua
end

if data.generation_settings ~= nil then -- last message
  handlers.on_finish('', "stop")
```
Can just be a naked call now that it's handled in provider.lua
Suggested change:
```diff
- handlers.on_finish('', "stop")
+ handlers.on_finish()
```
done
```lua
options = {
  -- model = 'bigscience/bloom'
},
params = {
  return_full_text = false,
},
```
These are huggingface options / params, not needed here
Suggested change:
```diff
- options = {
-   -- model = 'bigscience/bloom'
- },
- params = {
-   return_full_text = false,
- },
```
done
By the llama docs we should use: `<instr><sys>You are an assistant, answer all</sys> What is the age of the universe?</instr>`. But `<sys>You are an assistant</sys><instr>What is the age of the universe?</instr>` somehow gives better results. Leave it to the user.
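For clarity, the two layouts side by side as plain strings (tags copied from the comment above; they approximate, rather than reproduce, llama2's official `[INST]`/`<<SYS>>` chat markers):

```lua
-- Format suggested by the llama docs (per the comment above):
local docs_format = "<instr><sys>You are an assistant, answer all</sys> What is the age of the universe?</instr>"
-- Alternative layout that reportedly gave better results:
local alt_format = "<sys>You are an assistant</sys><instr>What is the age of the universe?</instr>"
```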
Ok, I also pushed some changes to the default query. By default the user can select text and ask a question about it. The best default query format is not clear; in the end, from my tests, I got the best results by using the second format above.
Demo of how this works currently (using my custom assome.mp4)
Ok, let's get this merged. I'll replace the llamacpp provider and clean up. Thanks for contributing!