RFC: chat completions endpoint with support for receiving and generating audio #231
Comments
I think one component to include in this picture is the LLM itself. We are looking at Llama-3.2-3B-Instruct, fine-tuned for answering phone calls at a doctor's office.
Request caching must be implemented to support multi-turn conversations.
@silvacarl2 just wanted to let you know that a mix of audio, text, and image inputs is supported in v0.7.0.
Is that part of the v0.7.0 release?
No, I haven't added request caching for transcriptions yet. See https://speaches.ai/usage/voice-chat/#limitations |
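One way the requested transcription caching could work is to key cached transcripts on a hash of the audio bytes, so repeated audio in a multi-turn conversation is only transcribed once. A minimal sketch, assuming an in-memory cache; the `TranscriptionCache` class and the `transcribe` callable are hypothetical, not part of speaches:

```python
import hashlib


class TranscriptionCache:
    """Hypothetical in-memory cache keyed by a SHA-256 hash of the
    audio bytes, so each distinct audio clip is transcribed only once."""

    def __init__(self, transcribe):
        self._transcribe = transcribe  # callable: bytes -> str
        self._cache: dict[str, str] = {}

    def get(self, audio: bytes) -> str:
        key = hashlib.sha256(audio).hexdigest()
        if key not in self._cache:
            # Cache miss: run the (expensive) transcription once.
            self._cache[key] = self._transcribe(audio)
        return self._cache[key]
```

A real implementation would also need an eviction policy (e.g. LRU or TTL), since conversations can accumulate many audio parts.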
Ok, that link is still down. Thanks.
This is what I'm referring to: audio generation. OpenAI recently added the gpt-4o-audio-preview model, which supports combinations of text and audio inputs and outputs. I want to create a POST /v1/chat/completions endpoint emulating this functionality. This project will not turn into another LLM runtime like Ollama or vLLM; I'm thinking of speaches acting as a proxy. Here's the flow I have in mind for text + audio in → text + audio out:
1. Transform the incoming messages into something a regular LLM can work with
2. Forward the request to the LLM
3. Return the transformed response with the LLM text response + generated speech
Comments and suggestions are welcome!
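The proxy flow described above could be sketched roughly as follows. All helper names here (`transcribe`, `llm_complete`, `synthesize`) are hypothetical placeholders for the actual transcription, chat-completion, and speech-generation calls, and the message shape follows the OpenAI multimodal-content convention:

```python
def chat_with_audio(messages, transcribe, llm_complete, synthesize):
    """Sketch of the proposed proxy flow:
    1. replace audio parts in `messages` with their transcriptions,
    2. forward the now text-only messages to a regular LLM,
    3. attach generated speech for the LLM's text reply.
    """
    text_messages = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):  # multimodal content parts
            parts = []
            for part in content:
                if part.get("type") == "input_audio":
                    # Audio part: substitute its transcription.
                    parts.append(transcribe(part["input_audio"]))
                else:
                    parts.append(part.get("text", ""))
            content = " ".join(parts)
        text_messages.append({"role": msg["role"], "content": content})

    reply_text = llm_complete(text_messages)
    return {
        "role": "assistant",
        "content": reply_text,
        "audio": synthesize(reply_text),  # generated speech for the reply
    }
```

In a real proxy, `llm_complete` would be a forwarded /v1/chat/completions request to the upstream LLM runtime, and `synthesize` would call the existing speech endpoint.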