
Audio Component Streaming Behaviour is weird? #7742

Closed
1 task done
s-kruschel opened this issue Mar 19, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@s-kruschel

s-kruschel commented Mar 19, 2024

Describe the bug

Hey folks,

I've searched for similar issues; there are several about the gradio Audio component, but I'm not sure whether they report the same problem.

What I'm trying to do is stream the OpenAI TTS API response. The OpenAI part is working; however, I don't understand the Audio component's behaviour.

What I've tried:

  1. Returning only a single bytes chunk. This leads to a stuttering voice: the audio plays, stops, and then receives the next chunk.
  2. Returning the concatenation of all byte chunks so far (chunks += chunk). This produces audio that plays for about a second until the next chunk is appended to the existing chunks, at which point autoplay restarts from the beginning. So the audio also stutters and never plays through.

Further, only
out = gr.Audio(autoplay=True) seems to work.
out = gr.Audio(autoplay=True, streaming=True) does not work; it just does nothing, for whatever reason.

In my opinion, the optimal behaviour would be: when streaming=True is set and incoming chunks are appended to the already existing chunks, the audio component should keep playing rather than restarting each time.
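The two strategies above can be illustrated without Gradio; this sketch (all names and byte values are made-up stand-ins for real TTS audio) shows what the output component receives in each case:

```python
def fake_tts_stream(data: bytes, chunk_size: int = 4):
    """Yield `data` in fixed-size chunks, like response.iter_bytes()."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

audio = b"0123456789ABCDEF"

# Strategy 1: yield each chunk alone -- the player only ever holds one
# chunk at a time, so playback stops between updates.
single_chunks = list(fake_tts_stream(audio))

# Strategy 2: yield the running concatenation -- every update is the full
# prefix so far, which is why autoplay restarts from the beginning.
prefixes = []
buf = b""
for chunk in fake_tts_stream(audio):
    buf += chunk
    prefixes.append(buf)

print(single_chunks)            # four 4-byte chunks
print(prefixes[-1] == audio)    # True: only the final update holds all the audio
```

Neither strategy gives gapless playback, which is what streaming=True is meant to provide.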

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Reproduction

import gradio as gr
from openai import OpenAI

client = OpenAI()
tts_generator = None


def text_to_speech_streaming():
    with client.audio.speech.with_streaming_response.create(
            model="tts-1-hd",
            voice="alloy",
            input="This is a special test text that I want to get generated to test streaming the generated voice directly from OpenAI into my gradio application."
        ) as response:
            for chunk in response.iter_bytes(chunk_size=8192):
                yield chunk


def add_to_stream(audio, instream):
    global tts_generator

    if audio is None:
        return gr.update(), instream

    if tts_generator is None:
        tts_generator = text_to_speech_streaming()

    try:
        chunk = next(tts_generator)
    except StopIteration:
        # Generator exhausted: reset it and leave the outputs unchanged.
        tts_generator = None
        return gr.update(), instream

    return chunk, chunk


with gr.Blocks() as demo:
    inp = gr.Audio(sources="microphone")
    out = gr.Audio(streaming=True)
    stream = gr.State()

    clear = gr.Button("Clear")

    inp.stream(add_to_stream, [inp, stream], [out, stream])
    clear.click(lambda: [None, None, None], None, [inp, out, stream])


if __name__ == "__main__":
    demo.launch()

Screenshot

No response

Logs

No response

System Info

Gradio Environment Information:
------------------------------
Operating System: Darwin
gradio version: 4.18.0
gradio_client version: 0.10.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.109.2
ffmpy: 0.3.2
gradio-client==0.10.0 is not installed.
httpx: 0.26.0
huggingface-hub: 0.20.3
importlib-resources: 6.1.1
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.2
numpy: 1.26.4
orjson: 3.9.13
packaging: 23.2
pandas: 2.2.1
pillow: 10.2.0
pydantic: 2.6.1
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
ruff: 0.2.1
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.9.0
uvicorn: 0.27.1
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.


gradio_client dependencies in your environment:

fsspec: 2024.2.0
httpx: 0.26.0
huggingface-hub: 0.20.3
packaging: 23.2
typing-extensions: 4.9.0
websockets: 11.0.3

Severity

I can work around it

@s-kruschel s-kruschel added the bug Something isn't working label Mar 19, 2024
@ajayarora1235

Did you end up finding a solution to this?

@s-kruschel
Author

Unfortunately not…

@abidlabs
Member

Should be fixed via #8906. If you'd like to try it out, you can install gradio from this branch: #8843

@pablovela5620

@s-kruschel what workaround did you find in the meantime? @abidlabs it would be awesome if there were an audio-streaming example similar to this: https://www.gradio.app/guides/streaming-outputs
Right now it's not super clear exactly how audio streaming outputs work (in particular for TTS).

@freddyaboulton
Collaborator

@pablovela5620 - we have a draft guide for audio streaming that will be published in 5.0.

Feedback welcome as we're still tweaking the implementation https://gradio-d8zf06g8v-hugging-face.vercel.app/main/guides/streaming-outputs#streaming-media

@pablovela5620

Beautiful! Y'all were already thinking about this; I'll take a read.

@pablovela5620

@freddyaboulton
One thing that would help with this documentation would be a more comprehensive example, like using the inference API with Bark or maybe https://github.com/huggingface/parler-tts. It's not super clear to me whether what's being returned is an incrementally updated file or raw bytes.

Having more verbose, explicit examples helps a lot with getting things right from the get-go (even if they may be overly verbose at times).

@freddyaboulton
Collaborator

Hi @pablovela5620 - You just need to return the next chunk of bytes (or a file containing the next chunks).

I've prepared this example using Parler TTS: https://huggingface.co/spaces/gradio/magic-8-ball

It's added in this PR which adds more guides for streaming (#9173)
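A minimal sketch of that pattern (the handler and `fake_tts` names are hypothetical, and `fake_tts` stands in for a real TTS call such as the OpenAI streaming response above): the event handler is a generator that yields successive byte chunks, and with gr.Audio(streaming=True) as the output each yielded chunk is appended to what has already played.

```python
def fake_tts(text: str) -> bytes:
    # Placeholder "audio" bytes; a real handler would call a TTS API here.
    return text.encode("utf-8")

def stream_speech(text: str, chunk_size: int = 8):
    # Yield the NEXT chunk each time -- not the running concatenation.
    audio = fake_tts(text)
    for i in range(0, len(audio), chunk_size):
        yield audio[i:i + chunk_size]

# Wiring sketch (not executed here):
#   with gr.Blocks() as demo:
#       inp = gr.Textbox()
#       out = gr.Audio(streaming=True, autoplay=True)
#       inp.submit(stream_speech, inp, out)

chunks = list(stream_speech("streaming audio, one chunk at a time"))
```

Joining the yielded chunks reproduces the full audio, but each individual update stays small, which is what lets the component play continuously.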

@pablovela5620

@freddyaboulton you are awesome, this is exactly what I was looking for. Thank you
