[Audio] Microphone Capture - Allow setting smaller chunk size for low latency #6526

Closed
1 task done
virajkarandikar opened this issue Nov 21, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@virajkarandikar

  • I have searched to see if a similar issue already exists.

By default, streaming mic capture uses a buffer/chunk size of 1 second. This adds a lot of latency in real-time applications. Can the chunk size be made configurable so that it can be set smaller?

Is your feature request related to a problem? Please describe.
The large buffer increases audio latency and makes the application sluggish to use.

Describe the solution you'd like
Provide a parameter to configure the chunk size when using streaming mic capture.


@abidlabs
Member

Hi @virajkarandikar, can you provide sample code we can use to look at the issue?

@abidlabs abidlabs added the enhancement New feature or request label Nov 21, 2023
@virajkarandikar
Author

virajkarandikar commented Nov 23, 2023

The code is simple:

import gradio as gr

with gr.Blocks() as demo:
    audio = gr.Audio(streaming=True)

    def process_audio(audio):
        # Each streamed chunk arrives as a (sample_rate, samples) tuple.
        rate, data = audio
        print(f"rate: {rate}, samples: {len(data)}")

    audio.stream(process_audio, [audio], None)

demo.launch()

Below is the log I get on the console.

rate: 48000, samples: 24000
rate: 48000, samples: 48000
rate: 48000, samples: 24000
rate: 48000, samples: 24000
rate: 48000, samples: 24000
rate: 48000, samples: 24000

The log indicates that the sample rate is 48000 Hz, there is 1 channel, and the chunk size varies between 24000 samples (0.5 sec) and 48000 samples (1 sec). This adds significant latency.
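For reference, the chunk durations quoted above follow from dividing the sample count by the sample rate. A minimal sketch using the numbers from the log; the helper name `chunk_duration` is made up for illustration and is not part of Gradio:

```python
# Values copied from the console log above.
SAMPLE_RATE = 48000
chunk_sizes = [24000, 48000, 24000, 24000, 24000, 24000]


def chunk_duration(num_samples: int, sample_rate: int) -> float:
    """Seconds of audio held in one chunk (hypothetical helper)."""
    return num_samples / sample_rate


durations = [chunk_duration(n, SAMPLE_RATE) for n in chunk_sizes]
for n, d in zip(chunk_sizes, durations):
    print(f"{n} samples -> {d:.2f} s")
```

Each chunk has to be fully recorded before it is delivered, so the chunk duration is a lower bound on the latency added by buffering.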

Also, the uncompressed audio is streamed from the client to the application at 48000 Hz, which adds some network latency. In my case the model expects a 16000 Hz sample rate, so if I could specify the sample rate for mic capture, the amount of data transferred would drop to one third. I have filed a separate issue for that: #5848.
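The effect of the capture rate on transfer volume can be estimated with simple arithmetic. A rough sketch, assuming uncompressed 16-bit mono PCM (the sample width is an assumption; the thread only states that the audio is uncompressed and single-channel), with `bytes_per_second` as a hypothetical helper:

```python
def bytes_per_second(sample_rate: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Raw PCM data rate; 2 bytes per sample is an assumed width, not a measured one."""
    return sample_rate * bytes_per_sample * channels


captured = bytes_per_second(48000)  # rate reported in the log
wanted = bytes_per_second(16000)    # rate the model expects
ratio = wanted / captured

print(f"captured: {captured} B/s, wanted: {wanted} B/s, ratio: {ratio:.3f}")
```

The ratio does not depend on the assumed sample width: capturing at 16000 Hz would send one third of the data sent at 48000 Hz.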

@virajkarandikar virajkarandikar changed the title from "Microphone Capture - Allow setting smaller chunk size for low latency" to "[Audio] Microphone Capture - Allow setting smaller chunk size for low latency" Nov 24, 2023
@qianhuiliu

Hello, have you figured out how to do it? I have the same question.

@virajkarandikar
Author

Any update here?

@gaborvecsei

I am also interested in this

@virajkarandikar
Author

Ping...

@abidlabs
Member

This is on our radar, but it may take a few weeks for us to get to, as we have a lot of other issues we're tackling as well. We are happy to review any PRs if you'd like to contribute this fix.

cc @aliabid94

@mcorroyer

I have the same issue. Has it progressed?

@adirajagopal

> This is on our radar, but it may take a few weeks for us to get to, as we have a lot of other issues we're tackling as well. We are happy to review any PRs if you'd like to contribute this fix.
>
> cc @aliabid94

Hi, is there a specific part of the code base you could point to that shows how we can reduce the chunk size of the stream? This would help with guiding the PR. Thanks!

@JohanWork

@abidlabs could you point out where the chunk size is set? Happy to contribute

@abidlabs
Member

Actually, we've implemented this already in our 5.0-dev branch! Let me point you to the PR where you can install and try it out: #8941

Here's a simple transcription demo where you can set your own stream_every param: #8941 (comment)
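For readers who cannot open the linked demo, here is a minimal sketch of what setting `stream_every` might look like. It assumes Gradio 5.x and that, as in the linked PR, `stream_every` is a keyword on the `.stream()` event giving the interval in seconds between chunks sent to the server. `describe_chunk` and `build_demo` are hypothetical names used only for illustration, and the 0.1 s default is an arbitrary choice:

```python
def describe_chunk(audio):
    """Summarize one streamed chunk, given as a (sample_rate, samples) tuple."""
    if audio is None:
        return "no audio yet"
    rate, data = audio
    return f"rate: {rate}, samples: {len(data)}, seconds: {len(data) / rate:.2f}"


def build_demo(stream_every: float = 0.1):
    # Imported here so describe_chunk can be exercised without Gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        audio = gr.Audio(sources=["microphone"], streaming=True)
        summary = gr.Textbox(label="Last chunk")
        # Assumption: stream_every is accepted by .stream() (Gradio 5.x).
        audio.stream(
            describe_chunk,
            inputs=[audio],
            outputs=[summary],
            stream_every=stream_every,
        )
    return demo
```

Calling `build_demo().launch()` starts the app. If the parameter behaves as described, the reported chunk length should track the chosen `stream_every` value rather than the 0.5 to 1 second chunks shown earlier in this thread.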

7 participants