
Streaming Microphone for CLI #35

Merged
merged 5 commits into argmaxinc:main from streaming-microphone-cli on Feb 23, 2024

Conversation

@jkrukowski (Contributor) commented Feb 21, 2024

This PR:

  • extracts the streaming logic from ContentView.swift
  • adds AudioStreamTranscriber (initially named AudioWarper) to combine streaming audio from the microphone, processing, and real-time transcription
  • adds Transcriber protocol to expose WhisperKit transcribe methods

AudioStreamTranscriber can replace the streaming logic in the example app as well
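A rough sketch of the intended shape, to make the split concrete (the TranscriptionResult stub and all signatures below are assumptions for illustration, not the merged API):

```swift
// Stub result type so the sketch is self-contained; WhisperKit's real
// TranscriptionResult carries segments, timings, etc.
struct TranscriptionResult {
    let text: String
}

// Transcriber exposes only the transcribe methods that streaming needs,
// instead of the whole WhisperKit class.
protocol Transcriber {
    func transcribe(audioArray: [Float]) async throws -> TranscriptionResult?
}

// AudioStreamTranscriber owns the microphone loop: accumulate samples,
// hand them to a Transcriber, and report the current text.
actor AudioStreamTranscriber {
    private let transcriber: Transcriber
    private var audioBuffer: [Float] = []

    init(transcriber: Transcriber) {
        self.transcriber = transcriber
    }

    func process(newSamples: [Float]) async throws -> TranscriptionResult? {
        // Append the latest microphone samples and transcribe the running buffer.
        audioBuffer.append(contentsOf: newSamples)
        return try await transcriber.transcribe(audioArray: audioBuffer)
    }
}
```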

Resolves: #25

@ZachNagengast (Contributor) left a comment


Looks really good so far! Excellent approach taking the code from the example and adding it to a shared class. I'm curious about the Transcriber protocol, can you elaborate on your use case for that?

Since we have this new AudioWarper here now, do you think there's any recording code in AudioProcessor that would fit into here as well? Would be nice to have a few debug logs from Logging.debug in this section as well.

One other thing: I think the microphone streaming should be explicit in swift run transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" such as a --stream boolean argument. Reason for that is ideally giving people a heads up if they forgot to include --audio-path, and only requesting the microphone if we are sure they want to stream.

This should allow significant cleanup for the example apps with this shared interface, but that can happen separately, nicely done!

@jkrukowski (Contributor, Author)

@ZachNagengast thanks for your comments, added some changes, lmk what you think

> I'm curious about the Transcriber protocol, can you elaborate on your use case for that?

In order to do the transcription in AudioStreamTranscriber I'd need to pass the whole WhisperKit class there. I'd rather pass an object that contains just the methods I need, and the Transcriber protocol could be a first step towards that. I imagine a separate class that implements the Transcriber protocol and contains the transcribe methods that are currently in WhisperKit. This way I could (rough sketch after the list below):

  • pass just the Transcriber object to AudioStreamTranscriber and not the whole WhisperKit class
  • mix Transcriber with other "streamers" (e.g. RemoteAudioStreamTranscriber) in the future
  • mock Transcriber in tests
  • make WhisperKit class smaller and more focused on being an entry point for the library
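For instance, reusing the Transcriber protocol from the sketch in the PR description above, a test double could drive AudioStreamTranscriber without loading any models (a hypothetical mock, for illustration only):

```swift
// Hypothetical test double: returns a canned result instead of running models.
struct MockTranscriber: Transcriber {
    let cannedText: String

    func transcribe(audioArray: [Float]) async throws -> TranscriptionResult? {
        TranscriptionResult(text: cannedText)
    }
}

// The streaming code only sees the protocol, so tests (or a future
// RemoteAudioStreamTranscriber) can swap in any conforming type.
let streamTranscriber = AudioStreamTranscriber(transcriber: MockTranscriber(cannedText: "hello world"))
```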

> Since we have this new AudioWarper here now, do you think there's any recording code in AudioProcessor that would fit into here as well? Would be nice to have a few debug logs from Logging.debug in this section as well.

added more logging, changed name to AudioStreamTranscriber

> One other thing: I think the microphone streaming should be explicit in swift run transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" such as a --stream boolean argument. Reason for that is ideally giving people a heads up if they forgot to include --audio-path, and only requesting the microphone if we are sure they want to stream.

added --stream flag
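As a sketch with swift-argument-parser, the flag might look roughly like this (property names, help text, and the validation message are assumptions, not the actual CLI source):

```swift
import ArgumentParser

struct Transcribe: AsyncParsableCommand {
    @Option(help: "Path to the CoreML model folder")
    var modelPath: String

    @Option(help: "Path to the audio file to transcribe")
    var audioPath: String?

    @Flag(help: "Stream audio from the microphone instead of reading a file")
    var stream = false

    mutating func run() async throws {
        guard stream || audioPath != nil else {
            // Give people a heads up before ever touching the microphone.
            throw ValidationError("Provide --audio-path, or pass --stream to transcribe from the microphone.")
        }
        // ... load models and run either file or microphone transcription
    }
}
```

Invocation would then be along the lines of swift run transcribe --stream --model-path "Models/whisperkit-coreml/openai_whisper-large-v3".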

> This should allow significant cleanup for the example apps with this shared interface, but that can happen separately, nicely done!

I could work on this cleanup ofc

@jkrukowski marked this pull request as ready for review on February 22, 2024 14:10
@ZachNagengast (Contributor)

Alright tried this out and have just some minor tweaks to the UI:

  • When running initially, it should output some info about the model status to the CLI, such as "Loading models..." etc.
  • It would be ideal to find a way to not print to the CLI every loop when not in --verbose mode, since that makes it hard to use as piped input to other CLI commands (like outputting stdout to a file). Instead, it could only print the new unconfirmed segments, or possibly even replace the current line with currentText for a more live output. Lmk what you think.

Everything else looks good, I'll try to help with this as well after word timestamps.

@jkrukowski (Contributor, Author)

> • When running initially, it should output some info about the model status to the CLI, such as "Loading models..." etc.

I'm using the model loading function from WhisperKit; should I change the Logging to print there? Or did you have something else in mind?

> • It would be ideal to find a way to not print to the CLI every loop when not in --verbose mode, since that makes it hard to use as piped input to other CLI commands (like outputting stdout to a file). Instead, it could only print the new unconfirmed segments, or possibly even replace the current line with currentText for a more live output. Lmk what you think.

changed the state change callback in AudioStreamTranscriber so now it prints only if currentText or the unconfirmed/confirmed segments have changed, would that be ok?
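Concretely, the idea is along these lines (a minimal sketch with a simplified state type; the real callback works on WhisperKit's segment types):

```swift
// Simplified stand-in for the stream transcriber's state.
struct StreamState: Equatable {
    var currentText: String = ""
    var confirmedSegments: [String] = []
    var unconfirmedSegments: [String] = []
}

// Print only when something visible actually changed, so piping stdout
// to a file or another command isn't flooded on every decoding loop.
func reportStateChange(from old: StreamState, to new: StreamState) {
    guard old != new else { return }

    for segment in new.confirmedSegments where !old.confirmedSegments.contains(segment) {
        print("Confirmed: \(segment)")
    }
    if old.unconfirmedSegments != new.unconfirmedSegments {
        print("Unconfirmed: \(new.unconfirmedSegments.joined(separator: " "))")
    }
    if old.currentText != new.currentText {
        print("Current: \(new.currentText)")
    }
}
```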

@atiorh (Contributor) commented Feb 23, 2024

Great work @jkrukowski and @ZachNagengast! We can work on improving the output formatting next week in a separate PR.

@ZachNagengast merged commit da88f72 into argmaxinc:main on Feb 23, 2024
1 check passed
@jkrukowski deleted the streaming-microphone-cli branch on March 2, 2024 17:04