
Streaming Microphone for CLI #35

Merged
merged 5 commits into argmaxinc:main from streaming-microphone-cli on Feb 23, 2024

Conversation

@jkrukowski (Contributor) commented Feb 21, 2024

This PR:

  • extracts the streaming logic from ContentView.swift
  • adds AudioStreamTranscriber (initially named AudioWarper) to combine streaming audio from the microphone, processing, and real-time transcription
  • adds Transcriber protocol to expose WhisperKit transcribe methods

AudioStreamTranscriber can replace the streaming logic in the example app as well
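A rough sketch of the intended shape, to make the split concrete (the TranscriptionResult stub and all signatures below are assumptions for illustration, not the merged API):

```swift
// Stub result type so the sketch is self-contained; WhisperKit's real
// TranscriptionResult carries segments, timings, etc.
struct TranscriptionResult {
    let text: String
}

// Transcriber exposes only the transcribe methods that streaming needs,
// instead of the whole WhisperKit class.
protocol Transcriber {
    func transcribe(audioArray: [Float]) async throws -> TranscriptionResult?
}

// AudioStreamTranscriber owns the microphone loop: accumulate samples,
// hand them to a Transcriber, and report the current text.
actor AudioStreamTranscriber {
    private let transcriber: Transcriber
    private var audioBuffer: [Float] = []

    init(transcriber: Transcriber) {
        self.transcriber = transcriber
    }

    func process(newSamples: [Float]) async throws -> TranscriptionResult? {
        // Append the latest microphone samples and transcribe the running buffer.
        audioBuffer.append(contentsOf: newSamples)
        return try await transcriber.transcribe(audioArray: audioBuffer)
    }
}
```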

Resolves: #25

@ZachNagengast (Contributor) left a comment


Looks really good so far! Excellent approach taking the code from the example and adding it to a shared class. I'm curious about the Transcriber protocol, can you elaborate on your use case for that?

Since we have this new AudioWarper here now, do you think there's any recording code in AudioProcessor that would fit into here as well? Would be nice to have a few debug logs from Logging.debug in this section as well.

One other thing: I think the microphone streaming should be explicit in swift run transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" such as a --stream boolean argument. Reason for that is ideally giving people a heads up if they forgot to include --audio-path, and only requesting the microphone if we are sure they want to stream.

This should allow significant cleanup for the example apps with this shared interface, but that can happen separately, nicely done!

@jkrukowski (Contributor, Author)

@ZachNagengast thanks for your comments, added some changes, lmk what you think

> I'm curious about the Transcriber protocol, can you elaborate on your use case for that?

In order to do the transcription in AudioStreamTranscriber I'd need to pass the whole WhisperKit class there. I'd rather pass an object that contains just the methods I need, and the Transcriber protocol could be a first step towards that. I imagine a separate class that implements the Transcriber protocol and contains the transcribe methods that are currently in WhisperKit. This way I could (rough sketch after the list below):

  • pass just the Transcriber object to AudioStreamTranscriber and not the whole WhisperKit class
  • mix Transcriber with other "streamers" (e.g. RemoteAudioStreamTranscriber) in the future
  • mock Transcriber in tests
  • make WhisperKit class smaller and more focused on being an entry point for the library
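For instance, reusing the Transcriber protocol from the sketch in the PR description above, a test double could drive AudioStreamTranscriber without loading any models (a hypothetical mock, for illustration only):

```swift
// Hypothetical test double: returns a canned result instead of running models.
struct MockTranscriber: Transcriber {
    let cannedText: String

    func transcribe(audioArray: [Float]) async throws -> TranscriptionResult? {
        TranscriptionResult(text: cannedText)
    }
}

// The streaming code only sees the protocol, so tests (or a future
// RemoteAudioStreamTranscriber) can swap in any conforming type.
let streamTranscriber = AudioStreamTranscriber(transcriber: MockTranscriber(cannedText: "hello world"))
```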

> Since we have this new AudioWarper here now, do you think there's any recording code in AudioProcessor that would fit into here as well? Would be nice to have a few debug logs from Logging.debug in this section as well.

added more logging, changed name to AudioStreamTranscriber

> One other thing: I think the microphone streaming should be explicit in swift run transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" such as a --stream boolean argument. Reason for that is ideally giving people a heads up if they forgot to include --audio-path, and only requesting the microphone if we are sure they want to stream.

added --stream flag
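As a sketch with swift-argument-parser, the flag might look roughly like this (property names, help text, and the validation message are assumptions, not the actual CLI source):

```swift
import ArgumentParser

struct Transcribe: AsyncParsableCommand {
    @Option(help: "Path to the CoreML model folder")
    var modelPath: String

    @Option(help: "Path to the audio file to transcribe")
    var audioPath: String?

    @Flag(help: "Stream audio from the microphone instead of reading a file")
    var stream = false

    mutating func run() async throws {
        guard stream || audioPath != nil else {
            // Give people a heads up before ever touching the microphone.
            throw ValidationError("Provide --audio-path, or pass --stream to transcribe from the microphone.")
        }
        // ... load models and run either file or microphone transcription
    }
}
```

Invocation would then be along the lines of swift run transcribe --stream --model-path "Models/whisperkit-coreml/openai_whisper-large-v3".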

> This should allow significant cleanup for the example apps with this shared interface, but that can happen separately, nicely done!

I could work on this cleanup ofc

@jkrukowski marked this pull request as ready for review on February 22, 2024 14:10
@ZachNagengast (Contributor)

Alright tried this out and have just some minor tweaks to the UI:

  • When running initially, it should output some info about the model status to the CLI, such as "Loading models..." etc.
  • It would be ideal to find a way to not print to the CLI every loop when not in --verbose mode, since that makes it hard to use as piped input to other CLI commands (like outputting stdout to a file). Instead, it could only print the new unconfirmed segments, or possibly even replace the current line with currentText for a more live output. Lmk what you think.

Everything else looks good, I'll try to help with this as well after word timestamps.

@jkrukowski (Contributor, Author)

> • When running initially, it should output some info about the model status to the CLI, such as "Loading models..." etc.

I'm using the model loading function from WhisperKit; should I change the Logging to print there? Or did you have something else in mind?

> • It would be ideal to find a way to not print to the CLI every loop when not in --verbose mode, since that makes it hard to use as piped input to other CLI commands (like outputting stdout to a file). Instead, it could only print the new unconfirmed segments, or possibly even replace the current line with currentText for a more live output. Lmk what you think.

changed the state change callback in AudioStreamTranscriber so now it prints only if currentText or the unconfirmed/confirmed segments have changed, would that be ok?
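Concretely, the idea is along these lines (a minimal sketch with a simplified state type; the real callback works on WhisperKit's segment types):

```swift
// Simplified stand-in for the stream transcriber's state.
struct StreamState: Equatable {
    var currentText: String = ""
    var confirmedSegments: [String] = []
    var unconfirmedSegments: [String] = []
}

// Print only when something visible actually changed, so piping stdout
// to a file or another command isn't flooded on every decoding loop.
func reportStateChange(from old: StreamState, to new: StreamState) {
    guard old != new else { return }

    for segment in new.confirmedSegments where !old.confirmedSegments.contains(segment) {
        print("Confirmed: \(segment)")
    }
    if old.unconfirmedSegments != new.unconfirmedSegments {
        print("Unconfirmed: \(new.unconfirmedSegments.joined(separator: " "))")
    }
    if old.currentText != new.currentText {
        print("Current: \(new.currentText)")
    }
}
```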

@atiorh (Contributor) commented Feb 23, 2024

Great work @jkrukowski and @ZachNagengast! We can work on improving the output formatting next week in a separate PR.

@ZachNagengast merged commit da88f72 into argmaxinc:main on Feb 23, 2024
1 check passed
@jkrukowski deleted the streaming-microphone-cli branch on March 2, 2024 17:04