mlx_whisper: add support for audio input from stdin #1012
Thanks for the addition. What do you think about a couple modifications:
Done. I agree that self-consistency between related projects is worth more than aesthetic preferences. This also has the nice effect of eliminating some test cases. The only tradeoff is that users who reflexively assume they can pipe anything into any tool's bare name will have to read the docs.
I've come around to
parser.add_argument("audio", nargs="+", help="Audio file(s) to transcribe")
My black==24.4.2 insists on reformatting this line, even though it didn't change.
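The thread implies that piping now requires an explicit argument rather than a bare invocation, but the exact convention is not shown here. As a minimal sketch, assuming "-" is the token meaning "read stdin" (a common CLI convention, not confirmed by this PR), the audio positional could be resolved as below. STDIN_MARKER, build_parser and resolve_inputs are hypothetical names.

```python
import argparse
import sys
from typing import Callable, List, Union

# Hypothetical convention: an audio argument of "-" means "read from stdin".
STDIN_MARKER = "-"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="stdin-sketch")
    # argparse treats a lone "-" as a positional value, not as an option.
    parser.add_argument(
        "audio", nargs="+", help='Audio file(s) to transcribe; "-" reads stdin'
    )
    return parser


def resolve_inputs(
    paths: List[str], read_stdin: Callable[[], bytes]
) -> List[Union[str, bytes]]:
    """Swap the stdin marker for the raw bytes from stdin; keep file paths as-is."""
    if paths.count(STDIN_MARKER) > 1:
        raise ValueError('stdin ("-") can only be consumed once')
    return [read_stdin() if p == STDIN_MARKER else p for p in paths]


if __name__ == "__main__":
    args = build_parser().parse_args()
    for item in resolve_inputs(args.audio, sys.stdin.buffer.read):
        if isinstance(item, bytes):
            print(f"stdin: {len(item)} bytes")
        else:
            print(f"file: {item}")
```

Usage would look like: cat talk.wav | python stdin_sketch.py -. Passing the reader as a callable keeps resolve_inputs testable without a real stdin.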
whisper/mlx_whisper/cli.py
parser.add_argument(
    "--output-name",
    type=str,
    default="{basename}",
This defaults to the pre-existing behavior. I would consider the default an implementation detail that users need not be concerned about. If this is too much, we can default to None and handle the None inside the implementation.
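If the option defaulted to None instead, the None handling described in the comment could look roughly like this sketch. DEFAULT_OUTPUT_NAME, build_parser and resolve_output_name are hypothetical names; the fallback template "{basename}" is taken from the diff above and reproduces the pre-existing naming.

```python
import argparse
from typing import Optional

# Fallback that reproduces the pre-existing naming (output named after the input).
DEFAULT_OUTPUT_NAME = "{basename}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="output-name-sketch")
    parser.add_argument(
        "--output-name",
        type=str,
        default=None,  # None means "keep the pre-existing behavior"
        help="Template for output file names; may contain {basename}",
    )
    return parser


def resolve_output_name(template: Optional[str], basename: str) -> str:
    """Fill the template, substituting the default when none was given."""
    if template is None:
        template = DEFAULT_OUTPUT_NAME
    return template.format(basename=basename)
```

This keeps the user-facing help free of the internal template while still letting power users override it.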
whisper/mlx_whisper/writers.py
if isinstance(audio_obj, (str, pathlib.Path)):
    basename = pathlib.Path(audio_obj).stem
else:
    # mx.array, np.ndarray, etc
    basename = "content"

output_basename = self.output_name_template.format(basename=basename)

output_path = (pathlib.Path(self.output_dir) / output_basename).with_suffix(
    f".{self.extension}"
)

-with open(output_path, "w", encoding="utf-8") as f:
+with output_path.open("wt", encoding="utf-8") as f:
Refactored to use the more "modern" pathlib API; see https://docs.astral.sh/ruff/rules/builtin-open/
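The output-path logic in writers.py can be pulled into a standalone function for testing. This sketch mirrors the naming rules in the excerpt above (input stem for file paths, "content" for in-memory audio, then the name template), with output_path_for and write_text as hypothetical names. One deliberate difference: it appends the extension instead of calling with_suffix(), because with_suffix() replaces any dotted part already in the name (a file called talk.v2.wav would otherwise produce talk.srt).

```python
import pathlib
from typing import Any, Union


def output_path_for(
    audio_obj: Any,
    output_dir: Union[str, pathlib.Path],
    output_name_template: str,
    extension: str,
) -> pathlib.Path:
    # File inputs are named after their stem; arrays or raw bytes have no name.
    if isinstance(audio_obj, (str, pathlib.Path)):
        basename = pathlib.Path(audio_obj).stem
    else:
        basename = "content"
    output_basename = output_name_template.format(basename=basename)
    # Append the extension rather than using with_suffix(), which would
    # replace a dotted part of the name (e.g. "talk.v2" -> "talk.srt").
    return pathlib.Path(output_dir) / f"{output_basename}.{extension}"


def write_text(path: pathlib.Path, text: str) -> None:
    # Path.open() instead of the builtin open(), as in the refactor above.
    with path.open("wt", encoding="utf-8") as f:
        f.write(text)
```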
whisper/test_cli.sh
--output-name "{basename}_mwpl_${test_val}" \
--output-dir "$TEST_OUTPUT_DIR" \
--output-format srt \
--max-words-per-line $test_val \
I think this can be useful for research: run variations of a transcription by adjusting knobs, then name each output accordingly.
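A sweep like the one in test_cli.sh can also be driven from Python. This sketch only builds the argument lists, reusing the flags shown in the excerpt above (--output-name, --output-dir, --output-format, --max-words-per-line); the function name, input file and output directory are placeholders. Executing each list, for example with subprocess.run(cmd, check=True), is left to the caller.

```python
import shlex
from typing import Iterable, List


def variation_commands(
    audio: str, output_dir: str, values: Iterable[int]
) -> List[List[str]]:
    """Build one mlx_whisper invocation per --max-words-per-line value."""
    commands = []
    for v in values:
        commands.append([
            "mlx_whisper", audio,
            # Doubled braces keep a literal {basename} for the CLI to fill in.
            "--output-name", f"{{basename}}_mwpl_{v}",
            "--output-dir", output_dir,
            "--output-format", "srt",
            "--max-words-per-line", str(v),
        ])
    return commands


if __name__ == "__main__":
    for cmd in variation_commands("talk.wav", "out", [3, 5, 8]):
        print(shlex.join(cmd))
```

Encoding the knob value in the output name means every run lands in the same directory without overwriting the others.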
whisper/test_cli.sh
@@ -0,0 +1,69 @@
#!/bin/zsh -e
This is a bare-bones shell test harness that adds coverage for now, without making this PR choose a higher-level shell test runner.
Thanks for the addition, sorry for the delay in getting this landed!
Problem
I wanted to pipe an audio file to mlx_whisper, but found that it only accepted file paths. This PR allows mlx_whisper to accept stdin and pass it to ffmpeg accordingly, then lets the rest of the workflow proceed as usual.

Changes
- load_audio helper adjusts ffmpeg flags based on file path vs. stdin mode
- audio arg if stdin is determined to be active
- --input-name arg is supported to help users name the otherwise anonymous stdin content (it cannot be guessed from a file path)
- zsh file to drive and test the changes from the CLI

Process
- Ran black and pre-commit on the changes before opening the PR
- python test.py shows 4 errors, some involving floating-point comparisons. They look unrelated to my change and may be known issues.
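The load_audio change described above (switching ffmpeg flags between file-path and stdin mode) might look roughly like the following. This is a sketch, not the PR's code: ffmpeg_cmd and load_audio_sketch are hypothetical names, the output flags follow the usual 16 kHz mono s16le decoding used by Whisper-style loaders, and omitting -nostdin in stdin mode is an assumption about which flags change. Only ffmpeg_cmd can run without ffmpeg installed.

```python
import array
import subprocess
import sys
from typing import List, Optional, Union

SAMPLE_RATE = 16000


def ffmpeg_cmd(source: Union[str, bytes], sr: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg argv that decodes to mono 16-bit PCM on stdout."""
    if isinstance(source, bytes):
        # stdin mode: ffmpeg reads the encoded audio from its own stdin ("-i -").
        # -nostdin is omitted here (an assumption, see the lead-in).
        head = ["ffmpeg", "-i", "-"]
    else:
        # file-path mode: stop ffmpeg from interacting with the terminal's stdin.
        head = ["ffmpeg", "-nostdin", "-i", source]
    return head + ["-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr), "-"]


def load_audio_sketch(source: Union[str, bytes], sr: int = SAMPLE_RATE) -> List[float]:
    """Decode a path or raw encoded bytes into floats in [-1, 1)."""
    stdin_data: Optional[bytes] = source if isinstance(source, bytes) else None
    out = subprocess.run(
        ffmpeg_cmd(source, sr), input=stdin_data, capture_output=True, check=True
    ).stdout
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(out)
    if sys.byteorder == "big":
        samples.byteswap()  # s16le output is little-endian
    return [s / 32768.0 for s in samples]
```

For piped input, a caller would pass sys.stdin.buffer.read() as source; for files, the path string.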