feat: Add Whisper transcription support with speech_recognition fallback #326

AlexZhangji · 2025-02-12T01:40:20Z

Changes

Added OpenAI Whisper support for audio transcription in WavConverter and Mp3Converter.
Implemented automatic fallback to speech_recognition if Whisper fails or a non-OpenAI client is passed in.
Added audio transcription tests (commented out by default) that use a short test.wav file.
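The fallback order described above can be sketched as follows. This is a minimal, hypothetical sketch, not the PR's code: `transcribe_with_fallback`, `is_openai_client`, `whisper_transcribe`, and `fallback_transcribe` are illustrative names. The two back ends are injected as callables so the control flow can be exercised without network access. The plain `### Audio Transcript:` heading on the fallback path is an assumption; only the `(Whisper)` heading appears in the PR's usage output.

```python
from typing import Any, Callable, Optional

# Hypothetical signatures for the two transcription back ends.
WhisperFn = Callable[[Any, str], str]   # (client, audio_path) -> transcript
FallbackFn = Callable[[str], str]       # (audio_path) -> transcript


def transcribe_with_fallback(
    audio_path: str,
    llm_client: Optional[Any],
    is_openai_client: Callable[[Any], bool],
    whisper_transcribe: WhisperFn,
    fallback_transcribe: FallbackFn,
) -> str:
    """Try Whisper first; use the local transcriber if that is not possible."""
    # Only attempt Whisper when a client was supplied and it is an OpenAI client.
    if llm_client is not None and is_openai_client(llm_client):
        try:
            transcript = whisper_transcribe(llm_client, audio_path)
            return "### Audio Transcript (Whisper):\n" + transcript
        except Exception:
            # Any Whisper error (network, auth, unsupported file) falls through.
            pass
    # No client, a non-OpenAI client, or a Whisper failure: use the fallback.
    transcript = fallback_transcribe(audio_path)
    return "### Audio Transcript:\n" + transcript
```

In a real converter, `is_openai_client` might be `lambda c: isinstance(c, openai.OpenAI)` and `fallback_transcribe` would wrap speech_recognition. Catching a broad `Exception` matches "if Whisper fails"; a stricter variant could catch only `openai.OpenAIError` so programming errors are not silently hidden.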

Usage

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
md = MarkItDown(llm_client=client)

result = md.convert("audio.wav")  # or "audio.mp3"
print(result.text_content)
# Output: ### Audio Transcript (Whisper):
#         <transcription content in markdown>
```
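For the Whisper path, a call through the OpenAI Python SDK's audio transcription endpoint might look like the sketch below. The function name `whisper_transcribe` and the `whisper-1` model default are assumptions; the PR text does not state which model or request parameters it uses. The client is passed in as a parameter, so the function can be exercised with a stand-in object that has the same `audio.transcriptions.create` shape.

```python
from typing import Any


def whisper_transcribe(client: Any, audio_path: str, model: str = "whisper-1") -> str:
    """Send an audio file to OpenAI's transcription endpoint and return its text."""
    # The SDK accepts an open binary file handle for the `file` argument.
    with open(audio_path, "rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    # The default (JSON) response object exposes the transcript as `.text`.
    return response.text
```

Opening the file inside a `with` block ensures the handle is closed even if the request raises, which matters when the caller catches the error and moves on to the speech_recognition fallback.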


AlexZhangji · 2025-02-12T01:43:24Z

@microsoft-github-policy-service agree

AlexZhangji added 4 commits February 11, 2025 17:26

8301427 add whisper support for audio transcript. only trigger when have llm_client and openai
b8927e5 fallback to _transcribe_audio
1f3d3ef add test for audio. commented out by default.
edc71db add test audio file. (the moon landing audio)
