YoutubeAudioLoader and updates to OpenAIWhisperParser #5772
Conversation
```python
# Split the audio into chunk_duration_ms chunks
for split_number, i in enumerate(range(0, len(audio), chunk_duration_ms)):
    print(f"Transcribing part {split_number}!")
```
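For reference, the chunk boundaries that loop strides over can be sketched in isolation. This is a minimal illustration, not code from the PR; `chunk_spans` is a hypothetical helper name:

```python
def chunk_spans(total_ms: int, chunk_duration_ms: int) -> list[tuple[int, int]]:
    """Start/end offsets (in ms) of each chunk, mirroring the
    range(0, len(audio), chunk_duration_ms) stride in the parser;
    the final chunk is simply shorter when the audio doesn't divide evenly."""
    return [
        (start, min(start + chunk_duration_ms, total_ms))
        for start in range(0, total_ms, chunk_duration_ms)
    ]

# A 25-second clip split into 10-second chunks:
print(chunk_spans(25_000, 10_000))
# → [(0, 10000), (10000, 20000), (20000, 25000)]
```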
```python
with blob.as_bytes_io() as f:
    transcript = openai.Audio.transcribe("whisper-1", f)
    yield Document(
```
Should we yield a single document if the input is a single audio file, since we're trying to hide the fact that there's chunking under the hood? We can collect the transcripts and concatenate them. The only problem is that it's unclear which delimiter to join on.
It would be easy to do this. E.g., we can build a single blob from the combined docs:
combined_text = " ".join(doc.page_content for doc in docs)
But, as discussed, it's kind of nice to have the intermediate outputs.
(Latency is fairly high: about 15 minutes for a 2-hour video.)
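If we did want a single combined output, it could be sketched as below. This is a standalone illustration: the `Document` dataclass here is a minimal stand-in for langchain's `Document` (page_content + metadata) so the sketch runs without dependencies, and the space delimiter is an assumption, since that is exactly the open question in this thread:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    # Stand-in for langchain's Document: just page_content + metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)

def combine_transcripts(docs: list[Document], delimiter: str = " ") -> Document:
    """Join per-chunk transcripts into a single Document, hiding the
    chunking from the caller; the delimiter default is a guess."""
    return Document(
        page_content=delimiter.join(d.page_content for d in docs),
        metadata={"num_chunks": len(docs)},
    )

parts = [Document("Hello from chunk one."), Document("And chunk two.")]
combined = combine_transcripts(parts)
print(combined.page_content)
# → Hello from chunk one. And chunk two.
```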
```python
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(url, download=False)
    title = info.get('title', 'video')
    print(f"Writing file: {title} to {self.save_dir}")
```
Force-pushed from 01a5729 to 74326d6.
```python
try:
    from pydub import AudioSegment
except ImportError:
    print("Please install pydub : pip install pydub")
```
Replace the `print` with `raise ValueError` or `ImportError`.
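The suggested pattern, sketched as a generic helper rather than the PR's actual code (the helper name and message wording are illustrative assumptions):

```python
import importlib

def require_module(name: str):
    """Import an optional dependency, raising (not printing) when it is
    missing, so callers fail loudly instead of continuing without it."""
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError(f"Please install {name}: pip install {name}")

json_mod = require_module("json")  # succeeds for a stdlib module
try:
    require_module("definitely_not_installed_xyz")
except ImportError as err:
    print(err)  # the error surfaces instead of being silently printed past
```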
```python
try:
    import openai
except ImportError:
    print("Please install openai : pip install openai")
```
This one needs to be raised as well.
Force-pushed from 74326d6 to 4f0e4ca.
Force-pushed from 4f0e4ca to e1fa1a4.
This introduces the `YoutubeAudioLoader`, which will load blobs from a YouTube URL and write them. Blobs are then parsed by `OpenAIWhisperParser()`, as shown in this [PR](langchain-ai#5580), but we extend the parser to split audio so that each chunk meets the 25MB OpenAI size limit. As shown in the notebook, this enables a very simple UX:

```python
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
```

Tested on the full set of Karpathy lecture videos:

```python
# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0",
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()
```