Even though Whisper transcribes in 30 s chunks, are the vector embeddings and attention available to later chunks? #2325

agandhigoto · 2024-09-08T15:11:48Z

I don't fully understand this concept, so I am asking for clarification:
Even though Whisper transcribes in 30 s chunks, are the vector embeddings and attention from earlier chunks available to later chunks?

Take an example -

Chunk 1: "The bank manager told me to sign the papers at the branch. Later, when I returned..."
Chunk 2: "...to the branch, I noticed that the teller was gone."

Chunk 1 - Clearly sets the context for the embedding of "branch": the earlier word "bank" points to a bank branch.
Chunk 2 - May not know whether "branch" refers to a tree, a bank, or a river unless attention from chunk 1 is still active here.

The reason I ask: will transcription quality differ between splitting the audio into 30 s chunks externally (say, for a stream) and passing the full audio and letting Whisper chunk it into 30 s windows? As I understand it, the first case resets the embeddings and attention. Even if I pass audio with some overlap, say 5 s, only the overlapping part would carry over, not anything from a chunk 5 minutes earlier.
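
To make the external-chunking case concrete, here is a rough sketch of what I mean. It only computes chunk boundaries and does not call Whisper at all; the function names `chunk_bounds` and `shared_seconds` are made up for illustration, and the 30 s / 5 s values are just the numbers from my example.

```python
def chunk_bounds(total_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) times in seconds for overlapping chunks.

    Hypothetical helper for illustration only; not part of Whisper.
    """
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    step = chunk_s - overlap_s  # how far each new chunk advances
    bounds = []
    start = 0.0
    while start < total_s:
        bounds.append((start, min(start + chunk_s, total_s)))
        if start + chunk_s >= total_s:
            break  # this chunk already reaches the end of the audio
        start += step
    return bounds


def shared_seconds(a, b):
    """Seconds of audio that two (start, end) chunks have in common."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


chunks = chunk_bounds(70.0)
print(chunks)
print(shared_seconds(chunks[0], chunks[1]))  # adjacent chunks share the overlap
print(shared_seconds(chunks[0], chunks[2]))  # non-adjacent chunks share nothing
```

With this kind of splitting, adjacent chunks share only the 5 s overlap, and chunks further apart share no audio at all, which is why I suspect the earlier context is lost.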
