Diarization Output Modified #1586
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,7 +46,6 @@ def transcribe_file_with_enhanced_model(speech_file): | |
audio = speech.types.RecognitionAudio(content=content) | ||
config = speech.types.RecognitionConfig( | ||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | ||
sample_rate_hertz=8000, | ||
Comment: Why is sample_rate removed? I am sure there is a good reason. Reply: We just discussed it with Roy and Jerjou. If the input file has a different sample rate, the request fails with an error. It is simpler to omit the field and let the API figure it out on its own. |
||
language_code='en-US', | ||
# Enhanced models are only available to projects that | ||
# opt in for audio data collection. | ||
|
@@ -95,7 +94,6 @@ def transcribe_file_with_metadata(speech_file): | |
audio = speech.types.RecognitionAudio(content=content) | ||
config = speech.types.RecognitionConfig( | ||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | ||
sample_rate_hertz=8000, | ||
language_code='en-US', | ||
# Add this in the request to send metadata. | ||
metadata=metadata) | ||
|
@@ -125,7 +123,6 @@ def transcribe_file_with_auto_punctuation(speech_file): | |
audio = speech.types.RecognitionAudio(content=content) | ||
config = speech.types.RecognitionConfig( | ||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | ||
sample_rate_hertz=8000, | ||
language_code='en-US', | ||
# Enable automatic punctuation | ||
enable_automatic_punctuation=True) | ||
|
@@ -156,21 +153,18 @@ def transcribe_file_with_diarization(speech_file): | |
|
||
config = speech.types.RecognitionConfig( | ||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | ||
sample_rate_hertz=16000, | ||
language_code='en-US', | ||
enable_speaker_diarization=True, | ||
diarization_speaker_count=2) | ||
|
||
print('Waiting for operation to complete...') | ||
Comment: Consider using Python's logging facility. Understandably, it might be overkill for this sample, so take it or leave it. Reply: FWIW nearly all our other Python samples do |
||
response = client.recognize(config, audio) | ||
|
||
for i, result in enumerate(response.results): | ||
alternative = result.alternatives[0] | ||
print('-' * 20) | ||
print('First alternative of result {}: {}' | ||
.format(i, alternative.transcript)) | ||
print('Speaker Tag for the first word: {}' | ||
.format(alternative.words[0].speaker_tag)) | ||
result = response.results[-1] | ||
Comment: A comment here explaining why you're only taking the last result (instead of all of them) would probably be helpful. Reply: Good idea. Adding it. |
||
words_info = result.alternatives[0].words | ||
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag) | ||
Comment: Isn't Reply: Good point. I will modify it. Comment: pieces is an odd variable name. |
||
for word_info in words_info] | ||
print(' '.join(pieces)) | ||
Comment: I feel this might not illustrate the "typical" use case, where the developer would more likely want to group and join the words according to their speaker_tag. Reply: Interesting point. It is tough to say what the right use case is, but I see it just as a sample to show the API, not the use case. Do you think we can keep it as is, or should we change it? Reply: I agree - in that sense perhaps let's just iterate through words_info and print everything without the nice formatting of Comment: I think I agree with the way it's formatted, but whatever you decide, consider this syntax: pieces = ['{} ({})'.format(word_info.word, word_info.speaker_tag) for word_info in words_info] |
||
# [END speech_transcribe_diarization] | ||
|
||
|
||
|
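The two output styles debated in the review thread above (flat "word (tag)" pairs versus words grouped per speaker) can be compared side by side. The following is only a sketch, not the sample's actual code: `WordInfo` is a hypothetical stand-in for the per-word objects the API returns, and `format_words` and `group_by_speaker` are illustrative helper names. It also assumes the word list comes from the last result, on the assumption that its word list covers the whole audio.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for the API's per-word objects; only the two
# attributes used in the diff (word, speaker_tag) are modeled here.
WordInfo = namedtuple('WordInfo', ['word', 'speaker_tag'])


def format_words(words_info):
    """Return flat 'word (tag)' pairs joined by spaces, as in the sample."""
    return ' '.join(
        '{} ({})'.format(w.word, w.speaker_tag) for w in words_info)


def group_by_speaker(words_info):
    """Collapse consecutive words with the same speaker_tag into turns."""
    turns = []
    for tag, run in groupby(words_info, key=lambda w: w.speaker_tag):
        turns.append((tag, ' '.join(w.word for w in run)))
    return turns


if __name__ == '__main__':
    words = [WordInfo("I'm", 1), WordInfo('here', 1), WordInfo('hi', 2)]
    print(format_words(words))
    for tag, text in group_by_speaker(words):
        print('Speaker {}: {}'.format(tag, text))
```

Grouping only merges consecutive words, so a speaker who talks twice yields two separate turns. This keeps the conversation order intact, which a dictionary keyed by speaker_tag would lose.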
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -51,10 +51,10 @@ def test_transcribe_file_with_auto_punctuation(capsys): | |
|
||
def test_transcribe_diarization(capsys): | ||
transcribe_file_with_diarization( | ||
os.path.join(RESOURCES, 'Google_Gnome.wav')) | ||
os.path.join(RESOURCES, 'commercial_mono.wav')) | ||
Comment: Consider passing the file name as an argument rather than hardcoding it. Reply: Even for the unit tests? |
||
out, err = capsys.readouterr() | ||
|
||
assert 'OK Google stream stranger things from Netflix to my TV' in out | ||
assert "I'm (1) here (1) hi (2)" in out | ||
|
||
|
||
def test_transcribe_multichannel_file(capsys): | ||
|
Comment: Any particular reason for omitting this? For WAV files, the API can infer this, but in the general case it's probably a good idea to include the sample rate.
Reply: I keep going back and forth about it. Sometimes it is useful, and other times it causes an error when the input file has a different sample rate.
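One possible middle ground for WAV input, offered only as a sketch: read the sample rate from the file header with the standard-library `wave` module, so that any explicitly passed value cannot disagree with the file. The helper names `wav_sample_rate` and `make_silent_wav` are hypothetical and are not part of the sample or the Speech client library. Nothing here calls the API. The second helper exists only to make the example self-contained.

```python
import io
import wave


def wav_sample_rate(wav_bytes):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as wav_file:
        return wav_file.getframerate()


def make_silent_wav(sample_rate, n_frames=800):
    """Build a small mono 16-bit PCM WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b'\x00\x00' * n_frames)
    return buf.getvalue()


if __name__ == '__main__':
    content = make_silent_wav(8000)
    # The value read here could be passed as sample_rate_hertz instead of
    # a hardcoded constant (an assumption about how one might wire it up).
    print(wav_sample_rate(content))
```

This only helps for WAV containers. For headerless raw audio there is nothing to read, which is the general case the reviewer raises above.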