Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diarization Output Modified #1586

Merged
merged 13 commits into from
Jul 20, 2018
2 changes: 1 addition & 1 deletion speech/cloud-client/README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -221,7 +221,7 @@ To run this sample:

$ python beta_snippets.py

usage: beta_snippets.py [-h] command path first second
usage: beta_snippets.py [-h] command path [first] [second]

Google Cloud Speech API sample that demonstrates enhanced models
and recognition metadata.
Expand Down
16 changes: 5 additions & 11 deletions speech/cloud-client/beta_snippets.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,6 @@ def transcribe_file_with_enhanced_model(speech_file):
audio = speech.types.RecognitionAudio(content=content)
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for omitting this? For WAV files, the API can infer this, but in the general case it's probably a good idea to include the sample rate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep going back and forth about it. Sometimes it is useful, and other times it is causing an error when the input file has a different sample rate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is sample_rate removed? I am sure there is good reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just had a discussion about it with Roy and Jerjou. If the input file has a different sample rate, it will cause an error. It is simpler just to omit it and the API figures it out on its own.

language_code='en-US',
# Enhanced models are only available to projects that
# opt in for audio data collection.
Expand Down Expand Up @@ -95,7 +94,6 @@ def transcribe_file_with_metadata(speech_file):
audio = speech.types.RecognitionAudio(content=content)
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
language_code='en-US',
# Add this in the request to send metadata.
metadata=metadata)
Expand Down Expand Up @@ -125,7 +123,6 @@ def transcribe_file_with_auto_punctuation(speech_file):
audio = speech.types.RecognitionAudio(content=content)
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
language_code='en-US',
# Enable automatic punctuation
enable_automatic_punctuation=True)
Expand Down Expand Up @@ -156,21 +153,18 @@ def transcribe_file_with_diarization(speech_file):

config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=16000,
language_code='en-US',
enable_speaker_diarization=True,
diarization_speaker_count=2)

print('Waiting for operation to complete...')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using python logging facility. Understandably, for this sample it might be overkill so take it or leave it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW nearly all our other Python samples do print(). It's true that it's not always the recommended practice in production, but it's easy to understand. With logging there's always the risk that the developer has some weird config where the logs end up where maybe they don't expect.

response = client.recognize(config, audio)

for i, result in enumerate(response.results):
alternative = result.alternatives[0]
print('-' * 20)
print('First alternative of result {}: {}'
.format(i, alternative.transcript))
print('Speaker Tag for the first word: {}'
.format(alternative.words[0].speaker_tag))
result = response.results[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here explaining why you're only taking the last result (instead of all of them) would probably be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Adding it.

words_info = result.alternatives[0].words
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't '{} ({})'.format(word_info.word, word_info.speaker_tag) the preferred way to do this these days? Or is Thea no longer benevolent overlord?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will modify it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pieces is an odd variable name.

for word_info in words_info]
print(' '.join(pieces))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this might not illustrate the "typical" use case, where the developer might more likely want to group and join the words according to their speaker_tag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point. Tough to say what the right use case is. But I see it just as a sample. to show them the API, and not the use case. Do you think we can keep it as is, or should we change it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - in that sense perhaps let's just iterate through words_info and print everything without the nice formatting of '%s (%s)'. the expected output of the test really confused me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with the way it's formatted but whatever you decide consider this syntax

pieces = ['{} ({})'.format(word_info.word, word_info.speaker_tag) for word_info in words_info]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to make the output to look like this:

Speaker #1: I'm here
Speaker #2: hi I'd like to buy a Chrome Cast and I was wondering whether you could help me

# [END speech_transcribe_diarization]


Expand Down
4 changes: 2 additions & 2 deletions speech/cloud-client/beta_snippets_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,10 +51,10 @@ def test_transcribe_file_with_auto_punctuation(capsys):

def test_transcribe_diarization(capsys):
transcribe_file_with_diarization(
os.path.join(RESOURCES, 'Google_Gnome.wav'))
os.path.join(RESOURCES, 'commercial_mono.wav'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding file name as an argument and not hardcode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even for the unit tests?

out, err = capsys.readouterr()

assert 'OK Google stream stranger things from Netflix to my TV' in out
assert "I'm (1) here (1) hi (2)" in out


def test_transcribe_multichannel_file(capsys):
Expand Down
2 changes: 1 addition & 1 deletion texttospeech/cloud-client/audio_profile.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@

Example usage:
python audio_profile.py --text "hello" --effects_profile_id
"telephony-class-application"
"telephony-class-application" --output "output.mp3"
"""

import argparse
Expand Down