-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diarization Output Modified #1586
Conversation
@dizcology , @tswast , @sirtorry I had to adjust the output in this sample a little. Can you please review this PR? Thanks. |
speech/cloud-client/beta_snippets.py
Outdated
words_info = result.alternatives[0].words | ||
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag) | ||
for word_info in words_info] | ||
print(' '.join(pieces)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this might not illustrate the "typical" use case, where the developer might more likely want to group and join the words according to their speaker_tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting point. Tough to say what the right use case is. But I see it just as a sample. to show them the API, and not the use case. Do you think we can keep it as is, or should we change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - in that sense perhaps let's just iterate through words_info and print everything without the nice formatting of '%s (%s)'
. the expected output of the test really confused me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with the way it's formatted but whatever you decide consider this syntax
pieces = ['{} ({})'.format(word_info.word, word_info.speaker_tag) for word_info in words_info]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speech/cloud-client/beta_snippets.py
Outdated
.format(i, alternative.transcript)) | ||
print('Speaker Tag for the first word: {}' | ||
.format(alternative.words[0].speaker_tag)) | ||
result = response.results[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here explaining why you're only taking the last result (instead of all of them) would probably be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Adding it.
speech/cloud-client/beta_snippets.py
Outdated
@@ -46,7 +46,6 @@ def transcribe_file_with_enhanced_model(speech_file): | |||
audio = speech.types.RecognitionAudio(content=content) | |||
config = speech.types.RecognitionConfig( | |||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | |||
sample_rate_hertz=8000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for omitting this? For WAV files, the API can infer this, but in the general case it's probably a good idea to include the sample rate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep going back and forth about it. Sometimes it is useful, and other times it is causing an error when the input file has a different sample rate.
speech/cloud-client/beta_snippets.py
Outdated
.format(alternative.words[0].speaker_tag)) | ||
result = response.results[-1] | ||
words_info = result.alternatives[0].words | ||
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't '{} ({})'.format(word_info.word, word_info.speaker_tag)
the preferred way to do this these days? Or is Thea no longer benevolent overlord?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will modify it.
speech/cloud-client/beta_snippets.py
Outdated
@@ -46,7 +46,6 @@ def transcribe_file_with_enhanced_model(speech_file): | |||
audio = speech.types.RecognitionAudio(content=content) | |||
config = speech.types.RecognitionConfig( | |||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | |||
sample_rate_hertz=8000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is sample_rate removed? I am sure there is good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just had a discussion about it with Roy and Jerjou. If the input file has a different sample rate, it will cause an error. It is simpler just to omit it and the API figures it out on its own.
@@ -51,10 +51,10 @@ def test_transcribe_file_with_auto_punctuation(capsys): | |||
|
|||
def test_transcribe_diarization(capsys): | |||
transcribe_file_with_diarization( | |||
os.path.join(RESOURCES, 'Google_Gnome.wav')) | |||
os.path.join(RESOURCES, 'commercial_mono.wav')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding file name as an argument and not hardcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even for the unit tests?
speech/cloud-client/beta_snippets.py
Outdated
words_info = result.alternatives[0].words | ||
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag) | ||
for word_info in words_info] | ||
print(' '.join(pieces)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with the way it's formatted but whatever you decide consider this syntax
pieces = ['{} ({})'.format(word_info.word, word_info.speaker_tag) for word_info in words_info]
speech/cloud-client/beta_snippets.py
Outdated
.format(alternative.words[0].speaker_tag)) | ||
result = response.results[-1] | ||
words_info = result.alternatives[0].words | ||
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pieces is an odd variable name.
@@ -156,21 +153,18 @@ def transcribe_file_with_diarization(speech_file): | |||
|
|||
config = speech.types.RecognitionConfig( | |||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | |||
sample_rate_hertz=16000, | |||
language_code='en-US', | |||
enable_speaker_diarization=True, | |||
diarization_speaker_count=2) | |||
|
|||
print('Waiting for operation to complete...') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using python logging facility. Understandably, for this sample it might be overkill so take it or leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW nearly all our other Python samples do print()
. It's true that it's not always the recommended practice in production, but it's easy to understand. With logging there's always the risk that the developer has some weird config where the logs end up where maybe they don't expect.
@dizcology , @puneith, I made some changes based on the comments made by all the reviewers? Can you please take a final look and approve the PR if it looks good? |
speech/cloud-client/beta_snippets.py
Outdated
print('Speaker Tag for the first word: {}' | ||
.format(alternative.words[0].speaker_tag)) | ||
# response.results contains partial results with the last item | ||
# containing the entire result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mm... not quite. The transcript within each result is separate and sequential per result. However, the words list within an alternative (for whatever reason) includes all the words from all the results thus far. Thus, to get all the words with speaker tags, you only have to take the words list from the last result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for the clarification. Let me update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really understanding your comment, @jerjou but this sounds like something that needs to be documented on the cloud.google.com docs with a briefer explanation in the sample itself.
speech/cloud-client/beta_snippets.py
Outdated
if speakers_words and speakers_words[-1][0] == word_info.speaker_tag: | ||
speakers_words[-1][1].append(word_info.word) | ||
else: | ||
speakers_words.append((word_info.speaker_tag, [word_info.word, ])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit hard to read. An intermediate variable and a namedtuple
would go a long way to making this clearer:
Speaker = collections.namedtuple('Speaker', ['tag', 'words'])
speaker_words = [Speaker(tag=0, words=[])]
for word_info in words_info:
current_speaker = speaker_words[-1]
if current_speaker.tag == word_info.speaker_tag:
current_speaker.words.append(word_info.word)
else:
speaker_words.append(Speaker(tag=word_info.speaker_tag, words=[word_info.word]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, maybe speaker_sequence
or something, to make it clear it's not just a speaker->words_they_spoke mapping, and is actually the conversation / words spoken in sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea. However, I think the code readability is mainly the result of the intermediate variable Speaker(tag=0, words=[])
as the first element in the list, allowing us to define current_speaker
every time.
The downside of it is that we are introducing a new object that is not returned by the API and we will have to handle it in the next step separately (either by removing it, or by skipping it), which reduces the code cleanness in another way.
So, while I am not against the suggested solution, I am also not sure if it is really helping with the code readability that much.
What do you think @jerjou ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this:
speakers = []
words = []
for word_info in words_info:
if (not speakers) or speakers[-1] != word_info.speaker_tag:
speakers.append(word_info.speaker_tag)
words.append([])
words[-1].append(word_info.word)
I think this is more readable that the current code, without introducing the intermediate variable.
elif args.command == 'multi-language': | ||
transcribe_file_with_multilanguage(args.path, args.first, args.second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, argparse's sub-commands feature would be helpful to avoid having args that only matter for one command or another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
speech/cloud-client/beta_snippets.py
Outdated
# Separating the words by who said what: | ||
speakers_words = [] | ||
for word_info in words_info: | ||
if speakers_words and speakers_words[-1][0] == word_info.speaker_tag: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly what this piece of for loop is doing (creating list of words for tag) isn't is better to use hashmap. Does this loop work what it's supposed to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jerjou is already reviewing this code so I should stop reviewing. You are already is great hands :)
@@ -156,21 +153,18 @@ def transcribe_file_with_diarization(speech_file): | |||
|
|||
config = speech.types.RecognitionConfig( | |||
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16, | |||
sample_rate_hertz=16000, | |||
language_code='en-US', | |||
enable_speaker_diarization=True, | |||
diarization_speaker_count=2) | |||
|
|||
print('Waiting for operation to complete...') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW nearly all our other Python samples do print()
. It's true that it's not always the recommended practice in production, but it's easy to understand. With logging there's always the risk that the developer has some weird config where the logs end up where maybe they don't expect.
speech/cloud-client/beta_snippets.py
Outdated
print('Speaker Tag for the first word: {}' | ||
.format(alternative.words[0].speaker_tag)) | ||
# The transcript within each result is separate and sequential per result. | ||
# However, the words list within an alternative (for whatever reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this comment is to explain why the [-1]
, but I'm having trouble understanding what this is referring to. (Probably because I don't know what an "alternative" means in this context.)
I think you could probably get by with fewer words within the sample with just
# To get all words with speaker tags, you only have to take the words list from the last result.
and leave the extra explanation about alternatives for the actual cloud.google.com docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially had a shorter comment. But Jerjou suggested to explain it more, so I added the longer comment. I am okay either way, but I also think this is just a sample and not the formal documentation and the sample does not need to go over all the details of what's what.
speech/cloud-client/beta_snippets.py
Outdated
speakers = [] | ||
words = [] | ||
for word_info in words_info: | ||
if (not speakers) or speakers[-1] != word_info.speaker_tag: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this implementation means that if speakers appear in certain orders you could end up with duplicates. For example.
word_info
speakers: A, B, B, A
would result in a speakers
lists: A, B, A
.
I think this sample would be clearer with a single dictionary of speakers to words rather than two lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a long discussion about the code readability of this part of the sample. The API returns each word with an associated speak_tag. So, if I am speaker #1 and I say "I am ok", the words 'I' and 'am' and 'okay' will all be tagged with speaker_tag: 1.
Since the point of the sample is mostly just to showcase the API call and how to unbox the response object, I am beginning to think that this particular implementation is adding more confusion than clarity. I am leaning back toward a simpler implementation that just iterates over words and prints each word along with the associated speaker_tag. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, printing each word with a speaker tag would be much clearer.
You could do something like
current_speaker = None
for word_info in words_info:
if current_speaker is None or current_speaker != word_info.speaker_tag:
current_speaker = word_info.speaker_tag
print()
print(current_speaker)
print(word_info.word)
if you wanted to keep them grouped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new loop for speaker tags is much easier to understand, thanks.
Talked to Puneith offline about it. There is no issue to be resolved.
Shoot. I totally dropped off this PR, didn't I. Sorry - bad email management -_-; Glad y'all just went on without me. |
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
* Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
* Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
* Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
* Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
* Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
…-samples#1586) * Printing the last paragraph only. * Python3 print * Removing sample rate setting * Adding the missing output parameter in the example * Changes based on the comments * Removed filenames as input parameters * Removed unused args * Updated README file * Updated the inline comment * Modified code to make it more readable * Simplified the response object processing. * Fixing the long line issue.
Printing the last paragraph only.