Add Speech Streaming API. #2523

Closed
wants to merge 6 commits into from
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
@@ -177,6 +177,7 @@
speech-encoding
speech-metadata
speech-operation
speech-streaming
speech-sample
speech-transcript

15 changes: 15 additions & 0 deletions docs/speech-streaming.rst
@@ -0,0 +1,15 @@
Streaming Speech Response
=========================

.. automodule:: google.cloud.speech.streaming_response
  :members:
  :undoc-members:
  :show-inheritance:

Streaming Speech Result
=======================

.. automodule:: google.cloud.speech.streaming_result
  :members:
  :undoc-members:
  :show-inheritance:
88 changes: 80 additions & 8 deletions docs/speech-usage.rst
@@ -51,10 +51,9 @@ See: `Speech Asynchronous Recognize`_

>>> import time
>>> from google.cloud import speech
>>> from google.cloud.speech.encoding import Encoding
>>> client = speech.Client()
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
... encoding=Encoding.LINEAR16,
... encoding=speech.Encoding.LINEAR16,
... sample_rate=44100)
>>> operation = client.async_recognize(sample, max_alternatives=2)
>>> retry_count = 100
@@ -82,10 +81,9 @@ Great Britain.
.. code-block:: python

>>> from google.cloud import speech
>>> from google.cloud.speech.encoding import Encoding
>>> client = speech.Client()
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
... encoding=Encoding.FLAC,
... encoding=speech.Encoding.FLAC,
... sample_rate=44100)
>>> operation = client.async_recognize(sample, max_alternatives=2)
>>> alternatives = client.sync_recognize(
@@ -107,10 +105,9 @@ Example of using the profanity filter.
.. code-block:: python

>>> from google.cloud import speech
>>> from google.cloud.speech.encoding import Encoding
>>> client = speech.Client()
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
... encoding=Encoding.FLAC,
... encoding=speech.Encoding.FLAC,
... sample_rate=44100)
>>> alternatives = client.sync_recognize(sample, max_alternatives=1,
... profanity_filter=True)
@@ -129,10 +126,9 @@ words to the vocabulary of the recognizer.
.. code-block:: python

>>> from google.cloud import speech
>>> from google.cloud.speech.encoding import Encoding
>>> client = speech.Client()
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
... encoding=Encoding.FLAC,
... encoding=speech.Encoding.FLAC,
... sample_rate=44100)
>>> hints = ['hi', 'good afternoon']
>>> alternatives = client.sync_recognize(sample, max_alternatives=2,
@@ -145,5 +141,81 @@ words to the vocabulary of the recognizer.
transcript: Hello, this is a test
confidence: 0.81


Streaming Recognition
---------------------

The :meth:`~google.cloud.speech.Client.stream_recognize` method converts speech
data to possible text alternatives on the fly.

.. note::
    Streaming recognition requests are limited to 1 minute of audio.

    See: https://cloud.google.com/speech/limits#content
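
Because a streaming request is capped at one minute of audio, it can help to
check a file's length before sending it. The sketch below is not part of the
library. It uses only the standard-library ``wave`` module and assumes an
uncompressed WAV file; the names ``MAX_STREAMING_SECONDS``,
``wav_duration_seconds`` and ``fits_streaming_limit`` are hypothetical.

```python
import wave

# Limit taken from the note above; the constant and helper names are
# hypothetical and do not exist in google.cloud.speech.
MAX_STREAMING_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, 'rb') as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def fits_streaming_limit(path, limit=MAX_STREAMING_SECONDS):
    """Return True if the file is short enough for one streaming request."""
    return wav_duration_seconds(path) <= limit
```

A caller could skip or split any file for which ``fits_streaming_limit``
returns ``False`` before building a sample from it.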

.. code-block:: python

    >>> from google.cloud import speech
    >>> client = speech.Client()
    >>> with open('./hello.wav', 'rb') as stream:
    ...     sample = client.sample(stream=stream, encoding=speech.Encoding.LINEAR16,
    ...                            sample_rate=16000)
    ...     for response in client.stream_recognize(sample):
    ...         print(response.transcript)
    ...         print(response.is_final)
    hello
    True

When ``interim_results`` is set to :data:`True`, interim results (tentative
hypotheses) may be returned as they become available; these are marked with
``is_final`` set to :data:`False`. If ``interim_results`` is :data:`False` or
omitted, only results with ``is_final`` set to :data:`True` are returned.

.. code-block:: python

    >>> from google.cloud import speech
    >>> client = speech.Client()
    >>> with open('./hello.wav', 'rb') as stream:
    ...     sample = client.sample(stream=stream, encoding=speech.Encoding.LINEAR16,
    ...                            sample_rate=16000)
    ...     for response in client.stream_recognize(sample,
    ...                                             interim_results=True):
    ...         print('====Response====')
    ...         print(response.transcript)
    ...         print(response.is_final)
    ====Response====
    he
    False
    ====Response====
    hell
    False
    ====Response====
    hello
    True
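
When interim results are enabled, callers usually want to treat tentative
hypotheses differently from final ones. The following is a minimal sketch of
that separation, not library code: ``FakeResponse`` and
``split_interim_and_final`` are hypothetical names, and the only assumption
about real responses is that they expose the ``transcript`` and ``is_final``
attributes used in the example above.

```python
from collections import namedtuple

# Hypothetical stand-in for a streaming response; only the two attributes
# shown in the documentation example (transcript, is_final) are assumed.
FakeResponse = namedtuple('FakeResponse', ['transcript', 'is_final'])


def split_interim_and_final(responses):
    """Separate tentative hypotheses from final results."""
    interim, final = [], []
    for response in responses:
        # Route each transcript to the list that matches its is_final flag.
        target = final if response.is_final else interim
        target.append(response.transcript)
    return interim, final


responses = [
    FakeResponse('he', False),
    FakeResponse('hell', False),
    FakeResponse('hello', True),
]
interim, final = split_interim_and_final(responses)
print(interim)  # ['he', 'hell']
print(final)    # ['hello']
```

The helper accepts any iterable, so under the same assumption it could be
handed the iterator returned by ``stream_recognize`` directly.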


By default the recognizer performs continuous recognition (it keeps processing
audio even if the speaker pauses) until the client closes the output stream or
the maximum time limit is reached.

To recognize only a single utterance, set ``single_utterance`` to :data:`True`;
only one result will be returned.

See: `Single Utterance`_

.. code-block:: python

    >>> with open('./hello_pause_goodbye.wav', 'rb') as stream:
    ...     sample = client.sample(stream=stream, encoding=speech.Encoding.LINEAR16,
    ...                            sample_rate=16000)
    ...     for response in client.stream_recognize(sample,
    ...                                             single_utterance=True):
    ...         print(response.transcript)
    ...         print(response.is_final)
    hello
    True
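
On the consuming side, a caller that only needs one utterance can also stop
reading as soon as the first final result arrives. The sketch below
illustrates that pattern with a hypothetical ``first_final_transcript``
helper and a hypothetical ``FakeResponse`` stand-in. It assumes only the
``transcript`` and ``is_final`` attributes shown above, and it does not
replace ``single_utterance``, which controls what the service itself does.

```python
from collections import namedtuple

# Hypothetical stand-in for a streaming response (transcript, is_final only).
FakeResponse = namedtuple('FakeResponse', ['transcript', 'is_final'])


def first_final_transcript(responses):
    """Return the first final transcript, or None if there is none.

    The generator expression is lazy, so iteration stops at the first
    final response and later responses are never pulled from the stream.
    """
    finals = (r.transcript for r in responses if r.is_final)
    return next(finals, None)
```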

.. _Single Utterance: https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1beta1#streamingrecognitionconfig
.. _sync_recognize: https://cloud.google.com/speech/reference/rest/v1beta1/speech/syncrecognize
.. _Speech Asynchronous Recognize: https://cloud.google.com/speech/reference/rest/v1beta1/speech/asyncrecognize
1 change: 1 addition & 0 deletions scripts/verify_included_modules.py
@@ -44,6 +44,7 @@
'google.cloud.pubsub.__init__',
'google.cloud.resource_manager.__init__',
'google.cloud.speech.__init__',
'google.cloud.speech.streaming.__init__',
'google.cloud.storage.__init__',
'google.cloud.streaming.__init__',
'google.cloud.streaming.buffered_stream',
1 change: 1 addition & 0 deletions speech/google/cloud/speech/__init__.py
@@ -16,3 +16,4 @@

from google.cloud.speech.client import Client
from google.cloud.speech.connection import Connection
from google.cloud.speech.encoding import Encoding