diff --git a/docs/index.rst b/docs/index.rst
index 71ae4478634b..ec1ea380c7e8 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -148,6 +148,13 @@
   vision-usage
 
+.. toctree::
+  :maxdepth: 0
+  :hidden:
+  :caption: Natural Language
+
+  language-usage
+
 .. toctree::
   :maxdepth: 0
   :hidden:
diff --git a/docs/language-usage.rst b/docs/language-usage.rst
new file mode 100644
index 000000000000..83b965bad1b4
--- /dev/null
+++ b/docs/language-usage.rst
@@ -0,0 +1,279 @@
+Using the API
+=============
+
+The `Google Natural Language`_ API can be used to reveal the
+structure and meaning of text via powerful machine
+learning models. You can use it to extract information about
+people, places, events and much more, mentioned in text documents,
+news articles or blog posts. You can use it to understand
+sentiment about your product on social media or parse intent from
+customer conversations happening in a call center or a messaging
+app. You can analyze text uploaded in your request or integrate
+with your document storage on Google Cloud Storage.
+
+.. warning::
+
+   This is a Beta release of Google Cloud Natural Language API. This
+   API is not intended for real-time usage in critical applications.
+
+.. _Google Natural Language: https://cloud.google.com/natural-language/docs/getting-started
+
+Client
+------
+
+:class:`~gcloud.language.client.Client` objects provide a
+means to configure your application. Each instance holds
+both a ``project`` and an authenticated connection to the
+Natural Language service.
+
+For an overview of authentication in ``gcloud-python``, see
+:doc:`gcloud-auth`.
+
+Assuming your environment is set up as described in that document,
+create an instance of :class:`~gcloud.language.client.Client`.
+
+  .. code-block:: python
+
+     >>> from gcloud import language
+     >>> client = language.Client()
+
+By default the ``language`` is ``'en'`` and the ``encoding`` is
+UTF-8. To override these values:
+
+  .. code-block:: python
+
+     >>> client = language.Client(language='es',
+     ...                          encoding=language.Encoding.UTF16)
+
+The encoding can be one of
+:attr:`Encoding.UTF8`,
+:attr:`Encoding.UTF16`, or
+:attr:`Encoding.UTF32`.
+
+Methods
+-------
+
+The Google Natural Language API has three supported methods
+
+- `analyzeEntities`_
+- `analyzeSentiment`_
+- `annotateText`_
+
+and each method uses a `Document`_ for representing text. To
+create a :class:`~gcloud.language.document.Document`,
+
+  .. code-block:: python
+
+     >>> text_content = (
+     ...     'Google, headquartered in Mountain View, unveiled the '
+     ...     'new Android phone at the Consumer Electronic Show. '
+     ...     'Sundar Pichai said in his keynote that users love '
+     ...     'their new Android phones.')
+     >>> document = client.document_from_text(text_content)
+
+By using :meth:`~gcloud.language.client.Client.document_from_text`,
+the document's type is plain text:
+
+  .. code-block:: python
+
+     >>> document.doc_type == language.Document.PLAIN_TEXT
+     True
+
+In addition, the document's language defaults to the language on
+the client:
+
+  .. code-block:: python
+
+     >>> document.language
+     'en'
+     >>> document.language == client.language
+     True
+
+In addition, the
+:meth:`~gcloud.language.client.Client.document_from_html`
+factory can be used to create an HTML document. In both this
+method and the plain-text factory, the language can be
+overridden:
+
+  .. code-block:: python
+
+     >>> html_content = """\
+     ... <html>
+     ...   <head>
+     ...     <title>El Tiempo de las Historias</title>
+     ...   </head>
+     ...   <body>
+     ...     <p>La vaca saltó sobre la luna.</p>
+     ...   </body>
+     ... </html>
+     ... """
+     >>> document = client.document_from_html(html_content,
+     ...                                      language='es')
+
+The ``language`` argument can be either ISO-639-1 or BCP-47 language
+codes; at the time of writing, only English, Spanish, and Japanese `are supported`_.
+However, the ``analyzeSentiment`` method `only supports`_ English text.
+
+.. _are supported: https://cloud.google.com/natural-language/docs/
+.. _only supports: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment#body.request_body.FIELDS.document
+
+The document type (``doc_type``) value can be one of
+:attr:`Document.PLAIN_TEXT <gcloud.language.document.Document.PLAIN_TEXT>` or
+:attr:`Document.HTML <gcloud.language.document.Document.HTML>`.
+
+In addition to supplying the text / HTML content, a document can refer
+to content stored in `Google Cloud Storage`_. We can use the
+:meth:`~gcloud.language.client.Client.document_from_blob` method:
+
+  .. code-block:: python
+
+     >>> document = client.document_from_blob(bucket='my-text-bucket',
+     ...                                      blob='sentiment-me.txt')
+     >>> document.gcs_url
+     'gs://my-text-bucket/sentiment-me.txt'
+     >>> document.doc_type == language.Document.PLAIN_TEXT
+     True
+
+and the :meth:`~gcloud.language.client.Client.document_from_uri`
+method. In either case, the document type can be specified with
+the ``doc_type`` argument:
+
+  .. code-block:: python
+
+     >>> gcs_url = 'gs://my-text-bucket/sentiment-me.txt'
+     >>> document = client.document_from_uri(
+     ...     gcs_url, doc_type=language.Document.HTML)
+     >>> document.gcs_url == gcs_url
+     True
+     >>> document.doc_type == language.Document.HTML
+     True
+
+.. _analyzeEntities: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeEntities
+.. _analyzeSentiment: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment
+.. _annotateText: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText
+.. _Document: https://cloud.google.com/natural-language/reference/rest/v1beta1/Document
+.. _Google Cloud Storage: https://cloud.google.com/storage/
+
+Analyze Entities
+----------------
+
+The :meth:`~gcloud.language.document.Document.analyze_entities` method
+finds named entities (i.e. proper names) in the text and returns them
+as a :class:`list` of :class:`~gcloud.language.entity.Entity` objects.
+Each entity has a corresponding type, salience (prominence), associated
+metadata and other properties.
+
+  .. code-block:: python
+
+     >>> text_content = ("Michelangelo Caravaggio, Italian painter, is "
+     ...                 "known for 'The Calling of Saint Matthew'.")
+     >>> document = client.document_from_text(text_content)
+     >>> entities = document.analyze_entities()
+     >>> for entity in entities:
+     ...     print('=' * 20)
+     ...     print('    name: %s' % (entity.name,))
+     ...     print('    type: %s' % (entity.entity_type,))
+     ...     print('metadata: %s' % (entity.metadata,))
+     ...     print('salience: %s' % (entity.salience,))
+     ====================
+         name: Michelangelo Caravaggio
+         type: PERSON
+     metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Caravaggio'}
+     salience: 0.75942981
+     ====================
+         name: Italian
+         type: LOCATION
+     metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Italy'}
+     salience: 0.20193423
+     ====================
+         name: The Calling of Saint Matthew
+         type: WORK_OF_ART
+     metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/index.html?curid=2838808'}
+     salience: 0.03863598
+
+Analyze Sentiment
+-----------------
+
+The :meth:`~gcloud.language.document.Document.analyze_sentiment` method
+analyzes the sentiment of the provided text and returns a
+:class:`~gcloud.language.sentiment.Sentiment`. Currently, this method
+only supports English text.
+
+  .. code-block:: python
+
+     >>> text_content = "Jogging isn't very fun."
+     >>> document = client.document_from_text(text_content)
+     >>> sentiment = document.analyze_sentiment()
+     >>> print(sentiment.polarity)
+     -1
+     >>> print(sentiment.magnitude)
+     0.8
+
+Annotate Text
+-------------
+
+The :meth:`~gcloud.language.document.Document.annotate_text` method
+analyzes a document and is intended for users who are familiar with
+machine learning and need in-depth text features to build upon.
+
+The method returns a named tuple with four entries:
+
+* ``sentences``: A :class:`list` of sentences in the text
+* ``tokens``: A :class:`list` of :class:`~gcloud.language.token.Token`
+  objects (e.g. words, punctuation)
+* ``sentiment``: The :class:`~gcloud.language.sentiment.Sentiment` of
+  the text (as returned by
+  :meth:`~gcloud.language.document.Document.analyze_sentiment`)
+* ``entities``: A :class:`list` of :class:`~gcloud.language.entity.Entity`
+  objects extracted from the text (as returned by
+  :meth:`~gcloud.language.document.Document.analyze_entities`)
+
+The :meth:`~gcloud.language.document.Document.annotate_text` method accepts
+three arguments, ``include_syntax``, ``include_entities`` and
+``include_sentiment``, which all default to :data:`True`. Each of these
+`Features`_ can be selectively turned off by setting the corresponding
+argument to :data:`False`.
+
+When ``include_syntax=False``, ``sentences`` and ``tokens`` in the
+response are :data:`None`. When ``include_sentiment=False``, ``sentiment`` in
+the response is :data:`None`. When ``include_entities=False``, ``entities`` in
+the response is :data:`None`.
+
+  .. code-block:: python
+
+     >>> text_content = 'The cow jumped over the Moon.'
+     >>> document = client.document_from_text(text_content)
+     >>> annotations = document.annotate_text()
+     >>> # Sentences present if include_syntax=True
+     >>> print(annotations.sentences)
+     ['The cow jumped over the Moon.']
+     >>> # Tokens present if include_syntax=True
+     >>> for token in annotations.tokens:
+     ...     msg = '%11s: %s' % (token.part_of_speech, token.text_content)
+     ...     print(msg)
+      DETERMINER: The
+            NOUN: cow
+            VERB: jumped
+      ADPOSITION: over
+      DETERMINER: the
+            NOUN: Moon
+     PUNCTUATION: .
+     >>> # Sentiment present if include_sentiment=True
+     >>> print(annotations.sentiment.polarity)
+     1
+     >>> print(annotations.sentiment.magnitude)
+     0.1
+     >>> # Entities present if include_entities=True
+     >>> for entity in annotations.entities:
+     ...     print('=' * 20)
+     ...     print('    name: %s' % (entity.name,))
+     ...     print('    type: %s' % (entity.entity_type,))
+     ...     print('metadata: %s' % (entity.metadata,))
+     ...     print('salience: %s' % (entity.salience,))
+     ====================
+         name: Moon
+         type: LOCATION
+     metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Natural_satellite'}
+     salience: 0.11793101
+
+.. _Features: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText#Features
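
Note on the ``annotate_text`` return value (a reader's sketch, not part of the patch): the usage doc states that entries for disabled features come back as ``None``, so calling code probably needs to check each entry before using it. The snippet below illustrates that check without calling the service. ``Annotations`` and ``present_features`` are hypothetical names introduced only for this illustration, the tokens are plain strings rather than ``Token`` objects, and the library's actual return type may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for the four-entry named tuple described in the
# usage doc; this is NOT the library's own class.
Annotations = namedtuple(
    'Annotations', ['sentences', 'tokens', 'sentiment', 'entities'])


def present_features(annotations):
    """Return the names of the entries that are not None."""
    return [name for name in annotations._fields
            if getattr(annotations, name) is not None]


# Simulates a result requested with include_sentiment=False and
# include_entities=False; plain strings stand in for Token objects.
syntax_only = Annotations(
    sentences=['The cow jumped over the Moon.'],
    tokens=['The', 'cow', 'jumped', 'over', 'the', 'Moon', '.'],
    sentiment=None,
    entities=None,
)

print(present_features(syntax_only))
```

Checking with ``is not None`` rather than truthiness keeps an empty list (a feature that ran but found nothing) distinct from a feature that was switched off.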