Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Language: add support for accessing ENUM display names #6169

Closed
beccasaurus opened this issue Oct 4, 2018 · 8 comments
Closed

Language: add support for accessing ENUM display names #6169

beccasaurus opened this issue Oct 4, 2018 · 8 comments
Assignees
Labels
api: language Issues related to the Cloud Natural Language API API. codegen type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@beccasaurus
Copy link
Contributor

Feature Request: add support for accessing display name of enum values

For example:

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

client = language.LanguageServiceClient()
document = types.Document(content=text, type=enums.Document.Type.PLAIN_TEXT)
tokens = client.analyze_syntax(document).tokens

for token in tokens:
    print(token.part_of_speech.tag)  # <-- tag is an enum field.
                                     #     link below to RPC reference for Tag.

Expected result, based on all of the other languages' GAPIC clients:

NOUN
ADV
VERB
NOUN

Actual result:

6
3
11
6

I get the expected result when using GAPIC client libraries in all of the other languages.

I would characterize the current behavior as: surprising


The current developer experience, if you want to access the enum values:

  1. Find the relevant enum, eg. Natural Language Part of Speech Tag (noun, verb, pronoun, etc)
  2. Copy the display names into an indexed list in your program,
    tag_names = ('UNKNOWN', 'ADJ', 'ADP', 'ADV', 'CONJ', 'DET', 'NOUN', 'NUM', 'PRON', 'PRT', 'PUNCT', 'VERB', 'X', 'AFFIX')
  3. Access the display name using the enum index int value available on the pb object
  4. Check back on the .proto occasionally incase new enum values have been appended

There may be a better way, but this is the process we use to author our Python samples.

Example: sample for analyzing syntax of text (natural-language/docs/analyzing-syntax)

def syntax_text(text):
    """Detects syntax in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects syntax in the document. You can also analyze HTML with:
    #   document.type == enums.Document.Type.HTML
    tokens = client.analyze_syntax(document).tokens

    # part-of-speech tags from enums.PartOfSpeech.Tag
    pos_tag = ('UNKNOWN', 'ADJ', 'ADP', 'ADV', 'CONJ', 'DET', 'NOUN', 'NUM',
               'PRON', 'PRT', 'PUNCT', 'VERB', 'X', 'AFFIX')

    for token in tokens:
        print(u'{}: {}'.format(pos_tag[token.part_of_speech.tag],
                               token.text.content))

/cc @lukesneeringer

/fyi @vchudnov-g @pongad @theacodes

@tseaver tseaver added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: language Issues related to the Cloud Natural Language API API. labels Oct 5, 2018
@tseaver tseaver changed the title Support for accessing ENUM display names Language: add support for accessing ENUM display names Oct 5, 2018
@tseaver tseaver added the codegen label Oct 5, 2018
@tseaver
Copy link
Contributor

tseaver commented Oct 5, 2018

@beccasaurus The only issue I can see is backward-compatibility: google-cloud-language is already GA, which means that changing the API surface is problematic. We are using the Python3 stdlib enum.IntEnum for these enumerations, which means that you can achieve the result you want via:

for token in tokens:
    print(token.part_of_speech.tag.name)

@theacodes
Copy link
Contributor

theacodes commented Oct 5, 2018

This is on @lukesneeringer & company. We can make finding, using, and passing enums easier as we control the parameter types but the return value of an API call is a protobuf.Message and the field types therein are totally determined by protobuf and there's nothing we (client libraries) can do. Luke has proposed a wrapper around protobuf.Message that could resolve this.

In the meantime, we can simplify your sample a little by using the enum interface:

def syntax_text(text):
    """Detects syntax in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects syntax in the document. You can also analyze HTML with:
    #   document.type == enums.Document.Type.HTML
    tokens = client.analyze_syntax(document).tokens

    for token in tokens:
        tag_name = enums.PartOfSpeech.Tag(token.part_of_speech.tag)
        content = token.text.content
        print(u'{!r}: {}'.format(tag_name, content))

This gets you closer to the ideal state while we wait for Luke's new generator to be ready. We can cut over clients after a reasonable deprecation period.

(Note: I edited this to add the !r format flag which prints the representation of the tag value, leaving it out will print the numeric value. You can also use enums.PartOfSpeech.Tag(token.part_of_speech.tag).name, but meh)

@beccasaurus
Copy link
Contributor Author

beccasaurus commented Oct 5, 2018

@tseaver The approach you recommended is not working. When I access the enum field of the pb, it gives me an int, as I initially described, not enum.IntEnum

Running the following code using google-cloud-language==1.0.2

for token in tokens:
    print('value of      token.part_of_speech.tag => {}'.format(token.part_of_speech.tag))
    print('class name of token.part_of_speech.tag => {}'.format(type(token.part_of_speech.tag).__name__))
    print('IntEnum name: token.part_of_speech.tag.name => {}'.format(token.part_of_speech.tag.name))

Output:

value of      token.part_of_speech.tag => 6
class name of token.part_of_speech.tag => int
Traceback (most recent call last):
  File "snippets.py", line 360, in <module>
    syntax_text(args.text)
  File "snippets.py", line 168, in syntax_text
    print('IntEnum name: token.part_of_speech.tag.name => {}'.format(token.part_of_speech.tag.name))
AttributeError: 'int' object has no attribute 'name'

Edit: note this is using Python 3.6.5

@beccasaurus
Copy link
Contributor Author

beccasaurus commented Oct 5, 2018

Why was this closed? If I read this correctly, this is on @lukesneeringer & company

@lukesneeringer what is the current plan and timeline?

Is this not a developer experience issue for Python users? If not, why?

Edit: is there a different tracking issue I can follow in the meantime?

@beccasaurus
Copy link
Contributor Author

@theacodes Is the approach you mentioned documented for our libraries?

I'm having difficulty finding any other current examples which use the enum interface as you recommended.

iiuc this is part of protoc's generated enum interfaces, so it is not in any way specific to google-cloud client libraries. Considering there seems to be a fair bit of confusion around this feature, would it be worth considering adding examples to google-cloud-python documentation where relevant?

My direct involvement is limited to projects to (1) generate GAPIC code samples which need to access enum display names (2) author code samples for ~8 specific APIs. I would like for my team to update our existing samples for these ~8 APIs to use this syntax, but I worry other users won't find this interface. Maybe that's a false worry, but there seems to be some confusion. The last time I reached out to @lukesneeringer directly, he actually thought this had been resolved and that the field is now supposed to be an IntEnum 🤷🏼‍♀️

@beccasaurus
Copy link
Contributor Author

beccasaurus commented Oct 5, 2018

Please note that I did not intend for this PR to be related to Natural Language.
I presumed more of a google-cloud-core issue or protobuf. re: @tseaver's tagging
I can move the discussion to protobuf or find an existing discussion.

I'll wait on @lukesneeringer's guidance before commenting further 🙂

@theacodes
Copy link
Contributor

theacodes commented Oct 5, 2018

Why was this closed? If I read this correctly, this is on @lukesneeringer & company

Because as mentioned, the team who maintains this can not directly resolve it. We can file an appropriate bug upstream on googleapis/gapic-generator-python for you, if you want, but I was going to leave that to @lukesneeringer to surface as he sees fit (I think this is already captured in his design doc for the proto wrapper, but it might be useful for your purposes to have a bug).

Is this not a developer experience issue for Python users? If not, why?

Sure, no one questioned that.

@theacodes Is the approach you mentioned documented for our libraries?

Good call, we could definitely surface this a bit better. At minimum file a feature request upstream to googleapis/gapic-generator to have the Enum classes generate some boilerplate docs that link to Python's IntEnum docs, something like:

This class is a :class:`enum.IntEnum`, so it's members can be accessed using the common :class:`enum.Enum` methods:

..code-block:: python
    >>> Color.RED
    <Color.RED: 1>
    >>> Color(1)
    <Color.RED: 1>
    >>> Color['RED']
    <Color.RED: 1>

More docs than that would need a bit more discussion. Maybe we could work together on a proposal on comprehensively documenting Enum across languages and submit it to the gapic team for implementation?

I worry other users won't find this interface.

Totally valid. Now that we have IntEnums we can make things mostly better but it is still slightly surprisingly the protobuf.Message just return integers. I think explicitly documenting how to translate those to "names" or just compare them to the list of enums is a good thing to do. We just can't resolve that directly here.

@beccasaurus
Copy link
Contributor Author

Thank you for the detailed response! 🙏

All makes sense! As I happen to work across GAPIC libraries in multiple languages (& ∴ pb objects)
it's surprising to me because I believe every other language gives the display name rather than the int

I was able to update some of our code samples on the website, so at least we'll have some examples showing how to get enum display values!

GoogleCloudPlatform/python-docs-samples#1738 (pending)

This is no longer blocking me (I can get the display name now!) and the syntax is relatively straightforward-ish for users who're intimately familiar with pb objects 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: language Issues related to the Cloud Natural Language API API. codegen type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

3 participants