Data Labeling Beta samples #2096

Merged
merged 13 commits into from
Apr 5, 2019
78 changes: 78 additions & 0 deletions datalabeling/README.rst
@@ -0,0 +1,78 @@
.. This file is automatically generated. Do not edit this file directly.

Google Cloud Data Labeling Service Python Samples
===============================================================================

.. image:: https://gstatic.com/cloudssh/images/open-btn.png
   :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=datalabeling/README.rst


This directory contains samples for Google Cloud Data Labeling Service. `Google Cloud Data Labeling Service`_ lets developers request that human labelers label a collection of data intended for training a custom machine learning model.




.. _Google Cloud Data Labeling Service: https://cloud.google.com/data-labeling/docs/

Setup
-------------------------------------------------------------------------------


Authentication
++++++++++++++

This sample requires you to have authentication set up. Refer to the
`Authentication Getting Started Guide`_ for instructions on setting up
credentials for applications.

.. _Authentication Getting Started Guide:
    https://cloud.google.com/docs/authentication/getting-started

Install Dependencies
++++++++++++++++++++

#. Clone python-docs-samples and change directory to the sample directory you want to use.

    .. code-block:: bash

        $ git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

#. Install `pip`_ and `virtualenv`_ if you do not already have them. You may want to refer to the `Python Development Environment Setup Guide`_ for Google Cloud Platform for instructions.

    .. _Python Development Environment Setup Guide:
        https://cloud.google.com/python/setup

#. Create a virtualenv. Samples are compatible with Python 2.7 and 3.4+.

    .. code-block:: bash

        $ virtualenv env
        $ source env/bin/activate

#. Install the dependencies needed to run the samples.

    .. code-block:: bash

        $ pip install -r requirements.txt

.. _pip: https://pip.pypa.io/
.. _virtualenv: https://virtualenv.pypa.io/



The client library
-------------------------------------------------------------------------------

This sample uses the `Google Cloud Client Library for Python`_.
You can read the documentation for more details on API usage and use GitHub
to `browse the source`_ and `report issues`_.

.. _Google Cloud Client Library for Python:
    https://googlecloudplatform.github.io/google-cloud-python/
.. _browse the source:
    https://github.com/GoogleCloudPlatform/google-cloud-python
.. _report issues:
    https://github.com/GoogleCloudPlatform/google-cloud-python/issues


.. _Google Cloud SDK: https://cloud.google.com/sdk/
18 changes: 18 additions & 0 deletions datalabeling/README.rst.in
@@ -0,0 +1,18 @@
# This file is used to generate README.rst

product:
  name: Google Cloud Data Labeling Service
  short_name: Cloud Data Labeling
  url: https://cloud.google.com/data-labeling/docs/
  description: >
    `Google Cloud Data Labeling Service`_ lets developers request that
    human labelers label a collection of data intended for training a
    custom machine learning model.

setup:
- auth
- install_deps

cloud_client_library: true

folder: datalabeling
77 changes: 77 additions & 0 deletions datalabeling/create_annotation_spec_set.py
@@ -0,0 +1,77 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse


# [START datalabeling_create_annotation_spec_set_beta]
def create_annotation_spec_set(project_id):
    """Creates a data labeling annotation spec set for the given
    Google Cloud project.
    """
    from google.cloud import datalabeling_v1beta1 as datalabeling
    client = datalabeling.DataLabelingServiceClient()

    project_path = client.project_path(project_id)

    annotation_spec_1 = datalabeling.types.AnnotationSpec(
        display_name='label_1',
        description='label_description_1'
    )

    annotation_spec_2 = datalabeling.types.AnnotationSpec(
        display_name='label_2',
        description='label_description_2'
    )

    annotation_spec_set = datalabeling.types.AnnotationSpecSet(
        display_name='YOUR_ANNOTATION_SPEC_SET_DISPLAY_NAME',
        description='YOUR_DESCRIPTION',
        annotation_specs=[annotation_spec_1, annotation_spec_2]
    )

    response = client.create_annotation_spec_set(
        project_path, annotation_spec_set)

    # The format of the resource name:
    # projects/{project_id}/annotationSpecSets/{annotation_spec_set_id}
    print('The annotation_spec_set resource name: {}'.format(response.name))
    print('Display name: {}'.format(response.display_name))
    print('Description: {}'.format(response.description))
    print('Annotation specs:')
    for annotation_spec in response.annotation_specs:
        print('\tDisplay name: {}'.format(annotation_spec.display_name))
        print('\tDescription: {}\n'.format(annotation_spec.description))

    return response
# [END datalabeling_create_annotation_spec_set_beta]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument(
        '--project-id',
        help='Project ID. Required.',
        required=True
    )

    args = parser.parse_args()

    create_annotation_spec_set(args.project_id)
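
The resource name printed by the sample can be split back into its parts when a later call needs the IDs. The following is a minimal sketch, assuming names follow the `projects/{project_id}/annotationSpecSets/{annotation_spec_set_id}` pattern. `parse_annotation_spec_set_name` is a hypothetical helper introduced here for illustration; it is not part of the client library.

```python
import re

# Assumed pattern: projects/{project_id}/annotationSpecSets/{annotation_spec_set_id}
_ANNOTATION_SPEC_SET_NAME = re.compile(
    r'^projects/(?P<project_id>[^/]+)'
    r'/annotationSpecSets/(?P<annotation_spec_set_id>[^/]+)$'
)


def parse_annotation_spec_set_name(name):
    """Splits a resource name into its project ID and spec set ID.

    Hypothetical helper: raises ValueError when the name does not
    match the assumed pattern.
    """
    match = _ANNOTATION_SPEC_SET_NAME.match(name)
    if match is None:
        raise ValueError(
            'Unexpected annotation spec set name: {!r}'.format(name))
    return match.groupdict()
```

For example, the helper could be called with `response.name` from `create_annotation_spec_set` to recover the ID of the newly created set.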
36 changes: 36 additions & 0 deletions datalabeling/create_annotation_spec_set_test.py
@@ -0,0 +1,36 @@
#!/usr/bin/env python

# Copyright 2019 Google, Inc
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import create_annotation_spec_set
from google.cloud import datalabeling_v1beta1 as datalabeling
import pytest

PROJECT_ID = os.getenv('GCLOUD_PROJECT')


@pytest.mark.slow
def test_create_annotation_spec_set(capsys):
    response = create_annotation_spec_set.create_annotation_spec_set(
        PROJECT_ID)
    out, _ = capsys.readouterr()
    assert 'The annotation_spec_set resource name:' in out

    # Delete the created annotation spec set.
    annotation_spec_set_name = response.name
    client = datalabeling.DataLabelingServiceClient()
    client.delete_annotation_spec_set(annotation_spec_set_name)
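
The test deletes the annotation spec set only after the assertion passes, so a failed assertion would leave the resource behind. One possible way to make cleanup unconditional is a small context manager, sketched below. `deleting` is a hypothetical helper, and the `delete` argument stands in for a callable such as `client.delete_annotation_spec_set`.

```python
import contextlib


@contextlib.contextmanager
def deleting(resource_name, delete):
    """Yields resource_name and always calls delete(resource_name) on exit.

    Hypothetical helper: `delete` is any callable that takes the
    resource name, for example a client's delete method.
    """
    try:
        yield resource_name
    finally:
        # Runs whether the body succeeded or raised (e.g. a failed assert).
        delete(resource_name)
```

Inside a test, the assertions would go in the body of `with deleting(response.name, client.delete_annotation_spec_set):`, assuming the client method accepts the resource name as its only required argument, as it does in the test.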
93 changes: 93 additions & 0 deletions datalabeling/create_instruction.py
@@ -0,0 +1,93 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse


# [START datalabeling_create_instruction_beta]
def create_instruction(project_id, data_type, instruction_gcs_uri):
    """Creates a data labeling PDF instruction for the given Google Cloud
    project. The PDF file should be uploaded to the project in
    Google Cloud Storage.
    """
    from google.cloud import datalabeling_v1beta1 as datalabeling
    client = datalabeling.DataLabelingServiceClient()

    project_path = client.project_path(project_id)

    pdf_instruction = datalabeling.types.PdfInstruction(
        gcs_file_uri=instruction_gcs_uri)

    instruction = datalabeling.types.Instruction(
        display_name='YOUR_INSTRUCTION_DISPLAY_NAME',
        description='YOUR_DESCRIPTION',
        data_type=data_type,
        pdf_instruction=pdf_instruction
    )

    operation = client.create_instruction(project_path, instruction)

    # Block until the long-running operation completes.
    result = operation.result()

    # The format of the resource name:
    # projects/{project_id}/instructions/{instruction_id}
    print('The instruction resource name: {}\n'.format(result.name))
    print('Display name: {}'.format(result.display_name))
    print('Description: {}'.format(result.description))
    print('Create time:')
    print('\tseconds: {}'.format(result.create_time.seconds))
    print('\tnanos: {}'.format(result.create_time.nanos))
    print('Data type: {}'.format(
        datalabeling.enums.DataType(result.data_type).name))
    print('Pdf instruction:')
    print('\tGcs file uri: {}'.format(
        result.pdf_instruction.gcs_file_uri))

    return result
# [END datalabeling_create_instruction_beta]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument(
        '--project-id',
        help='Project ID. Required.',
        required=True
    )

    parser.add_argument(
        '--data-type',
        help=('Data type. Supported values: IMAGE, VIDEO, TEXT, and AUDIO. '
              'Required.'),
        required=True
    )

    parser.add_argument(
        '--instruction-gcs-uri',
        help='The Google Cloud Storage URI of the instruction PDF. Required.',
        required=True
    )

    args = parser.parse_args()

    create_instruction(
        args.project_id,
        args.data_type,
        args.instruction_gcs_uri
    )
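
The command-line parser accepts any string for `--data-type` and `--instruction-gcs-uri`, so a typo only surfaces once the API call fails. A minimal sketch of validating both up front with standard `argparse` features follows. `SUPPORTED_DATA_TYPES`, `gcs_uri`, and `build_parser` are hypothetical names, the list of data types is taken from the help text above, and the `gs://` prefix check is an assumption about what a valid URI looks like.

```python
import argparse

# Taken from the sample's help text; treated here as the full set.
SUPPORTED_DATA_TYPES = ('IMAGE', 'VIDEO', 'TEXT', 'AUDIO')


def gcs_uri(value):
    """argparse type that rejects values not starting with 'gs://'."""
    if not value.startswith('gs://'):
        raise argparse.ArgumentTypeError(
            'expected a Cloud Storage URI starting with gs://, got {!r}'
            .format(value))
    return value


def build_parser():
    """Builds a parser that validates arguments before any API call."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--project-id', required=True)
    # type runs before the choices check, so 'image' is accepted as 'IMAGE'.
    parser.add_argument(
        '--data-type', required=True,
        type=str.upper, choices=SUPPORTED_DATA_TYPES)
    parser.add_argument(
        '--instruction-gcs-uri', required=True, type=gcs_uri)
    return parser
```

With this parser, an unsupported data type or a non-`gs://` URI makes `argparse` exit with a usage error before the client is created.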
41 changes: 41 additions & 0 deletions datalabeling/create_instruction_test.py
@@ -0,0 +1,41 @@
#!/usr/bin/env python

# Copyright 2019 Google, Inc
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import create_instruction
from google.cloud import datalabeling_v1beta1 as datalabeling
import pytest

PROJECT_ID = os.getenv('GCLOUD_PROJECT')
INSTRUCTION_GCS_URI = ('gs://cloud-samples-data/datalabeling'
                       '/instruction/test.pdf')


@pytest.mark.slow
def test_create_instruction(capsys):
    result = create_instruction.create_instruction(
        PROJECT_ID,
        'IMAGE',
        INSTRUCTION_GCS_URI
    )
    out, _ = capsys.readouterr()
    assert 'The instruction resource name: ' in out

    # Delete the created instruction.
    instruction_name = result.name
    client = datalabeling.DataLabelingServiceClient()
    client.delete_instruction(instruction_name)