migrate code from googleapis/python-dataproc #8509

Merged
merged 163 commits into from
Nov 15, 2022
5ab0fe2
chore: move samples from python-docs-sample (#66)
arithmetic1728 Aug 8, 2020
3c57ee2
chore(deps): update dependency google-cloud-storage to v1.30.0 (#68)
renovate-bot Aug 8, 2020
4c4563b
chore(deps): update dependency google-cloud-dataproc to v1.1.0 (#67)
renovate-bot Aug 8, 2020
162b852
feat!: migrate to use microgen (#71)
arithmetic1728 Aug 10, 2020
c0b587b
chore(deps): update dependency google-auth to v1.21.2 (#84)
renovate-bot Sep 16, 2020
ef0ea6c
chore(deps): update dependency grpcio to v1.32.0 (#89)
renovate-bot Sep 16, 2020
7704b72
feat: adding submit_job samples (#88)
bradmiro Sep 16, 2020
35ffc73
chore(deps): update dependency google-auth to v1.21.3 (#93)
renovate-bot Sep 23, 2020
e84dcd8
chore(deps): update dependency google-auth to v1.22.0 (#94)
renovate-bot Sep 28, 2020
53d2a8d
chore(deps): update dependency google-auth to v1.22.1 (#95)
renovate-bot Oct 13, 2020
e57d060
chore(deps): update dependency google-cloud-storage to v1.31.2 (#83)
renovate-bot Oct 13, 2020
299f8df
chore(deps): update dependency google-cloud-dataproc to v1.1.1 (#81)
renovate-bot Oct 16, 2020
6719422
chore(deps): update dependency google-cloud-dataproc to v2 (#82)
renovate-bot Oct 16, 2020
b2c00a3
chore(deps): update dependency grpcio to v1.33.1 (#97)
renovate-bot Oct 23, 2020
1ae94df
chore(deps): update dependency grpcio to v1.33.2 (#98)
renovate-bot Oct 28, 2020
5bf2666
chore(deps): update dependency google-auth to v1.23.0 (#99)
renovate-bot Oct 29, 2020
991e0d5
feat: add common resource paths, expose client transport (#87)
yoshi-automation Nov 16, 2020
d8bc905
chore(deps): update dependency google-cloud-storage to v1.32.0 (#96)
renovate-bot Nov 16, 2020
67542be
chore(deps): update dependency google-cloud-dataproc to v2.2.0 (#102)
renovate-bot Nov 16, 2020
72624c1
chore(deps): update dependency google-cloud-storage to v1.33.0 (#103)
renovate-bot Nov 18, 2020
16dacfe
chore(deps): update dependency grpcio to v1.34.0 (#109)
renovate-bot Dec 5, 2020
4ccf22a
chore(deps): update dependency google-auth to v1.24.0 (#114)
renovate-bot Dec 15, 2020
522f2b1
chore(deps): update dependency google-cloud-storage to v1.35.0 (#113)
renovate-bot Dec 15, 2020
89b26dd
chore: update templates (#118)
yoshi-automation Dec 29, 2020
ef8f87c
chore(deps): update dependency grpcio to v1.34.1 (#124)
renovate-bot Jan 15, 2021
49a965a
chore(deps): update dependency google-auth to v1.26.1 (#132)
renovate-bot Feb 12, 2021
88828d9
chore(deps): update dependency google-cloud-storage to v1.36.0 (#131)
renovate-bot Feb 12, 2021
c9df8a0
chore(deps): update dependency google-auth to v1.27.0 (#134)
renovate-bot Feb 20, 2021
e378716
chore(deps): update dependency grpcio to v1.35.0 (#127)
renovate-bot Feb 22, 2021
41b9daf
chore(deps): update dependency google-cloud-storage to v1.36.1 (#136)
renovate-bot Feb 24, 2021
b86ebf8
chore(deps): update dependency grpcio to v1.36.0 (#137)
renovate-bot Mar 1, 2021
ba80c32
docs: adding backoff to quickstart test (#135)
bradmiro Mar 2, 2021
d6fe3c7
chore(deps): update dependency google-auth-httplib2 to v0.1.0 (#142)
renovate-bot Mar 5, 2021
11c266e
chore(deps): update dependency grpcio to v1.36.1 (#139)
renovate-bot Mar 5, 2021
797f7b7
chore(deps): update dependency google-auth to v1.27.1 (#141)
renovate-bot Mar 5, 2021
1a99aab
chore(deps): update dependency google-cloud-dataproc to v2.3.0 (#143)
renovate-bot Mar 5, 2021
e402a92
chore(deps): update dependency google-cloud-storage to v1.36.2 (#144)
renovate-bot Mar 11, 2021
84489af
chore(deps): update dependency google-auth to v1.28.0 (#146)
renovate-bot Mar 17, 2021
063c5c9
fix: (samples) fixing samples for new machine types (#150)
bradmiro Mar 26, 2021
69833e0
chore(deps): update dependency google-cloud-storage to v1.37.0 (#153)
renovate-bot Mar 27, 2021
2b11b22
chore(deps): update dependency google-cloud-dataproc to v2.3.1 (#154)
renovate-bot Apr 2, 2021
f2fb618
fix: use correct retry deadlines (#122)
yoshi-automation Apr 5, 2021
7280790
chore(deps): update dependency grpcio to v1.37.0 (#163)
renovate-bot Apr 8, 2021
1e64f91
chore(deps): update dependency google-cloud-storage to v1.37.1 (#159)
renovate-bot Apr 8, 2021
2a3ee6f
chore(deps): update dependency google-auth to v1.28.1 (#165)
renovate-bot Apr 10, 2021
d148921
chore: migrate to owl bot (#171)
parthea Apr 23, 2021
1355a07
chore(deps): update dependency google-auth to v1.30.0 (#169)
renovate-bot Apr 28, 2021
3e8c65c
chore(deps): update dependency grpcio to v1.37.1 (#182)
renovate-bot May 14, 2021
d3f6998
chore(deps): update dependency pytest to v6.2.4 (#172)
renovate-bot May 14, 2021
417756e
chore(deps): update dependency google-cloud-storage to v1.38.0 (#174)
renovate-bot May 14, 2021
cae55bf
chore: new owl bot post processor docker image (#193)
gcf-owl-bot[bot] May 22, 2021
9500ba7
chore(deps): update dependency google-auth to v1.30.1 (#195)
renovate-bot May 26, 2021
122e303
chore(deps): update dependency google-cloud-dataproc to v2.4.0 (#191)
renovate-bot May 26, 2021
c608499
chore(deps): update dependency google-auth to v1.30.2 (#197)
renovate-bot Jun 9, 2021
658993b
chore(deps): update dependency grpcio to v1.38.0 (#190)
renovate-bot Jun 16, 2021
eb24de0
chore(deps): update dependency google-auth to v1.31.0 (#198)
renovate-bot Jun 16, 2021
3904a24
chore(deps): update dependency google-cloud-storage to v1.39.0 (#210)
renovate-bot Jun 23, 2021
a390a5c
chore(deps): update dependency grpcio to v1.38.1 (#208)
renovate-bot Jun 23, 2021
bba3d48
chore(deps): update dependency google-auth to v1.32.0 (#207)
renovate-bot Jun 25, 2021
5dec977
chore(deps): update dependency google-auth to v1.32.1 (#217)
renovate-bot Jul 1, 2021
531d5c3
chore(deps): update dependency google-cloud-storage to v1.40.0 (#216)
renovate-bot Jul 1, 2021
5516b72
fix: Attribute error Name while executing the sample code (#205)
vikrant-sinha Jul 2, 2021
f16f439
chore(deps): update dependency backoff to v1.11.0 (#220)
renovate-bot Jul 12, 2021
83f7a08
chore(deps): update dependency google-auth to v1.33.0 (#226)
renovate-bot Jul 14, 2021
90323b8
chore(deps): update dependency google-cloud-storage to v1.41.0 (#222)
renovate-bot Jul 15, 2021
dcb2ca1
chore(deps): update dependency backoff to v1.11.1 (#224)
renovate-bot Jul 16, 2021
38a4b34
chore(deps): update dependency google-cloud-storage to v1.41.1 (#230)
renovate-bot Jul 21, 2021
5f5b2b7
chore(deps): update dependency google-auth to v1.33.1 (#229)
renovate-bot Jul 21, 2021
309bddd
feat: add Samples section to CONTRIBUTING.rst (#228)
gcf-owl-bot[bot] Jul 22, 2021
1300356
chore(deps): update dependency grpcio to v1.39.0 (#231)
renovate-bot Jul 22, 2021
8268598
chore(deps): update dependency google-auth to v1.34.0 (#235)
renovate-bot Jul 27, 2021
a8c21b2
chore(deps): update dependency google-cloud-dataproc to v2.5.0 (#234)
renovate-bot Jul 27, 2021
a40774c
chore: fix INSTALL_LIBRARY_FROM_SOURCE in noxfile.py (#240)
gcf-owl-bot[bot] Aug 11, 2021
c8ba31b
chore(deps): update dependency google-cloud-storage to v1.42.0 (#242)
renovate-bot Aug 12, 2021
14cf125
docs: update cluster sample (#218)
loferris Aug 12, 2021
2a4f2fd
chore: drop mention of Python 2.7 from templates (#244)
gcf-owl-bot[bot] Aug 13, 2021
4e95279
chore(deps): update dependency google-auth to v1.35.0 (#245)
renovate-bot Aug 17, 2021
2a42545
chore(deps): update dependency google-auth to v2 (#246)
renovate-bot Aug 24, 2021
5e5ffd7
chore(deps): update dependency pytest to v6.2.5 (#252)
renovate-bot Aug 31, 2021
424862c
chore(deps): update dependency google-auth to v2.0.2 (#254)
renovate-bot Sep 1, 2021
9584f0c
chore(deps): update dependency grpcio to v1.40.0 (#263)
renovate-bot Sep 8, 2021
a74fc45
chore(deps): update dependency google-cloud-storage to v1.42.1 (#264)
renovate-bot Sep 9, 2021
d834761
chore: blacken samples noxfile template (#266)
gcf-owl-bot[bot] Sep 17, 2021
7b5cfcc
chore(deps): update all dependencies (#265)
renovate-bot Sep 20, 2021
19e0bbe
chore(deps): update dependency google-auth to v2.2.0 (#274)
renovate-bot Sep 27, 2021
4c8765f
chore(deps): update dependency grpcio to v1.41.0 (#275)
renovate-bot Sep 28, 2021
6b06451
chore(deps): update dependency google-auth to v2.2.1 (#276)
renovate-bot Sep 29, 2021
52affd7
chore: fail samples nox session if python version is missing (#279)
gcf-owl-bot[bot] Sep 30, 2021
46fb7ec
chore(deps): update dependency google-cloud-storage to v1.42.3 (#280)
renovate-bot Oct 1, 2021
1c550e7
chore(deps): update dependency google-cloud-dataproc to v3 (#282)
renovate-bot Oct 5, 2021
3de36a6
chore(python): Add kokoro configs for python 3.10 samples testing (#287)
gcf-owl-bot[bot] Oct 8, 2021
ba18b32
chore(deps): update dependency google-auth to v2.3.0 (#286)
renovate-bot Oct 9, 2021
1b17a1a
chore(deps): update all dependencies (#292)
renovate-bot Oct 26, 2021
ea2cfe1
chore(deps): update all dependencies (#295)
renovate-bot Oct 26, 2021
435ef79
chore(deps): update dependency google-auth to v2.3.3 (#298)
renovate-bot Nov 1, 2021
101a440
chore(deps): update dependency google-cloud-dataproc to v3.1.1 (#299)
renovate-bot Nov 3, 2021
007a381
chore: correct region tag in submit_job_to_cluster.py (#304)
aman-ebay Nov 12, 2021
50c18c5
chore(deps): update all dependencies (#306)
renovate-bot Nov 19, 2021
ab7df1f
chore(deps): update all dependencies (#313)
renovate-bot Jan 9, 2022
d00d307
chore(samples): Add check for tests in directory (#323)
gcf-owl-bot[bot] Jan 11, 2022
350f252
docs(samples): update python-api-walkthrough.md (#308)
aman-ebay Jan 13, 2022
e56230f
chore(deps): update dependency google-cloud-storage to v2 (#326)
renovate-bot Jan 14, 2022
cd6dc62
chore(samples): fix job polling (#314)
medb Jan 17, 2022
afafbd4
chore(deps): update dependency google-cloud-dataproc to v3.2.0 (#330)
renovate-bot Jan 18, 2022
4b3b4b7
chore(python): Noxfile recognizes that tests can live in a folder (#331)
gcf-owl-bot[bot] Jan 19, 2022
66289ca
chore(deps): update dependency google-cloud-storage to v2.1.0 (#332)
renovate-bot Jan 22, 2022
266b7b5
chore(deps): update all dependencies (#338)
renovate-bot Feb 7, 2022
4a3b563
chore(deps): update dependency pytest to v7.0.1 (#343)
renovate-bot Feb 14, 2022
fe29c46
chore(deps): update dependency grpcio to v1.44.0 (#345)
renovate-bot Feb 18, 2022
09a6156
chore(deps): update dependency google-cloud-dataproc to v3.3.0 (#349)
renovate-bot Feb 26, 2022
d327dea
chore: Adding support for pytest-xdist and pytest-parallel (#358)
gcf-owl-bot[bot] Mar 4, 2022
f015223
chore(deps): update all dependencies (#354)
renovate-bot Mar 5, 2022
c3ac64a
chore(deps): update dependency google-cloud-dataproc to v4.0.1 (#362)
renovate-bot Mar 7, 2022
816fbb4
test(samples): use try/finally for clusters and use pytest-xdist (#360)
busunkim96 Mar 8, 2022
56d91ce
chore(deps): update dependency pytest to v7.1.0 (#366)
renovate-bot Mar 13, 2022
4ff6896
chore(deps): update dependency google-cloud-storage to v2.2.0 (#367)
renovate-bot Mar 14, 2022
2b8eefa
chore(deps): update dependency google-cloud-storage to v2.2.1 (#368)
renovate-bot Mar 15, 2022
6cde59e
chore(deps): update all dependencies (#369)
renovate-bot Mar 19, 2022
9c2bd8b
Fix: resource quotas (#377)
loferris Mar 25, 2022
08ad604
chore(python): use black==22.3.0 (#383)
gcf-owl-bot[bot] Mar 29, 2022
c5f2cee
Fix: updating submit_job_to_cluster.py (#387)
aman-ebay Mar 31, 2022
5fce912
chore(deps): update all dependencies (#399)
renovate-bot Apr 7, 2022
4c37062
chore(deps): update dependency google-auth to v2.6.4 (#402)
renovate-bot Apr 12, 2022
cca6c6a
chore(deps): update dependency google-cloud-storage to v2.3.0 (#403)
renovate-bot Apr 13, 2022
d864ffa
Update python-api-walkthrough.md (#398)
aman-ebay Apr 14, 2022
fc20d4c
chore(deps): update dependency google-auth to v2.6.5 (#406)
renovate-bot Apr 15, 2022
69987e3
docs: Dataproc ebay walkthrough update (#405)
aman-ebay Apr 16, 2022
37da1e2
chore(python): add nox session to sort python imports (#408)
gcf-owl-bot[bot] Apr 21, 2022
66b9606
chore(deps): update dependency google-auth to v2.6.6 (#411)
renovate-bot Apr 22, 2022
f975204
chore(deps): update dependency pytest to v7.1.2 (#412)
renovate-bot Apr 25, 2022
ffead32
chore(deps): update dependency backoff to v2 (#413)
renovate-bot Apr 26, 2022
a2c39fe
chore(deps): update dependency backoff to v2.0.1 (#414)
renovate-bot Apr 27, 2022
69539de
chore(deps): update dependency grpcio to v1.46.0 (#415)
renovate-bot May 5, 2022
b4f4b29
chore(deps): update dependency grpcio to v1.46.1 (#420)
renovate-bot May 12, 2022
61ac352
chore(deps): update dependency grpcio to v1.46.3 (#422)
renovate-bot May 22, 2022
89292c6
fix: require python 3.7+ (#442)
gcf-owl-bot[bot] Jul 10, 2022
c9e161a
chore(deps): update all dependencies (#428)
renovate-bot Jul 15, 2022
383e535
chore(deps): update all dependencies (#451)
renovate-bot Aug 2, 2022
b5f9f3e
chore(deps): update all dependencies (#453)
renovate-bot Aug 6, 2022
180cdfd
chore(deps): update all dependencies (#454)
renovate-bot Aug 6, 2022
b5af4e7
chore(deps): update dependency google-cloud-dataproc to v5.0.1 (#460)
renovate-bot Aug 19, 2022
99c3f19
chore(deps): update dependency google-auth to v2.11.0 (#461)
renovate-bot Aug 23, 2022
a88d34c
chore(deps): update all dependencies (#473)
renovate-bot Sep 6, 2022
0889ffe
chore: detect samples tests in nested directories (#477)
gcf-owl-bot[bot] Sep 13, 2022
94473a2
chore(deps): update dependency grpcio to v1.49.0 (#479)
renovate-bot Sep 16, 2022
4526cad
chore(deps): update dependency google-auth to v2.11.1 (#480)
renovate-bot Sep 20, 2022
0e21d87
chore(deps): update all dependencies (#483)
renovate-bot Oct 3, 2022
d3ae234
chore(deps): update dependency google-cloud-dataproc to v5.0.2 (#488)
renovate-bot Oct 4, 2022
48acea7
chore(deps): update dependency backoff to v2.2.1 (#489)
renovate-bot Oct 6, 2022
fe80dcc
chore(deps): update all dependencies (#492)
renovate-bot Oct 18, 2022
f77ae7c
chore(deps): update dependency grpcio to v1.50.0 (#493)
renovate-bot Oct 19, 2022
6b1b7dd
chore(deps): update all dependencies (#494)
renovate-bot Oct 26, 2022
0edadf4
chore(deps): update dependency google-auth to v2.14.0 (#496)
renovate-bot Nov 1, 2022
43ba3f6
Merge remote-tracking branch 'migration/main' into python-dataproc-mi…
msampathkumar Nov 14, 2022
192d724
Remove (redundant) noxfile.py
msampathkumar Nov 14, 2022
3d5271f
Update Licence Header
msampathkumar Nov 14, 2022
0729e52
Merge branch 'main' into python-dataproc-migration
msampathkumar Nov 14, 2022
37a30f7
Merge branch 'main' into python-dataproc-migration
msampathkumar Nov 15, 2022
582b2b3
Merge branch 'main' into python-dataproc-migration
m-strzelczyk Nov 15, 2022
ff680ea
Merge branch 'main' into python-dataproc-migration
msampathkumar Nov 15, 2022
2f5e8a6
Revert "Remove (redundant) noxfile.py"
msampathkumar Nov 15, 2022
1cd8248
Removing only noxfile.py
msampathkumar Nov 15, 2022
4827801
Update blunderbuss.yml and CODEOWNERS
msampathkumar Nov 15, 2022
3 changes: 2 additions & 1 deletion .github/CODEOWNERS
@@ -74,5 +74,6 @@
/talent/**/* @GoogleCloudPlatform/python-samples-reviewers
/vision/**/* @GoogleCloudPlatform/python-samples-reviewers
/workflows/**/* @GoogleCloudPlatform/python-samples-reviewers
/datacatalog/**/* @GoogleCloudPlatform/python-samples-reviewers
/datacatalog/**/* @GoogleCloudPlatform/python-samples-reviewers
/kms/**/** @GoogleCloudPlatform/dee-infra @GoogleCloudPlatform/python-samples-reviewers
/dataproc/**/** @GoogleCloudPlatform/cloud-dpes @GoogleCloudPlatform/python-samples-reviewers
4 changes: 4 additions & 0 deletions .github/blunderbuss.yml
@@ -172,6 +172,10 @@ assign_prs_by:
      - 'api: cloudtasks'
    to:
      - GoogleCloudPlatform/infra-db-dpes
  - labels:
      - 'api: dataproc'
    to:
      - GoogleCloudPlatform/cloud-dpes

assign_issues:
  - GoogleCloudPlatform/python-samples-owners
84 changes: 84 additions & 0 deletions dataproc/snippets/README.md
@@ -0,0 +1,84 @@
# Cloud Dataproc API Examples

[![Open in Cloud Shell][shell_img]][shell_link]

[shell_img]: http://gstatic.com/cloudssh/images/open-btn.png
[shell_link]: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=dataproc/README.md

Sample command-line programs for interacting with the Cloud Dataproc API.

See [the tutorial on using the Dataproc API with the Python client
library](https://cloud.google.com/dataproc/docs/tutorials/python-library-example)
for a walkthrough you can run to try out the Cloud Dataproc API sample code.

Note that while this sample demonstrates interacting with Dataproc via the API, the functionality demonstrated here could also be accomplished using the Cloud Console or the gcloud CLI.

`list_clusters.py` is a simple command-line program to demonstrate connecting to the Cloud Dataproc API and listing the clusters in a region.

`submit_job_to_cluster.py` demonstrates how to create a cluster, submit the
`pyspark_sort.py` job, download the output from Google Cloud Storage, and output the result.

`single_job_workflow.py` uses the Cloud Dataproc InstantiateInlineWorkflowTemplate API to create an ephemeral cluster, run a job, then delete the cluster with one API request.

`pyspark_sort_gcs.py` is the same as `pyspark_sort.py` but demonstrates
reading from a GCS bucket.

## Prerequisites to run locally:

* [pip](https://pypi.python.org/pypi/pip)

Go to the [Google Cloud Console](https://console.cloud.google.com).

Under API Manager, search for the Google Cloud Dataproc API and enable it.

## Set Up Your Local Dev Environment

To install, run the following commands. If you want to use [virtualenv](https://virtualenv.readthedocs.org/en/latest/)
(recommended), run the commands within a virtualenv.

* pip install -r requirements.txt

## Authentication

Please see the [Google Cloud authentication guide](https://cloud.google.com/docs/authentication/).
The recommended approach to running these samples is a Service Account with a JSON key.

## Environment Variables

Set the following environment variables:

    GOOGLE_CLOUD_PROJECT=your-project-id
    REGION=us-central1 # or your region
    CLUSTER_NAME=waprin-spark7
    ZONE=us-central1-b
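A minimal sketch of how a script might read and sanity-check these variables, using only the Python standard library. The helper name `load_settings` and its error messages are assumptions for illustration; the samples in this directory do not necessarily read their configuration this way.

```python
import os

# Names come from the list above; load_settings itself is a hypothetical helper.
REQUIRED = ("GOOGLE_CLOUD_PROJECT", "REGION", "CLUSTER_NAME", "ZONE")


def load_settings(environ=None):
    """Return the required settings as a dict, raising if any are unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: environ[name] for name in REQUIRED}
    # A zone name is its region plus a suffix, e.g. us-central1-b in us-central1.
    if not settings["ZONE"].startswith(settings["REGION"] + "-"):
        raise RuntimeError("ZONE does not belong to REGION")
    return settings
```

Calling `load_settings()` with no argument reads from the real process environment; passing a plain dict makes the check easy to exercise in isolation.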

## Running the samples

To run list_clusters.py:

    python list_clusters.py $GOOGLE_CLOUD_PROJECT --region=$REGION

`submit_job_to_cluster.py` can create the Dataproc cluster or use an existing cluster. To create a cluster before running the code, you can use the [Cloud Console](https://console.cloud.google.com) or run:

    gcloud dataproc clusters create your-cluster-name

To run submit_job_to_cluster.py, first create a GCS bucket (used by Cloud Dataproc to stage files) from the Cloud Console or with gsutil:

    gsutil mb gs://<your-staging-bucket-name>

Next, set the following environment variables:

    BUCKET=your-staging-bucket
    CLUSTER=your-cluster-name

Then, if you want to use an existing cluster, run:

    python submit_job_to_cluster.py --project_id=$GOOGLE_CLOUD_PROJECT --zone=us-central1-b --cluster_name=$CLUSTER --gcs_bucket=$BUCKET

Alternatively, to create a new cluster, which will be deleted at the end of the job, run:

    python submit_job_to_cluster.py --project_id=$GOOGLE_CLOUD_PROJECT --zone=us-central1-b --cluster_name=$CLUSTER --gcs_bucket=$BUCKET --create_new_cluster

The script will set up a cluster, upload the PySpark file, submit the job, print the result, then, if it created the cluster, delete the cluster.

Optionally, you can add the `--pyspark_file` argument to change from the default `pyspark_sort.py` included in this script to a new script.
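
The flags used in the commands above can be modelled with `argparse`. The sketch below is a hypothetical parser that mirrors those flag names and the documented default; it is not copied from `submit_job_to_cluster.py`, whose actual argument handling may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the README commands."""
    parser = argparse.ArgumentParser(description="Submit a PySpark job to Dataproc.")
    parser.add_argument("--project_id", required=True)
    parser.add_argument("--zone", required=True)
    parser.add_argument("--cluster_name", required=True)
    parser.add_argument("--gcs_bucket", required=True)
    # The README names pyspark_sort.py as the default job file.
    parser.add_argument("--pyspark_file", default="pyspark_sort.py")
    # Off by default: an existing cluster is used unless this flag is given.
    parser.add_argument("--create_new_cluster", action="store_true")
    return parser


args = build_parser().parse_args(
    [
        "--project_id=my-project",
        "--zone=us-central1-b",
        "--cluster_name=my-cluster",
        "--gcs_bucket=my-bucket",
    ]
)
print(args.create_new_cluster, args.pyspark_file)
```

Modelling `--create_new_cluster` as a `store_true` flag matches the two invocations above, which differ only in whether the flag is present.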
73 changes: 73 additions & 0 deletions dataproc/snippets/create_cluster.py
@@ -0,0 +1,73 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This sample walks a user through creating a Cloud Dataproc cluster using
# the Python client library.
#
# This script can be run on its own:
# python create_cluster.py ${PROJECT_ID} ${REGION} ${CLUSTER_NAME}


import sys

# [START dataproc_create_cluster]
from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")
# [END dataproc_create_cluster]


if __name__ == "__main__":
    if len(sys.argv) < 4:
        sys.exit("python create_cluster.py project_id region cluster_name")

    project_id = sys.argv[1]
    region = sys.argv[2]
    cluster_name = sys.argv[3]
    create_cluster(project_id, region, cluster_name)
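
The two pieces of data that `create_cluster` assembles, the regional endpoint string and the cluster dict, can be built and inspected without any network call. The sketch below uses only the standard library; the helper names `regional_endpoint` and `cluster_config` and the `workers` and `machine_type` parameters are assumptions added for illustration and do not appear in the sample.

```python
def regional_endpoint(region):
    """Build the endpoint string in the same form the sample passes to the client."""
    return f"{region}-dataproc.googleapis.com:443"


def cluster_config(project_id, cluster_name, workers=2, machine_type="n1-standard-2"):
    """Return a cluster dict shaped like the one in create_cluster()."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # One master, as in the sample; only the worker count is parameterised.
            "master_config": {"num_instances": 1, "machine_type_uri": machine_type},
            "worker_config": {"num_instances": workers, "machine_type_uri": machine_type},
        },
    }


print(regional_endpoint("us-central1"))
```

Keeping these as pure functions is one way to unit-test the request shape separately from the long-running `create_cluster` operation.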
57 changes: 57 additions & 0 deletions dataproc/snippets/create_cluster_test.py
@@ -0,0 +1,57 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import uuid

from google.api_core.exceptions import NotFound
from google.cloud import dataproc_v1 as dataproc
import pytest

import create_cluster


PROJECT_ID = os.environ["GOOGLE_CLOUD_PROJECT"]
REGION = "us-central1"
CLUSTER_NAME = "py-cc-test-{}".format(str(uuid.uuid4()))


@pytest.fixture(autouse=True)
def teardown():
    yield

    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    )
    # Client library function
    try:
        operation = cluster_client.delete_cluster(
            request={
                "project_id": PROJECT_ID,
                "region": REGION,
                "cluster_name": CLUSTER_NAME,
            }
        )
        # Wait for cluster to delete
        operation.result()
    except NotFound:
        print("Cluster already deleted")


def test_cluster_create(capsys):
    # Wrapper function for client library function
    create_cluster.create_cluster(PROJECT_ID, REGION, CLUSTER_NAME)

    out, _ = capsys.readouterr()
    assert CLUSTER_NAME in out
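
The fixture above runs its cleanup after `yield` and tolerates a cluster that is already gone. The same pattern can be sketched without pytest or the Dataproc client. Everything below is a stand-in: `NotFound` here is a local class rather than `google.api_core.exceptions.NotFound`, and `FakeClusterClient` is a hypothetical in-memory client.

```python
from contextlib import contextmanager


class NotFound(Exception):
    """Local stand-in for google.api_core.exceptions.NotFound (assumption)."""


class FakeClusterClient:
    """Hypothetical in-memory client used only to illustrate the pattern."""

    def __init__(self):
        self.clusters = set()

    def create(self, name):
        self.clusters.add(name)

    def delete(self, name):
        if name not in self.clusters:
            raise NotFound(name)
        self.clusters.remove(name)


@contextmanager
def cluster_cleanup(client, name):
    """Run the body, then delete the cluster even if the body raised."""
    try:
        yield
    finally:
        try:
            client.delete(name)
        except NotFound:
            print("Cluster already deleted")


client = FakeClusterClient()
with cluster_cleanup(client, "py-cc-test"):
    client.create("py-cc-test")
```

After the `with` block exits, the fake client holds no clusters, which mirrors what the fixture is meant to guarantee for the real project.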
35 changes: 35 additions & 0 deletions dataproc/snippets/dataproc_e2e_donttest.py
@@ -0,0 +1,35 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

""" Integration tests for Dataproc samples.

Creates a Dataproc cluster, uploads a pyspark file to Google Cloud Storage,
submits a job to Dataproc that runs the pyspark file, then downloads
the output logs from Cloud Storage and verifies the expected output."""

import os

import submit_job_to_cluster

PROJECT = os.environ["GOOGLE_CLOUD_PROJECT"]
BUCKET = os.environ["CLOUD_STORAGE_BUCKET"]
CLUSTER_NAME = "testcluster3"
ZONE = "us-central1-b"


def test_e2e():
    output = submit_job_to_cluster.main(PROJECT, ZONE, CLUSTER_NAME, BUCKET)
    assert b"['Hello,', 'dog', 'elephant', 'panther', 'world!']" in output
97 changes: 97 additions & 0 deletions dataproc/snippets/instantiate_inline_workflow_template.py
@@ -0,0 +1,97 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This sample walks a user through instantiating an inline
# workflow for Cloud Dataproc using the Python client library.
#
# This script can be run on its own:
# python instantiate_inline_workflow_template.py ${PROJECT_ID} ${REGION}


import sys

# [START dataproc_instantiate_inline_workflow_template]
from google.cloud import dataproc_v1 as dataproc


def instantiate_inline_workflow_template(project_id, region):
    """This sample walks a user through submitting a workflow
    for Cloud Dataproc using the Python client library.

    Args:
        project_id (string): Project to use for running the workflow.
        region (string): Region where the workflow resources should live.
    """

    # Create a client with the endpoint set to the desired region.
    workflow_template_client = dataproc.WorkflowTemplateServiceClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    parent = "projects/{}/regions/{}".format(project_id, region)

    template = {
        "jobs": [
            {
                "hadoop_job": {
                    "main_jar_file_uri": "file:///usr/lib/hadoop-mapreduce/"
                    "hadoop-mapreduce-examples.jar",
                    "args": ["teragen", "1000", "hdfs:///gen/"],
                },
                "step_id": "teragen",
            },
            {
                "hadoop_job": {
                    "main_jar_file_uri": "file:///usr/lib/hadoop-mapreduce/"
                    "hadoop-mapreduce-examples.jar",
                    "args": ["terasort", "hdfs:///gen/", "hdfs:///sort/"],
                },
                "step_id": "terasort",
                "prerequisite_step_ids": ["teragen"],
            },
        ],
        "placement": {
            "managed_cluster": {
                "cluster_name": "my-managed-cluster",
                "config": {
                    "gce_cluster_config": {
                        # Leave 'zone_uri' empty for 'Auto Zone Placement'
                        # 'zone_uri': ''
                        "zone_uri": "us-central1-a"
                    }
                },
            }
        },
    }

    # Submit the request to instantiate the workflow from an inline template.
    operation = workflow_template_client.instantiate_inline_workflow_template(
        request={"parent": parent, "template": template}
    )
    operation.result()

    # Output a success message.
    print("Workflow ran successfully.")
# [END dataproc_instantiate_inline_workflow_template]


if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.exit(
            "python instantiate_inline_workflow_template.py " + "project_id region"
        )

    project_id = sys.argv[1]
    region = sys.argv[2]
    instantiate_inline_workflow_template(project_id, region)
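
In the template above, `prerequisite_step_ids` makes `terasort` wait for `teragen`. The sketch below illustrates that dependency ordering locally with the standard library's `graphlib` (Python 3.9+). The helper `job_order` is hypothetical, and this is only a model of the ordering constraint, not a description of how Dataproc schedules jobs internally.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+


def job_order(template):
    """Return step IDs in an order that respects prerequisite_step_ids."""
    # Map each step to the set of steps that must finish before it starts.
    graph = {
        job["step_id"]: set(job.get("prerequisite_step_ids", []))
        for job in template["jobs"]
    }
    return list(TopologicalSorter(graph).static_order())


# Trimmed copy of the template: only the fields that affect ordering.
template = {
    "jobs": [
        {"step_id": "teragen"},
        {"step_id": "terasort", "prerequisite_step_ids": ["teragen"]},
    ]
}
print(job_order(template))  # prints ['teragen', 'terasort']
```

A check like this can catch a misspelled or circular prerequisite before the template is submitted, since `TopologicalSorter` raises `CycleError` on a cycle.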