
feat: Job and BatchPredictionJob classes #79

Merged · 9 commits into googleapis:dev on Nov 24, 2020

Conversation

vinnysenthil (Contributor) commented:

A PR for the batch prediction method in the Model class will follow once this PR and #66 are merged.

Includes the following:

  • Job base class for remaining Job subclasses
    • Adds a shared status() method for Job
  • BatchPredictionJob class
    • Adds an iter_outputs() method that returns either a BQ QueryJob or a list of GCS Blobs
  • 7 unit tests covering 94% of jobs.py
  • Addition of the google-cloud-bigquery dependency
  • Addition of constants.py as a single source of truth for re-used constants (e.g. supported parameters, API constants)

Fixes b/169783178, b/171074104 🦕
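
A rough usage sketch of the surface described above (a minimal sketch, not the final API: the resource name and constructor call below are placeholders; only status() and iter_outputs() are named in this PR):

from google.cloud import aiplatform

# Placeholder resource name for an existing batch prediction job.
job = aiplatform.BatchPredictionJob("projects/123/locations/us-central1/batchPredictionJobs/456")

print(job.status())  # shared helper from the new Job base class

# iter_outputs() returns either a BQ QueryJob or a list of GCS Blobs,
# depending on the destination the job was configured with; both are iterable.
for output in job.iter_outputs():
    print(output)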

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Nov 17, 2020
@vinnysenthil vinnysenthil changed the title Job and BatchPredictionJob classes feat: Job and BatchPredictionJob classes Nov 17, 2020
sasha-gitg (Member) left a comment:


LGTM! Minor requests.

Review threads (now resolved) on google/cloud/aiplatform/jobs.py, google/cloud/aiplatform/utils.py, google/cloud/aiplatform/initializer.py, and tests/unit/aiplatform/test_jobs.py.
vinnysenthil (Contributor, Author) commented:

Hey @tswast, looping you in here for your BigQuery expertise.

Context:
This is for the Model Builder SDK. This PR implements a BatchPredictionJob class where users can send many instances to a trained model and get many predictions back. There is an option to provide a BQ project to write prediction output directly into: the service creates a new dataset in the provided BQ project and writes the predictions into a table.

We're implementing a helper method iter_outputs for users to get an iterable object to traverse the predictions. In the scenario where they choose a BQ output, I return a QueryJob that runs SELECT * FROM {generated_dataset_name}.predictions and allows the user to pass in an int value for LIMIT.
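
A minimal sketch of what that query might look like; the dataset name and limit below are placeholders, not the exact values the service generates or the final parameter name:

from google.cloud import bigquery

# Placeholders standing in for the service-generated dataset and the user-supplied limit.
generated_dataset_name = "my-project.prediction_output_dataset"
row_limit = 100

bq_client = bigquery.Client()
query = f"SELECT * FROM `{generated_dataset_name}.predictions`"
if row_limit:
    query += f" LIMIT {row_limit}"

query_job = bq_client.query(query)  # returns a bigquery.QueryJob
for row in query_job:  # iterating the QueryJob yields the result rows
    print(dict(row))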

Here are the lines in question:

PTAL and share your thoughts when you get a chance 😄

Review threads (now resolved) on google/cloud/aiplatform/jobs.py and setup.py.
tswast left a comment:


LGTM for BigQuery usage. I have some concerns about constructing both Storage and BigQuery clients within a client method, but depending on the performance expectations, this could be considered a minor issue.

google/cloud/aiplatform/jobs.py (outdated excerpt):

# BigQuery Destination, return QueryJob
elif output_info.bigquery_output_dataset:
bq_client = bigquery.Client()
tswast commented on the excerpt above:

I'm a bit concerned about this. Ideally you'd re-use the credentials from uCAIP client.

Also, there's a risk of leaking sockets when you create clients on-the-fly. Not as big a deal for REST clients, but definitely a concern for gRPC clients. googleapis/google-cloud-python#9790 googleapis/google-cloud-python#9457

vinnysenthil (Contributor, Author) replied:

Suggested change:
- bq_client = bigquery.Client()
+ bq_client = bigquery.Client(
+     credentials=self.api_client._transport._credentials
+ )

^ This change would build a BigQuery Client using the same credentials as uCAIP's JobServiceClient.

In regards to the leaking sockets, would the solution referenced in that issue work? See below

# Close sockets opened by BQ Client
bq_client._http._auth_request.session.close()
bq_client._http.close()

tswast replied:

This change for credentials LGTM. (Storage should get similar treatment.)

It's a little trickier in our case, because we want the client to live for the lifetime of the RowIterator.

Unless you want to convert to a full list of rows / pandas DataFrame before returning? In that case all the API requests would be made here and we could close the client when done. (FWIW, the BQ client does have a close function: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.close)
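
A minimal sketch of that alternative (materialize, then close), with a placeholder table name; this is not what the PR ended up doing, since iter_outputs() returns an iterator:

from google.cloud import bigquery

bq_client = bigquery.Client()
# Pull all rows into memory so the client can be closed immediately afterwards.
rows = list(bq_client.query("SELECT * FROM `my_dataset.predictions`").result())
bq_client.close()  # releases the client's HTTP sessions once the rows are in memory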

vinnysenthil (Contributor, Author) replied:

Made the credentials change on both BQ and Storage.

Re: closing connections - this is indeed tricky since the method is meant to return an iterator. However, your comment made me realize a larger issue of us instantiating a GAPIC client for every instance of a high-level SDK object. I'm capturing this in b/174111905.

Will merge this blocking PR for now, thanks for calling this issue out!
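
A minimal sketch of that credential reuse applied to both clients, as a hypothetical helper; api_client is assumed to be the underlying GAPIC JobServiceClient, and _transport._credentials is the same private attribute used in the suggestion above:

from google.cloud import bigquery, storage

def make_output_clients(api_client):
    # Reuse the uCAIP JobServiceClient credentials instead of constructing new ones.
    credentials = api_client._transport._credentials
    bq_client = bigquery.Client(credentials=credentials)
    gcs_client = storage.Client(credentials=credentials)
    return bq_client, gcs_client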

vinnysenthil merged commit f2ccd1e into googleapis:dev on Nov 24, 2020
dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Nov 30, 2020
* Create and move all constants to constants.py

* Fix tests after constants.py, drop unused vars

* Init Job and BatchPredictionJob class, unit tests

* Address all reviewer comments

* Update docstring to bigquery.table.RowIterator

* Get GCS/BQ clients to use same creds as uCAIP
dizcology pushed the same commit to dizcology/python-aiplatform in further pushes referencing this pull request on Nov 30 and Dec 22, 2020.
Labels: cla: yes (This human has signed the Contributor License Agreement.)
4 participants