Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: Add ability to pass in a table ID instead of a query to the %%bigquery magic. #9170

Merged
merged 6 commits into from
Sep 23, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 26 additions & 11 deletions bigquery/google/cloud/bigquery/magics.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,7 @@

from __future__ import print_function

import re
import ast
import sys
import time
Expand Down Expand Up @@ -266,6 +267,15 @@ def default_query_job_config(self, value):
context = Context()


def _print_error(error, destination_var=None):
if destination_var:
print(
"Could not save output to variable '{}'.".format(destination_var),
file=sys.stderr,
)
print("\nERROR:\n", error, file=sys.stderr)


def _run_query(client, query, job_config=None):
"""Runs a query while printing status updates

Expand Down Expand Up @@ -434,6 +444,21 @@ def _cell_magic(line, query):
else:
max_results = None

if not re.search(r"\s", query.rstrip()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What scenario is this protecting against? Is there a scenario where unicode strings will not remove space characters via rstrip?

What do we expect in the else case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: Does this repository enforce code coverage? Is there a case where we test there being unstoppable whitespace and not taking this branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is actually testing for queries that contain whitespace characters that aren't removed by rstrip (so any whitespace that isn't a trailing newline). The assumption being made (as described in #9105 ) is that anything containing whitespace is a SQL query and won't take this branch, while anything string without whitespace is a table_id and will take this branch.

So anything regular SQL query would fall into the else case. There are tests that check whether query strings without spaces will be interpreted as table IDs and a test that checks if a string without whitespace that isn't a valid table_id raises an appropriate error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a comment in the code if that would be helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also write this down in a comment, i.e. that anything without whitespace is assumed to be table identifier which triggers a different use case of the magic command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment in d663104

table_id = query.rstrip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, stripping whitespace from the same value again is not necessary, we could store the stripped string into a variable the first time we do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in d663104


try:
rows = client.list_rows(table_id, max_results=max_results)
except Exception as ex:
return _print_error(str(ex), args.destination_var)

result = rows.to_dataframe(bqstorage_client=bqstorage_client)
if args.destination_var:
IPython.get_ipython().push({args.destination_var: result})
return
else:
return result

job_config = bigquery.job.QueryJobConfig()
job_config.query_parameters = params
job_config.use_legacy_sql = args.use_legacy_sql
Expand All @@ -445,24 +470,14 @@ def _cell_magic(line, query):
value = int(args.maximum_bytes_billed)
job_config.maximum_bytes_billed = value

error = None
try:
query_job = _run_query(client, query, job_config=job_config)
except Exception as ex:
error = str(ex)
return _print_error(str(ex), args.destination_var)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be slightly confusing for the reader at first glance, because _print_error() itself does not return anything, it just has a side effect. I would simply express the same in two lines (a sole return in its own).

Copy link
Contributor Author

@shubha-rajan shubha-rajan Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 6f2dd64


if not args.verbose:
display.clear_output()

if error:
if args.destination_var:
print(
"Could not save output to variable '{}'.".format(args.destination_var),
file=sys.stderr,
)
print("\nERROR:\n", error, file=sys.stderr)
return

if args.dry_run and args.destination_var:
IPython.get_ipython().push({args.destination_var: query_job})
return
Expand Down
108 changes: 108 additions & 0 deletions bigquery/tests/unit/test_magics.py
Original file line number Diff line number Diff line change
Expand Up @@ -696,6 +696,114 @@ def test_bigquery_magic_w_max_results_valid_calls_queryjob_result():
query_job_mock.result.assert_called_with(max_results=5)


def test_bigquery_magic_w_table_id_invalid():
ip = IPython.get_ipython()
ip.extension_manager.load_extension("google.cloud.bigquery")
magics.context._project = None

credentials_mock = mock.create_autospec(
google.auth.credentials.Credentials, instance=True
)
default_patch = mock.patch(
"google.auth.default", return_value=(credentials_mock, "general-project")
)

list_rows_patch = mock.patch(
"google.cloud.bigquery.magics.bigquery.Client.list_rows",
autospec=True,
side_effect=exceptions.BadRequest("Not a valid table ID"),
)

table_id = "not-a-real-table"

with list_rows_patch, default_patch, io.capture_output() as captured_io:
ip.run_cell_magic("bigquery", "df", table_id)

output = captured_io.stderr
assert "Could not save output to variable" in output
assert "400 Not a valid table ID" in output
assert "Traceback (most recent call last)" not in output


@pytest.mark.usefixtures("ipython_interactive")
def test_bigquery_magic_w_table_id_and_destination_var():
ip = IPython.get_ipython()
ip.extension_manager.load_extension("google.cloud.bigquery")
magics.context._project = None

credentials_mock = mock.create_autospec(
google.auth.credentials.Credentials, instance=True
)
default_patch = mock.patch(
"google.auth.default", return_value=(credentials_mock, "general-project")
)

row_iterator_mock = mock.create_autospec(
google.cloud.bigquery.table.RowIterator, instance=True
)

client_patch = mock.patch(
"google.cloud.bigquery.magics.bigquery.Client", autospec=True
)

table_id = "bigquery-public-data.samples.shakespeare"
result = pandas.DataFrame([17], columns=["num"])

with client_patch as client_mock, default_patch:
client_mock().list_rows.return_value = row_iterator_mock
row_iterator_mock.to_dataframe.return_value = result

ip.run_cell_magic("bigquery", "df", table_id)

assert "df" in ip.user_ns
df = ip.user_ns["df"]

assert isinstance(df, pandas.DataFrame)


@pytest.mark.usefixtures("ipython_interactive")
def test_bigquery_magic_w_table_id_and_bqstorage_client():
ip = IPython.get_ipython()
ip.extension_manager.load_extension("google.cloud.bigquery")
magics.context._project = None

credentials_mock = mock.create_autospec(
google.auth.credentials.Credentials, instance=True
)
default_patch = mock.patch(
"google.auth.default", return_value=(credentials_mock, "general-project")
)

row_iterator_mock = mock.create_autospec(
google.cloud.bigquery.table.RowIterator, instance=True
)

client_patch = mock.patch(
"google.cloud.bigquery.magics.bigquery.Client", autospec=True
)

bqstorage_mock = mock.create_autospec(
bigquery_storage_v1beta1.BigQueryStorageClient
)
bqstorage_instance_mock = mock.create_autospec(
bigquery_storage_v1beta1.BigQueryStorageClient, instance=True
)
bqstorage_mock.return_value = bqstorage_instance_mock
bqstorage_client_patch = mock.patch(
"google.cloud.bigquery_storage_v1beta1.BigQueryStorageClient", bqstorage_mock
)

table_id = "bigquery-public-data.samples.shakespeare"

with default_patch, client_patch as client_mock, bqstorage_client_patch:
client_mock().list_rows.return_value = row_iterator_mock

ip.run_cell_magic("bigquery", "--use_bqstorage_api --max_results=5", table_id)
row_iterator_mock.to_dataframe.assert_called_once_with(
bqstorage_client=bqstorage_instance_mock
)


@pytest.mark.usefixtures("ipython_interactive")
def test_bigquery_magic_dryrun_option_sets_job_config():
ip = IPython.get_ipython()
Expand Down