Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: Add dry_run option to BigQuery magic #9067

Merged
merged 20 commits into from
Aug 23, 2019

Conversation

shubha-rajan
Copy link
Contributor

See #8143
Adding the --dry_run flag returns a QueryJob instead of a pandas DataFrame as requested in the original issue. The QueryJob can also be stored in a variable if the destination_var magic argument is present

…nt, a QueryJob object is returned for inspection instead of an empty DataFrame
@shubha-rajan shubha-rajan requested a review from a team August 21, 2019 01:37
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 21, 2019
@plamut plamut changed the title Added dry_run option to bigquery magic BigQuery: Add dry_run option to BigQuery magic Aug 21, 2019
@plamut plamut added the api: bigquery Issues related to the BigQuery API. label Aug 21, 2019
Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It generally it looks good to me, apart from the dry_run argument help string.

I noticed, however, than when using the dry_run mode, an error is printed to the console:

In [16]: %%bigquery --dry_run 
    ...: SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5; 
    ...:  
    ...:                                                                                                                                                                                                                                                                                                                                                                                                                          
Executing query with job ID: None

ERROR:
 404 GET https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/queries/None?maxResults=0&location=US: Not found: Job precise-truck-742:US.None

(job ID: None)

                   -----Query Job SQL Follows-----                    

    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
   2:
    |    .    |    .    |    .    |    .    |    .    |    .    |

This can be confusing for the users, and would be good to have it fixed. However, if fixing _run_query() proves to be too complex in the scope of this PR, we can make these changes separately.

bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
Co-Authored-By: Peter Lamut <plamut@users.noreply.github.com>
@shubha-rajan
Copy link
Contributor Author

I'll investigate that error being printed out. I agree that it would be confusing to users, and don't think it's outside of the scope of the PR

@tswast
Copy link
Contributor

tswast commented Aug 21, 2019

When you run a dry run query, a job is not actually created, so fetching the results fails.

@tswast
Copy link
Contributor

tswast commented Aug 21, 2019

Have you run this in a notebook, yet? I'm curious what the output looks like. We probably want to improve the __str__ or __repr__ functions on QueryJob to at least show estimated_bytes_processed, because that's what people running dryrun queries are going to care about.

@shubha-rajan
Copy link
Contributor Author

shubha-rajan commented Aug 22, 2019

@tswast as of now, running it in a notebook is silent. I tried modifying _run_query() and that made the error @plamut was referring to go away, but I still couldn't print out the estimated bytes without it clearing. I think the next step is to look at modifying those functions in QueryJob like you said
image

@shubha-rajan shubha-rajan force-pushed the bq-add-dryrun-to-magic branch 2 times, most recently from 1a9f187 to 6efa408 Compare August 22, 2019 07:11
@shubha-rajan
Copy link
Contributor Author

6efa408 allows the cell to output the total bytes processed. adding a variable argument allows the user to inspect values the returned QueryJob's properties including total_bytes_processed and total_bytes_billed
image

Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new changes look good IMO.

I still have two questions, though:

  • If destination_var is used and an error occurs, should the failed query job still be stored for introspection? Currently it isn't:

    In [179]: %%bigquery result --dry_run 
        ...: SELECT SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5; 
        ...:  
        ...:  
        ...:                                                                                                                                                                                                                                                                                                                                                                                                                         
    
    ERROR:
    400 POST https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/jobs: Syntax error: Unexpected keyword SELECT at [1:8]
    
    In [180]: "result" in locals()                                                                                                                                                                                                                                                                                                                                                                                                    
    Out[180]: False
    
  • If destination_var is not specified and an error occurs, it would IMO still be useful to print a query even on dry runs, especially on syntax errors. Currently this information is omitted in dry runs, is this intentional?

    In [188]: %%bigquery --dry_run 
        ...: SELECT SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5; 
        ...:  
        ...:  
        ...:                                                                                                                                                                                                                                                                                                                                                                                                                         
    
    ERROR:
    400 POST https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/jobs: Syntax error: Unexpected keyword SELECT at [1:8]
    

    Without the --dry-run option, the query gets printed as a part of the error message.

@shubha-rajan
Copy link
Contributor Author

So currently, destination_var doesn't return anything if an error occurs by default, even when the --dry_run flag isn't present. I investigated adding in that functionality, but the problem is that if calling _run_query raises an exception, it won't return a QueryJob. I suspect getting around that would mean making changes to exception handling in QueryJob and/or Client and idk what might end up breaking because of that 😅

As for printing out the SQL queries along with error messages, that's working as of the last commit.
image

Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently, destination_var doesn't return anything if an error occurs by default, even when the --dry_run flag isn't present.

Fine then, the --dry_run option does not have to deal with that, either. 👍

The error output is now much more useful, looks good from my side.

Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Love the new error message behavior.

FYI: I've filed #9091 for consideration of how to handle failures with destination_var.

bigquery/google/cloud/bigquery/job.py Outdated Show resolved Hide resolved
bigquery/google/cloud/bigquery/job.py Outdated Show resolved Hide resolved
bigquery/tests/unit/test_magics.py Outdated Show resolved Hide resolved
bigquery/tests/unit/test_magics.py Outdated Show resolved Hide resolved
@tswast tswast merged commit 0f6fa6b into googleapis:master Aug 23, 2019
HemangChothani pushed a commit to HemangChothani/google-cloud-python that referenced this pull request Aug 29, 2019
* added dry_run option to bigquery magics. when --dry_run flag is present, a QueryJob object is returned for inspection instead of an empty DataFrame
* print estimated bytes instead of total bytes
* updated docstring for _AsyncJob._begin
* Update docstring for QueryJob._begin
* added SQL query to error output and messaging for failure to save to variable in magics

Co-Authored-By: Peter Lamut <plamut@users.noreply.github.com>
Co-Authored-By: Tim Swast <swast@google.com>
emar-kar pushed a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019
* added dry_run option to bigquery magics. when --dry_run flag is present, a QueryJob object is returned for inspection instead of an empty DataFrame
* print estimated bytes instead of total bytes
* updated docstring for _AsyncJob._begin
* Update docstring for QueryJob._begin
* added SQL query to error output and messaging for failure to save to variable in magics

Co-Authored-By: Peter Lamut <plamut@users.noreply.github.com>
Co-Authored-By: Tim Swast <swast@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants