Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --params option to %%bigquery magic #6277

Merged
merged 8 commits into from
Oct 30, 2018

Conversation

guillermo-carrasco
Copy link

Often it is useful to be able to parametrize query strings. While there is the option of parametrizing queries using the raw sdk, there is no option to do so using the %%bigquery magic in Jupyter notebooks.

This is useful for several reasons, specially when the same parameter appears several times in the query. This could be for example a cutoff date appearing in several JOIN conditions.

This PR adds the option --params to the %%bigquery magic for Jupyter notebooks. --params accepts a JSON string that will be used to format the string contained in the query.

An example of how this would work:

%%bigquery df --params {"max_question_length":300,"limit":10}

SELECT
      posts.id AS post_id,
      posts.creation_date AS post_creation_date,
      posts.body AS question
FROM
  `bigquery-public-data.stackoverflow.posts_questions` posts
WHERE
  LENGTH(posts.body) < {max_question_length}
LIMIT {limit}

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Oct 19, 2018
bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
bigquery/tests/unit/test_magics.py Outdated Show resolved Hide resolved
@guillermo-carrasco
Copy link
Author

@tseaver I am not familiar with the review process in these repositories. Should I push a new commit with the suggested changes, and rebase at the end? or should I rebase now?

Thanks for reviewing!

@tseaver
Copy link
Contributor

tseaver commented Oct 19, 2018

@guillermo-carrasco

I am not familiar with the review process in these repositories. Should I push a new commit with the suggested changes, and rebase at the end? or should I rebase now?

Push a new commit on the same branch. I will review those changes. We iterate until I hit the "approve" button for the review, and then I hit the merge button.

Occasionally I might ask for a rebase against the current master (e.g., if the CI machinery gets confused about what has changed), but I wouldn't ask that you squash commits in such a rebase: instead, when I merge, I will squash all commits for the PR into one.

Thanks for reviewing!

Thank you for the patch!

@guillermo-carrasco
Copy link
Author

@tseaver thanks for the clarification. I just amended the suggested changes.

@tseaver tseaver added the api: bigquery Issues related to the BigQuery API. label Oct 22, 2018
@tseaver
Copy link
Contributor

tseaver commented Oct 22, 2018

@shollyman, @tswast I'm fine with the patch as it stands now. Please comment on suitability from the DPE side.

@tswast tswast requested a review from alixhami October 25, 2018 23:58
bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
@guillermo-carrasco
Copy link
Author

This new commit makes it easier to use dictionaries with the %%bigquery magic. Now one can pass in a previously built dictionary, instead of having to write it down in the magic line:

In [1]: params = {"min_word_count": 200, "corpus": "romeoandjuliet"}

In [2]: %%bigquery df --params $params
    ...: 
    ...: SELECT word, word_count
    ...: FROM `bigquery-public-data.samples.shakespeare`
    ...: WHERE corpus = "{corpus}"
    ...: AND word_count >= {min_word_count}
    ...: ORDER BY word_count DESC;
    ...: 

@guillermo-carrasco
Copy link
Author

@tswast please disregard my last comment. I have folowed your suggestion and used to_query_parameters from BigQuery's built-in query parameter handling. Now the parametrizing in notebooks works in line with the SDK:

In [1]: params = {"min_word_count": 200, "corpus": "romeoandjuliet"}

In [2]: %%bigquery df --params $params
   ...: 
   ...: SELECT word, word_count
   ...: FROM `bigquery-public-data.samples.shakespeare`
   ...: WHERE corpus = @corpus
   ...: AND word_count >= @min_word_count
   ...: ORDER BY word_count DESC;

Copy link
Contributor

@alixhami alixhami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. I have some comments on how to document this addition, because we want to be really clear to users how to use this.

Also something to note for any future contributions is that this magic is intentionally simple and is not intended to duplicate all functionality of the library. We want to avoid the maintenance burden of duplicating all the library's functionality and also keep the interface simple because magics are generally used as short hand for simple operations.

bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
bigquery/tests/unit/test_magics.py Outdated Show resolved Hide resolved
bigquery/tests/unit/test_magics.py Outdated Show resolved Hide resolved
bigquery/google/cloud/bigquery/magics.py Show resolved Hide resolved
bigquery/google/cloud/bigquery/magics.py Outdated Show resolved Hide resolved
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the helper. LGTM once you make the changes @alixhami requested.

@guillermo-carrasco
Copy link
Author

Sorry for closing, it was a mistake. Thanks for the review @alixhami. I'll implement the suggested changes.

@guillermo-carrasco
Copy link
Author

@alixhami I have implemented the following changes:

  • Added a test for expanding a dictionary
  • Importing module instead of function
  • Fixed df typo (also in the other test)

I am not sure that changing the docstring to reflect the dict approach is what we want though. It is still a JSON string what is sent in the cell magic. It's just that it is also possible to expand a dictionary instead of writing the JSON string. But at the end, it is a JSON string what is treated in _cell_magic. Do you still think that I should change the documentation? I have added usage examples for both cases.

Regarding contributions: I totally understand, and I appreciate you're taking the time to review this. Query parametrization is something we use a lot, and we have a heavy notebook-drived development, so this would be a great addition for us.

Thanks again!

@alixhami
Copy link
Contributor

Thanks @guillermo-carrasco!

@tswast - what is your preference on how this parameter should be documented?

Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with the current documentation, given the fact that all magic arguments are passed in as strings.

@guillermo-carrasco
Copy link
Author

I've slightly changed the docstring now. I had to remove the dictionary reference example from the documentation. The reason is that since $ is ipython specific syntax, sphinx is failing on building the documentation. I couldn't find a way to overcome this. I'm open to suggestions, thanks!

@alixhami
Copy link
Contributor

@guillermo-carrasco I'm looking into the docs build issue now. I think that example is very helpful, so I'm going to see if there's a way to include it without throwing errors.

@tseaver
Copy link
Contributor

tseaver commented Oct 30, 2018

@alixhami PTAL one final pass.

@alixhami
Copy link
Contributor

Ok I pushed some commits, which make the following updates:

  • Remove python syntax highlighting (because it isn't actually python anyway), so the dict params example could be added back in
  • Fix lint and coverage
  • Update the documentation for the params parameter to be more detailed and point to the relevant examples. The trade off I made here is favoring clarity at the expense of being repetitive. I ran the docstring updates by @tswast offline.

@tseaver tseaver merged commit f32d7ad into googleapis:master Oct 30, 2018
@guillermo-carrasco
Copy link
Author

amazing, thanks a lot for all the help and reviews :)

@guillermo-carrasco guillermo-carrasco deleted the parametrized-queries branch October 30, 2018 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants