test(bigquery): add tests for concatenating categorical columns #10180

Merged
plamut merged 1 commit into googleapis:master from iss-8044 on Jan 31, 2020

Contributor

@plamut plamut commented Jan 22, 2020

Closes #8044.

This PR adds tests for concatenating multi-page results that contain categorical column data (both with and without pyarrow available).

It appears that the code already works, provided that a correct explicit categorical dtype is passed to the to_dataframe() method.

If dtypes are not specified, the default pandas concatenation behavior leaves categorical columns with the dtype "object". That is already explained in the dtypes parameter's docstring, so the code behaves as advertised.
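
As an illustration of the behavior described above, here is a minimal pandas-only sketch. It is not taken from the tests in this PR: the two small frames merely stand in for result pages, and the variable names are made up for the example. It assumes that each page infers its own categories when no explicit dtype is given.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Two hypothetical "pages" of results, each converted to a DataFrame on its own.
page1 = pd.DataFrame({"col_category": ["low", "medium"]})
page2 = pd.DataFrame({"col_category": ["high", "low"]})

# Without an explicit dtype, each page infers a different set of categories,
# so pandas falls back to "object" when the pages are concatenated.
inferred = pd.concat(
    [page1.astype("category"), page2.astype("category")], ignore_index=True
)

# With the same explicit CategoricalDtype on every page, the dtype survives.
explicit_dtype = CategoricalDtype(categories=["low", "medium", "high"], ordered=False)
explicit = pd.concat(
    [page1.astype(explicit_dtype), page2.astype(explicit_dtype)], ignore_index=True
)

print(inferred["col_category"].dtype)
print(explicit["col_category"].dtype)
```

The second case corresponds to passing an explicit categorical dtype through the dtypes argument, which is what the new tests exercise.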

PR checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut added the "api: bigquery" label on Jan 22, 2020
@plamut requested review from tswast and a team on January 22, 2020 14:05
@googlebot added the "cla: yes" label on Jan 22, 2020
@plamut changed the title from "chore(bigquery): add tests for concatenating categorical columns" to "test(bigquery): add tests for concatenating categorical columns" on Jan 22, 2020
Contributor Author

plamut commented Jan 30, 2020

@tswast Friendly ping. :)

This PR is probably something that can be reviewed and closed quickly (read: before the repo split tomorrow).

Contributor

@tswast tswast left a comment

Thanks!

bqstorage_client=bqstorage_client,
dtypes={
"col_category": pandas.core.dtypes.dtypes.CategoricalDtype(
categories=["low", "medium", "high"], ordered=False,

[no action required] I suspect there will be users who don't necessarily know the categories ahead of time, though this could trivially be done with a group by / distinct query. If we had a support cookbook, I'd say we should add this info there.
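
A rough sketch of the "distinct query" idea from the comment above, in case it is useful to someone landing here. The helper names (build_category_dtype, fetch_with_categories) and the table/column arguments are hypothetical and not part of the library. The BigQuery part assumes google-cloud-bigquery is installed and credentials are configured, and it is not executed by the snippet itself.

```python
from pandas.api.types import CategoricalDtype


def build_category_dtype(values, ordered=False):
    """Build a CategoricalDtype from distinct values, skipping NULLs (None)."""
    categories = sorted(v for v in set(values) if v is not None)
    return CategoricalDtype(categories=categories, ordered=ordered)


def fetch_with_categories(table, column):
    """Hypothetical helper: look up the distinct values first, then fetch the
    table with an explicit categorical dtype for that column."""
    # Imported lazily so the dtype helper above works without BigQuery installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    distinct_sql = f"SELECT DISTINCT `{column}` FROM `{table}`"
    values = [row[0] for row in client.query(distinct_sql).result()]
    dtype = build_category_dtype(values)
    return client.query(f"SELECT * FROM `{table}`").to_dataframe(
        dtypes={column: dtype}
    )


# Local check of the dtype-building step only (no BigQuery access needed).
dtype = build_category_dtype(["low", "high", None, "medium", "low"])
print(list(dtype.categories))
```

Sorting the categories is an arbitrary choice made here for determinism; for an ordered categorical the caller would need to supply a meaningful order instead.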

@plamut merged commit 77dd923 into googleapis:master on Jan 31, 2020
@plamut deleted the iss-8044 branch on January 31, 2020 07:12
This was referenced Jan 31, 2020
This was referenced Feb 2, 2020
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Development

Successfully merging this pull request may close these issues.

BigQuery: use union categorical to concatenate pages in to_dataframe when categorical dtype is requested
3 participants