
BigQuery: Fix bug where load_table_from_dataframe could not append to REQUIRED fields. #8230



@tswast tswast commented Jun 5, 2019

If a BigQuery schema is supplied as part of the `job_config`, it can be
used to set the `nullable` bit correctly on the serialized Parquet file.

Closes #8093.
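To make the idea concrete, here is a minimal pure-Python sketch of how the field modes in a BigQuery schema could be turned into per-column nullable flags. `BQField` and `parquet_nullability` are hypothetical names for illustration, not the library's API, and the column types for `foo`/`bar` are made up. In a pyarrow-based implementation, each flag would be passed as `pyarrow.field(name, type, nullable=...)` when building the Arrow schema that is written to Parquet.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BQField:
    """Hypothetical stand-in for a BigQuery schema field (name, type, mode)."""

    name: str
    field_type: str
    mode: str = "NULLABLE"


def parquet_nullability(schema: List[BQField]) -> Dict[str, bool]:
    """Map each column name to the nullable bit it should carry in the Parquet file.

    Only NULLABLE columns are marked nullable. REQUIRED columns (and, in this
    sketch, REPEATED ones) are marked non-nullable.
    """
    return {field.name: field.mode.upper() == "NULLABLE" for field in schema}


schema = [
    BQField("foo", "STRING"),
    BQField("bar", "INTEGER", mode="REQUIRED"),
]
print(parquet_nullability(schema))  # {'foo': True, 'bar': False}
```

Without a schema there is no mode information, so every column would fall back to the default, which is nullable.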

@tswast tswast requested a review from a team June 5, 2019 22:14
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jun 5, 2019
@tswast tswast requested review from shollyman and plamut June 5, 2019 22:15
@tseaver tseaver changed the title Fix bug where load_table_from_dataframe could not append to REQUIRED fields. BigQuery: Fix bug where load_table_from_dataframe could not append to REQUIRED fields. Jun 6, 2019

@plamut plamut left a comment


Update: I figured out that the example in the issue description does not hit the `to_parquet()` line, because `job_config.schema` is `None`. I will try to figure out how to set it.
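The branch choice described in this update can be sketched as follows, assuming the schema-aware path only runs when a schema is set on the job config. `LoadJobConfigStub` and `serialization_branch` are hypothetical names; the real client's condition may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadJobConfigStub:
    """Hypothetical stand-in for a load job config; only `schema` matters here."""

    schema: Optional[List[str]] = None


def serialization_branch(job_config: LoadJobConfigStub) -> str:
    """Report which (hypothetical) branch a DataFrame load would take."""
    if not job_config.schema:
        # No BigQuery schema (None or empty): column modes are unknown, so the
        # Parquet file is written with default, nullable columns.
        return "inferred"
    # A schema is available, so REQUIRED columns can be marked non-nullable.
    return "schema-aware"


print(serialization_branch(LoadJobConfigStub()))  # inferred
print(serialization_branch(LoadJobConfigStub(schema=["foo", "bar"])))  # schema-aware
```

Because the original reproduction never set `schema`, it took the first branch, so the fix could not have any effect there.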


(disclaimer: my BQ knowledge is very limited)

Non-essential remark aside, the code changes look good to me all in all. I had some trouble verifying the fix, though.

I was able to reproduce the issue by following the steps from the description (I had to swap "foo" and "bar" in the second-to-last line). When testing again on the PR branch, however, the issue persisted: I got the same error again.

What could I be missing?

FWIW, I did make sure to re-install the bigquery library after pulling the PR code:

(venv-3.6) peter@black-box:~/workspace/google-cloud-python/bigquery (pr_temp)$ pip install -e .

arrow_names.append(bq_field.name)
arrow_arrays.append(bq_to_arrow_array(dataframe[bq_field.name], bq_field))

arrow_table = pyarrow.Table.from_arrays(arrow_arrays, names=arrow_names)
if all((field is not None for field in arrow_fields)):

(minor)
As the sole argument to a call, a generator expression does not need to be enclosed in an extra pair of parentheses.
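A small sketch of the suggestion, applied to a check like the one in the diff. `all_fields_known` is a hypothetical helper, and the Arrow type names in `fields` are placeholders; the point is only that both spellings of the generator argument behave the same, while a second positional argument makes the parentheses mandatory.

```python
from typing import List, Optional


def all_fields_known(arrow_fields: List[Optional[str]]) -> bool:
    """True when every BigQuery field was mapped to an Arrow type (none are None)."""
    # As the sole argument to all(), the generator needs no extra parentheses.
    return all(field is not None for field in arrow_fields)


fields = ["int64", None, "string"]
# The doubled-parentheses form from the diff gives the same result.
assert all((f is not None for f in fields)) == all_fields_known(fields)

# With a second argument the parentheses are required:
# sum(x for x in range(3), 10) would be a SyntaxError.
print(sum((x for x in range(3)), 10))  # 13
```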


@plamut plamut left a comment


Update 2: I changed the last line of the example from the issue description to the following:

from google.cloud.bigquery import job
job_config = job.LoadJobConfig(schema=schema)

client.load_table_from_dataframe(
    df, table_ref, job_config=job_config
).result()

The error I then got was different, but seemed similar to the original one:

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Provided schema is not compatible with the file 'prod-scotty-8efadb65-d51b-44ba-bfec-cf98d1e93934'. Field 'bar' is specified as REQUIRED in provided schema which does not match NULLABLE as specified in the file.

When I ran the modified example with the PR fix, the error disappeared. Seems like the fix works (and the new code path was indeed taken).
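The 400 error above states the rule being enforced: a column that is REQUIRED in the provided schema cannot be loaded from a Parquet column marked NULLABLE. Below is a toy Python model of that check, using a hypothetical helper `is_mode_compatible`. It shows why writing the nullable bit from the schema makes the load pass; it is not BigQuery's server-side implementation.

```python
def is_mode_compatible(table_mode: str, file_nullable: bool) -> bool:
    """Toy model of the rule in the error message (not BigQuery's actual logic).

    A REQUIRED table column rejects a file column marked nullable. Accepting a
    non-nullable file column into a NULLABLE table column is an assumption.
    """
    return not (table_mode.upper() == "REQUIRED" and file_nullable)


# Before the fix: 'bar' is REQUIRED in the schema, but the file marks it NULLABLE.
print(is_mode_compatible("REQUIRED", file_nullable=True))   # False
# With the fix: the schema-aware path writes 'bar' as non-nullable.
print(is_mode_compatible("REQUIRED", file_nullable=False))  # True
```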


plamut commented Jun 7, 2019

Based on my limited BQ knowledge, the fix seems to work and the code looks good, but I will hold off on merging, since @shollyman might have something more to add.

(if not, then please feel free to go ahead and merge it)


@shollyman shollyman left a comment


Thanks for this.

@plamut plamut merged commit 5c85d51 into googleapis:master Jun 7, 2019
@tswast tswast deleted the issue8093-load_table_from_dataframe-required-fields branch June 8, 2019 00:42
Development

Successfully merging this pull request may close these issues.

BigQuery: Field <field> has changed mode from REQUIRED to NULLABLE
4 participants