Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: load_table_from_dataframe Not working with string columns #9007

Closed
milonimrod opened this issue Aug 12, 2019 · 4 comments · Fixed by #9022
Closed

BigQuery: load_table_from_dataframe Not working with string columns #9007

milonimrod opened this issue Aug 12, 2019 · 4 comments · Fixed by #9022
Assignees
Labels
api: bigquery Issues related to the BigQuery API. type: docs Improvement to the documentation for an API.

Comments

@milonimrod
Copy link

milonimrod commented Aug 12, 2019

Environment details

Running on Ubuntu 16.04 with Python 2.7.14 (google-cloud-bigquery==1.17.0)

Code example

I'm running the code example showed here

Created the following table:
image

dataset_ref = bq_client.dataset('rwr')
table_ref = dataset_ref.table("monty_python")
records = [
    {"title": "The Meaning of Life", "release_year": 1983},
    {"title": "Monty Python and the Holy Grail", "release_year": 1975},
    {"title": "Life of Brian", "release_year": 1979},
    {"title": "And Now for Something Completely Different", "release_year": 1971},
]
# Optionally set explicit indices.
# If indices are not specified, a column will be created for the default
# indices created by pandas.
index = ["Q24980", "Q25043", "Q24953", "Q16403"]
dataframe = pd.DataFrame(records, index=pd.Index(index, name="wikidata_id"))

job = bq_client.load_table_from_dataframe(dataframe, table_ref, location="US")

job.result()  # Waits for table load to complete.

using the following code for the insert works well:

bq_client.insert_rows_json(table_ref, dataframe.to_dict(orient='records'))

Stack trace

---------------------------------------------------------------------------
BadRequest                                Traceback (most recent call last)
<ipython-input-105-5021150a87a3> in <module>()
     15 job = bq_client.load_table_from_dataframe(dataframe, table_ref, location="US")
     16 
---> 17 job.result()  # Waits for table load to complete.

/home/nimrodm/miniconda/envs/garage/lib/python2.7/site-packages/google/cloud/bigquery/job.pyc in result(self, timeout, retry)
    731             self._begin(retry=retry)
    732         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 733         return super(_AsyncJob, self).result(timeout=timeout)
    734 
    735     def cancelled(self):

/home/nimrodm/miniconda/envs/garage/lib/python2.7/site-packages/google/api_core/future/polling.pyc in result(self, timeout)
    125             # pylint: disable=raising-bad-type
    126             # Pylint doesn't recognize that this is valid in this case.
--> 127             raise self._exception
    128 
    129         return self._result

BadRequest: 400 Provided Schema does not match Table tr:rwr.monty_python. Field title has changed type from STRING to BYTES
@tseaver tseaver added api: bigquery Issues related to the BigQuery API. type: question Request for information or clarification. Not an issue. labels Aug 12, 2019
@plamut
Copy link
Contributor

plamut commented Aug 13, 2019

I can confirm that the issue is reproducible under Python 2.7.

@plamut
Copy link
Contributor

plamut commented Aug 13, 2019

@milonimrod The reason for this behavior is that under Python 2.7, the string literals in data that is passed to the dataframe are actually bytes, not text (i.e. unicode), but the table column type is set to STRING.

This can be fixed by using Unicode string literals for all string columns (including the index):

10,13c10,13
<     {"title": "The Meaning of Life", "release_year": 1983},
<     {"title": "Monty Python and the Holy Grail", "release_year": 1975},
<     {"title": "Life of Brian", "release_year": 1979},                                                                                                                                                                                                                                                                                                                                                                           
<     {"title": "And Now for Something Completely Different", "release_year": 1971},                                                                                                                                                                                                                                                                                                                                              
---                                                                                                                                                                                                                                                                                                                                                                                                                               
>     {"title": u"The Meaning of Life", "release_year": 1983},                                                                                                                                                                                                                                                                                                                                                                    
>     {"title": u"Monty Python and the Holy Grail", "release_year": 1975},                                                                                                                                                                                                                                                                                                                                                        
>     {"title": u"Life of Brian", "release_year": 1979},                                                                                                                                                                                                                                                                                                                                                                          
>     {"title": u"And Now for Something Completely Different", "release_year": 1971},                                                                                                                                                                                                                                                                                                                                             
19c19                                                                                                                                                                                                                                                                                                                                                                                                                             
< index = ["Q24980", "Q25043", "Q24953", "Q16403"]                                                                                                                                                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                                                                                                                                                                               
> index = [u"Q24980", u"Q25043", u"Q24953", u"Q16403"] 

With this adjustment, the posted example works just fine under Python 2.7, too.

With that said, the examples from the docs should be updated to work as is. Thanks for reporting this!

@plamut plamut added type: docs Improvement to the documentation for an API. and removed type: question Request for information or clarification. Not an issue. labels Aug 13, 2019
@plamut plamut assigned plamut and unassigned tswast Aug 13, 2019
@milonimrod
Copy link
Author

In order to solve this on a pre-existing dataframe the solution I've used:

dataframe["title"] = dataframe["title"].astype(unicode)
dataframe.index = dataframe.index.astype(unicode)

@tanduong
Copy link

Small correction:

dataframe["title"] = dataframe["title"].astype('unicode')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. type: docs Improvement to the documentation for an API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants