
Integers exceeding the BigQuery integer limit are still converted to INTEGER in the schema #18

Closed
alvinburgos opened this issue Jan 9, 2019 · 2 comments


@alvinburgos

To replicate:

test.json:

{"name": "111222333444555666777"}
{"name": "111222333444555666777"}

Expected:

 % python3 -m bigquery_schema_generator.generate_schema --keep_nulls < ../data/test.json
INFO:root:Processed 2 lines
[
  {
    "mode": "NULLABLE",
    "name": "name",
    "type": "STRING"
  }
]

Actual:

 % python3 -m bigquery_schema_generator.generate_schema --keep_nulls < ../data/test.json
INFO:root:Processed 2 lines
[
  {
    "mode": "NULLABLE",
    "name": "name",
    "type": "INTEGER"
  }
]
@bxparks
Owner

bxparks commented Jan 9, 2019

Thanks for the bug report!
This is an unintended consequence of PR #15, which replicates the behavior of bq load in inferring integer and date types inside quoted strings. It looks like we have to limit the integer range to whatever bq load accepts. My guess is that it's either a 64-bit signed integer or a 53-bit signed integer. I probably won't get to this today, but likely tomorrow.
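To make the two guesses concrete, here is a quick Python check (not code from the project) of the reported value against both candidate ranges. Reading "53-bit" as the ±(2**53 - 1) "safe integer" range of an IEEE 754 double is an assumption; the constant and helper names are hypothetical.

```python
# Candidate limits: signed 64-bit, and (assumed) the IEEE 754 double "safe integer" range.
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1
SAFE53_MIN, SAFE53_MAX = -(2 ** 53 - 1), 2 ** 53 - 1


def in_range(n: int, lo: int, hi: int) -> bool:
    """Return True if lo <= n <= hi."""
    return lo <= n <= hi


value = int("111222333444555666777")  # the quoted value from test.json

print("fits int64:  ", in_range(value, INT64_MIN, INT64_MAX))    # False
print("fits 53-bit: ", in_range(value, SAFE53_MIN, SAFE53_MAX))  # False
print("INT64_MAX =", INT64_MAX)                                  # 9223372036854775807
```

The value has 21 digits while the signed 64-bit maximum has 19, so it overflows under either reading.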

bxparks added a commit that referenced this issue Jan 10, 2019
@bxparks
Owner

bxparks commented Jan 10, 2019

It turns out that when an integer overflows a signed 64-bit integer, bq load infers that value to be a FLOAT instead of a STRING. The same thing happens when the integer is inside a quoted string. So for your example, bq load produces:

[
  {
    "mode": "NULLABLE",
    "name": "name",
    "type": "FLOAT"
  }
]

It could be argued that STRING makes more sense for an overflowing integer inside a quoted string, but unfortunately, that's not what bq load seems to do. My fix follows the bq load convention and produces a FLOAT (or, internally, a QFLOAT for quoted large integers).
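A minimal sketch of the convention described above, assuming a per-value inference step: in-range integers stay INTEGER, out-of-range ones become FLOAT, and numbers found inside quoted strings get Q-prefixed internal tags that map back to the plain BigQuery type in the final schema. Only QFLOAT is named in the comment; the QINTEGER tag, the quoted-integer regex, and the helper names `infer_type` and `to_schema_type` are hypothetical, not the project's actual code.

```python
import json
import re

# Signed 64-bit bounds; outside this range bq load reportedly infers FLOAT.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

# Hypothetical pattern for an integer inside a quoted string (exact bq load syntax assumed).
QUOTED_INT_RE = re.compile(r"-?[0-9]+")

# Hypothetical mapping from internal tags to the type written to the schema.
SCHEMA_TYPE = {
    "BOOLEAN": "BOOLEAN",
    "INTEGER": "INTEGER",
    "FLOAT": "FLOAT",
    "STRING": "STRING",
    "QINTEGER": "INTEGER",  # assumed tag for quoted in-range integers
    "QFLOAT": "FLOAT",      # quoted integers that overflow int64
}


def infer_type(value):
    """Return an internal type tag for a single JSON scalar.

    Q-prefixed tags mark numbers that were found inside quoted strings.
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER" if INT64_MIN <= value <= INT64_MAX else "FLOAT"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        if QUOTED_INT_RE.fullmatch(value):
            n = int(value)
            return "QINTEGER" if INT64_MIN <= n <= INT64_MAX else "QFLOAT"
        return "STRING"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def to_schema_type(tag):
    """Map an internal tag to the BigQuery type reported in the schema."""
    return SCHEMA_TYPE[tag]


# Usage: the two records from test.json in the original report.
for line in ['{"name": "111222333444555666777"}', '{"name": "111222333444555666777"}']:
    tag = infer_type(json.loads(line)["name"])
    print(tag, "->", to_schema_type(tag))  # QFLOAT -> FLOAT
```

Merging the per-record tags across lines (for example, QINTEGER in one record and QFLOAT in another) is left out of this sketch.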

This fix is on the develop branch. Do you want to give it a try before I merge it into master?
