Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIGNUMERIC support #367

Closed
tswast opened this issue Nov 4, 2020 · 16 comments · Fixed by #527
Closed

BIGNUMERIC support #367

tswast opened this issue Nov 4, 2020 · 16 comments · Fixed by #527
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Copy link
Contributor

tswast commented Nov 4, 2020

BIGNUMERIC is a new type with even larger decimal precision than NUMERIC. We should ensure we can go to/from decimal.decimal and this new data type.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Nov 4, 2020
@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Nov 4, 2020
@HemangChothani
Copy link
Contributor

@tswast Need help! What is the precision for BIGNUMERIC when pyarrow converison come into the picture , like for NUMERIC
we have pyarrow.decimal128(38, 9) or any other way to differentiate NUMERIC and BIGNUMERIC

@tswast
Copy link
Contributor Author

tswast commented Nov 9, 2020

From our internal docs:

The BIGNUMERIC data type is an exact numeric value with more than 76 digits of precision and exactly 38 decimal digits of scale. Precision is the number of digits that the number contains. Scale is how many of these digits appear after the decimal point

@tswast
Copy link
Contributor Author

tswast commented Nov 9, 2020

I don't think 128 bits will be enough for BIGNUMERIC. We should create a table and try to read it with the BigQuery Storage API to see what pyarrow schema it returns.

@tswast
Copy link
Contributor Author

tswast commented Nov 12, 2020

We'll need to use pyarrow.decimal256 for this type.

@HemangChothani
Copy link
Contributor

I think pyarrow.decimal256 is not released yet ,checked with the latest release 2.0.0. Done with the development, for the tests i will wait for the feature to be released.

@tswast
Copy link
Contributor Author

tswast commented Nov 16, 2020

@emkornfield Do you know when decimal256 will be available in pyarrow? It'll be necessary for pandas integration with BIGNUMERIC.

@emkornfield
Copy link

It will be in the next release (3.0.0, January-ish). Should be in nightly builds now but there are some remaining issues to get Python support working.

tswast added a commit that referenced this issue Dec 3, 2020
tswast added a commit that referenced this issue Dec 3, 2020
gcf-merge-on-green bot pushed a commit that referenced this issue Dec 4, 2020
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [x] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-bigquery/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)

Towards #367 🦕
@tswast
Copy link
Contributor Author

tswast commented Dec 21, 2020

Since we're using the BQ Storage API (and thus Arrow) for the DB-API connector, we aren't able to support BIGNUMERIC fully in the DB-API until the Arrow release that adds 256-bit decimal support

____________________ test_w_standard_sql_types[SELECT BIGNUMERIC "3.141592653589793238462643383279502884"-expected11] _____________________

sql = 'SELECT BIGNUMERIC "3.141592653589793238462643383279502884"', expected = Decimal('3.141592653589793238462643383279502884')
cursor = <google.cloud.bigquery.dbapi.cursor.Cursor object at 0x7fe402c2c7c0>

    @pytest.mark.parametrize(
        ("sql", "expected"),
        helpers.STANDARD_SQL_EXAMPLES
    )
    def test_w_standard_sql_types(sql, expected, cursor):
        cursor.execute(sql)
        assert cursor.rowcount == 1
>       row = cursor.fetchone()

tests/system/test_dbapi.py:56: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
google/cloud/bigquery/dbapi/_helpers.py:255: in with_closed_check
    return method(self, *args, **kwargs)
google/cloud/bigquery/dbapi/cursor.py:292: in fetchone
    return six.next(self._query_data)
google/cloud/bigquery/dbapi/_helpers.py:240: in <genexpr>
    return (to_table_row(row_data) for row_data in rows_iterable)
../python-bigquery-storage/google/cloud/bigquery_storage_v1/reader.py:274: in __iter__
    for row in page:
../python-bigquery-storage/google/cloud/bigquery_storage_v1/reader.py:441: in next
    return six.next(self._iter_rows)
../python-bigquery-storage/google/cloud/bigquery_storage_v1/reader.py:629: in to_rows
    record_batch = self._parse_arrow_message(message)
../python-bigquery-storage/google/cloud/bigquery_storage_v1/reader.py:650: in _parse_arrow_message
    self._parse_arrow_schema()
../python-bigquery-storage/google/cloud/bigquery_storage_v1/reader.py:661: in _parse_arrow_schema
    self._schema = pyarrow.ipc.read_schema(
pyarrow/ipc.pxi:718: in pyarrow.lib.read_schema
    ???
pyarrow/error.pxi:122: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Library only supports 128-bit decimal values

pyarrow/error.pxi:84: ArrowInvalid

@emkornfield
Copy link

you could potentially test against pyarrow nightly builds if you want to verify how this works.

@tswast
Copy link
Contributor Author

tswast commented Dec 21, 2020

Yeah, testing with nightly #95 is on my list to try out over the holidays.

@HemangChothani
Copy link
Contributor

I have tested BIGNUMERIC before with master branch and think it works fine.

from google.cloud import bigquery
from google.cloud.bigquery import dbapi

client = bigquery.Client()
cursor = dbapi.connect(client).cursor()
sql = 'SELECT BIGNUMERIC "3.141592653589793238462643383279502884"'
cursor.execute(sql)
cursor.fetchone()   #print Row((Decimal('3.14159265358979323846264338327950288400'),), {'f0_': 0})

@tswast
Copy link
Contributor Author

tswast commented Jan 28, 2021

@plamut Would you have the bandwidth to take this issue?

#447 can be a good starting point, but as discussed in #488 (comment) I'd very much like pyarrow>=3.0 to be optional (only required when actually using BIGNUMERIC). This is because pyarrow is a core library across many projects in the data science community, so forced upgrades will almost certainly cause customer issues because of version conflicts.

We should continue to run tests against pyarrow==1.0 in our Python 3.6 tests. Any tests that use BIGNUMERIC should be separated from the other pandas & DB-API tests. They should be skipped using pytest's skip mechanism, parsing the version string from pyarrow.__version__ and comparing to >= the parsed version of "3.0.0".

@plamut
Copy link
Contributor

plamut commented Jan 28, 2021

@tswast Depends on how urgent it is (I have some PubSub work on my hands ATM), but I can probably at least have a look in a week's time or so - would that be fine?

@tswast
Copy link
Contributor Author

tswast commented Jan 28, 2021

Yes. We can wait a few more weeks. Just wanted to check in since this is now unblocked.

@shollyman
Copy link
Contributor

Looking at this, it appears we may need some proto publishing/wrangling for the standardsql type mapping as well.

@plamut
Copy link
Contributor

plamut commented Feb 16, 2021

@shollyman Good to know, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
5 participants