Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The DB API Binary function should accept bytes. #628

Closed
jimfulton opened this issue Apr 26, 2021 · 3 comments · Fixed by #630
Closed

The DB API Binary function should accept bytes. #628

jimfulton opened this issue Apr 26, 2021 · 3 comments · Fixed by #630
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. dbapi type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jimfulton
Copy link
Contributor

(3.8) jim@ds9:~/p/g/python-bigquery-sqlalchemy$ python
Python 3.8.5 (default, Jan 27 2021, 15:41:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import google.cloud.bigquery.dbapi
>>> google.cloud.bigquery.dbapi.Binary(b'x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jim/p/g/python-bigquery/google/cloud/bigquery/dbapi/types.py", line 42, in Binary
    return string.encode("utf-8")
AttributeError: 'bytes' object has no attribute 'encode'

Bytes are the most common way to represent binary data. Accepting strings, as it does now seems at best to be a convenience and at worst a bug magnet.

In SQLAlchemy, if you defined a model that has a binary attribute, you'd store bytes data in it, but that would break for bigquery, di to this issue.

Sqlite's Binary function requires bytes data.

I propose to change the function to accept bytes. For the sake of backward compatibility, I propose to continue to accept strings.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Apr 26, 2021
@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Apr 26, 2021
@tswast
Copy link
Contributor

tswast commented Apr 26, 2021

This makes sense to me. (Both continue to accept strings, and also support bytes)

I'm having trouble finding in the REST docs how we expect BYTES data type to be encoded, but presumably we have other support for this outside of the DB-API which we can reference.

@jimfulton
Copy link
Contributor Author

At a higher level:

(unit-3-9) jim@ds9:~/p/g/python-bigquery$ python
Python 3.9.4 (default, Apr  8 2021, 08:01:43) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import google.cloud.bigquery.dbapi
/home/jim/p/g/python-bigquery/.nox/unit-3-9/lib/python3.9/site-packages/pandas/compat/__init__.py:97: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)
>>> conn = google.cloud.bigquery.dbapi.connect()
>>> cursor = conn.cursor()
>>> cursor.execute("select %(:BYTES)s", [b'xxx'])
>>> list(cursor)
[Row((b'xxx',), {'f0_': 0})]
>>> cursor.execute("select %(:BYTES)s", ['yyy'])
>>> list(cursor)
[Row((b'\xcb,',), {'f0_': 0})]

Note that bytes input is handled correctly, but string input isn't.

The Binary function doesn't appear to be used by the Python client itself. When running unit tests, it's only called by a test that tests that it encodes strings.

@jimfulton
Copy link
Contributor Author

I'm having trouble finding in the REST docs how we expect BYTES data type to be encoded, but presumably we have other support for this outside of the DB-API which we can reference.

From: https://cloud.google.com/bigquery/docs/reference/rest/v2/StandardSqlDataType

"Encoded as a base64 string per RFC 4648, section 4."

It appears that we encode this properly only for bytes data:

def _bytes_to_json(value):
"""Coerce 'value' to an JSON-compatible representation."""
if isinstance(value, bytes):
value = base64.standard_b64encode(value).decode("ascii")
return value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. dbapi type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants