feat(python): streamline adbc connectivity, adding snowflake support #9600

Merged (2 commits) on Jul 3, 2023

alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Jun 28, 2023

Should close #9569, by better generalising adbc module inference/connection.

Snowflake support still needs confirmation, as I don't actually have access to an instance, but our existing support for postgres/sqlite is confirmed to work with the updated code (I spun up a local Docker Postgres image to validate it).

@github-actions bot added the "enhancement" (New feature or an improvement of an existing feature) and "python" (Related to Python Polars) labels Jun 28, 2023
@jonashaag
Contributor

@alexander-beedie let me know if you want help testing this on a Snowflake instance.

@alexander-beedie
Collaborator Author

> @alexander-beedie let me know if you want help testing this on a Snowflake instance.

Sounds good! If you're able to grab the PR and try it out in advance that would be much appreciated :)

@ritchie46
Member

@jonashaag will you test this PR before we merge it?

@jonashaag
Contributor

I can test today.

@jonashaag
Contributor

jonashaag commented Jun 29, 2023

For the purposes of this PR, I think we can say that it works.

A few problems:

  • I got the following confusing error message when I forgot to select a warehouse. Not sure if this is something that can be improved in Polars.
    ERRO[0238]connection.go:435 gosnowflake.(*snowflakeConn).QueryArrowStream error: 000606 (57P03): No active warehouse selected in the current session.  Select an active warehouse with the 'use warehouse' command.
    ---------------------------------------------------------------------------
    InternalError                             Traceback (most recent call last)
    File ~/p/polars/py-polars/polars/io/database.py:143, in _read_sql_adbc(query, connection_uri)
        142 cursor = conn.cursor()
    --> 143 cursor.execute(query)
        144 tbl = cursor.fetch_arrow_table()
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/dbapi.py:604, in Cursor.execute(self, operation, parameters)
        603 self._prepare_execute(operation, parameters)
    --> 604 handle, self._rowcount = self._stmt.execute_query()
        605 self._results = _RowIterator(
        606     pyarrow.RecordBatchReader._import_from_c(handle.address)
        607 )
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/_lib.pyx:991, in adbc_driver_manager._lib.AdbcStatement.execute_query()
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/_lib.pyx:385, in adbc_driver_manager._lib.check_error()
    
    InternalError: ADBC_STATUS_INTERNAL (9): [Snowflake] 000606 (57P03): No active warehouse selected in the current session.  Select an active warehouse with the 'use warehouse' command.
    
    
    During handling of the above exception, another exception occurred:
    
    RuntimeError                              Traceback (most recent call last)
    Cell In[5], line 1
    ----> 1 df = pl.read_database("select * from ...", engine="adbc")
    
    File ~/p/polars/py-polars/polars/io/database.py:108, in read_database(query, connection_uri, partition_on, partition_range, partition_num, protocol, engine)
        106     if not isinstance(query, str):
        107         raise ValueError("Only a single SQL query string is accepted for adbc.")
    --> 108     return _read_sql_adbc(query, connection_uri)
        109 else:
        110     raise ValueError("Engine is not implemented, try either connectorx or adbc.")
    
    File ~/p/polars/py-polars/polars/io/database.py:145, in _read_sql_adbc(query, connection_uri)
        143     cursor.execute(query)
        144     tbl = cursor.fetch_arrow_table()
    --> 145     cursor.close()
        146 return from_arrow(tbl)
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/dbapi.py:217, in _Closeable.__exit__(self, exc_type, exc_val, exc_tb)
        216 def __exit__(self, exc_type, exc_val, exc_tb) -> None:
    --> 217     self.close()
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/dbapi.py:302, in Connection.close(self)
        299 if self._closed:
        300     return
    --> 302 self._conn.close()
        303 self._db.close()
        304 self._closed = True
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/_lib.pyx:855, in adbc_driver_manager._lib.AdbcConnection.close()
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/_lib.pyx:456, in adbc_driver_manager._lib._AdbcHandle._check_open_children()
    
    File ~/p/polars/py-polars/.venv/lib/python3.9/site-packages/adbc_driver_manager/_lib.pyx:458, in adbc_driver_manager._lib._AdbcHandle._check_open_children()
    
    RuntimeError: Cannot close AdbcConnection with open AdbcStatement
    
    
  • There is no way to pass a connection object to read_database AFAICT, which makes it unclear how to actually run that 'use warehouse' command (you can't run multiple SQL statements in a single call to Snowflake, i.e. you can't use ";").
  • There was a panic in ADBC, see linked issue.
  • There is no way to cancel a running query using ^C, probably a problem in ADBC as well?
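
The second exception in the traceback above (`Cannot close AdbcConnection with open AdbcStatement`) appears to come from the cursor never being closed once `cursor.execute` raised, so the connection's cleanup fails and hides the real Snowflake error. Below is a minimal sketch of one way to avoid that masking, assuming the cursor and connection are released in reverse order of acquisition. `StubCursor`, `StubConnection`, and `run_query` are hypothetical stand-ins written for this illustration; they are not ADBC or Polars classes.

```python
from contextlib import closing


class StubConnection:
    """Hypothetical stand-in: refuses to close while a cursor is still open."""

    def __init__(self):
        self.open_cursors = 0
        self.closed = False

    def cursor(self):
        return StubCursor(self)

    def close(self):
        if self.open_cursors:
            raise RuntimeError("cannot close connection with open cursor")
        self.closed = True


class StubCursor:
    """Hypothetical stand-in whose execute() always fails, like the missing-warehouse case."""

    def __init__(self, conn):
        self.conn = conn
        self.closed = False
        conn.open_cursors += 1

    def execute(self, query):
        raise RuntimeError("simulated driver error: no active warehouse")

    def close(self):
        if not self.closed:
            self.closed = True
            self.conn.open_cursors -= 1


def run_query(conn, query):
    # Context managers exit in reverse order: the cursor is closed first,
    # then the connection, even when execute() raises.
    with closing(conn) as c, closing(c.cursor()) as cursor:
        cursor.execute(query)
```

With this ordering the caller sees only the original driver error, and the stub connection still ends up closed.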

@alexander-beedie
Collaborator Author

alexander-beedie commented Jun 30, 2023

Thanks for testing! Looks good to me, given that we just pass-through to the ADBC driver.

Does snowflake not allow you to specify the warehouse in the URI? Bit of a shame! If so, and given there isn’t an “executemany”, maybe we could sequentially execute ADBC statements given as a list… can return to that after a think 🤔
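
The idea of executing a list of statements one after another could be sketched roughly as follows. This is only an illustration of the pattern, not something Polars implements: `execute_sequentially` is a hypothetical helper, and the standard-library `sqlite3` module stands in for an ADBC connection so that the example is runnable. With Snowflake, the first statement would presumably be the `USE WAREHOUSE ...` command.

```python
import sqlite3
from contextlib import closing


def execute_sequentially(conn, statements):
    """Run each statement in order on one connection; return the rows of the last one."""
    if not statements:
        raise ValueError("expected at least one SQL statement")
    rows = []
    for statement in statements:
        with closing(conn.cursor()) as cursor:
            cursor.execute(statement)
            # Only statements that produce a result set have a description.
            rows = cursor.fetchall() if cursor.description is not None else []
    return rows


with closing(sqlite3.connect(":memory:")) as conn:
    result = execute_sequentially(
        conn,
        [
            "CREATE TABLE t (x INTEGER)",
            "INSERT INTO t VALUES (1), (2)",
            "SELECT x FROM t ORDER BY x",
        ],
    )
```

Because every statement runs on the same connection, session state set by an earlier statement is still in effect when the final query runs.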

@jonashaag
Contributor

It does allow you to, that's how I ended up testing this. Took me a while to think of that though.

@alexander-beedie
Collaborator Author

> It does allow you to, that's how I ended up testing this. Took me a while to think of that though.

Ahh, ok - I thought you had to issue a separate "USE WAREHOUSE ..." statement :) If you've got a sample URI with the warehouse option we could add an example of it into the docstring to help out any other snowflake users 👍

@ritchie46 ritchie46 merged commit 811d0c1 into pola-rs:main Jul 3, 2023
@alexander-beedie alexander-beedie deleted the adbc-snowflake-support branch July 3, 2023 09:16
@jonashaag
Contributor

Sorry for the late reply, here is what I ended up using:

pl.read_database("select * from snowflake_sample_data.tpch_sf10.customer", "snowflake://user:pass@account.snowflakecomputing.com?warehouse=my_warehouse", engine="adbc")
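
For anyone assembling such a URI programmatically, here is a small sketch that builds the same string with the warehouse passed as a query parameter. `snowflake_uri` is a hypothetical helper, not part of Polars or ADBC, and percent-encoding the credentials is an assumption about how the driver parses the URI, so it is worth checking against your own account before relying on it.

```python
from urllib.parse import quote, urlencode


def snowflake_uri(user, password, account_host, **params):
    """Build a snowflake:// URI; extra keyword arguments become query parameters."""
    # Assumption: the driver accepts percent-encoded credentials.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"snowflake://{credentials}@{account_host}"
    if params:
        uri += "?" + urlencode(params)
    return uri


uri = snowflake_uri(
    "user",
    "pass",
    "account.snowflakecomputing.com",
    warehouse="my_warehouse",
)

# Usage, as in the call above (requires polars and the Snowflake ADBC driver):
# df = pl.read_database(
#     "select * from snowflake_sample_data.tpch_sf10.customer", uri, engine="adbc"
# )
```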

@alexander-beedie
Collaborator Author

alexander-beedie commented Jul 3, 2023

Thx :) The ADBC docs were lacking, but I eventually found the URI param/option referenced in snowflake's docs for sqlalchemy: https://docs.snowflake.com/en/developer-guide/python-connector/sqlalchemy#additional-connection-parameters

Successfully merging this pull request may close these issues.

Add Snowflake ADBC Driver Support