read_database errors on read from an empty table #14906

Closed
2 tasks done
Sage0614 opened this issue Mar 7, 2024 · 5 comments · Fixed by #14916

Labels
bug (Something isn't working), python (Related to Python Polars)

Comments


Sage0614 commented Mar 7, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

  1. Create an empty table with zero records, for example a table in MS SQL Server.

  2. Use pl.read_database with an arrow_odbc connection string.

  3. Run a count query:

query = f"select count(*) from {table}"
size = pl.read_database(query, conn).get_column("column_0")[0]

This returns size = 0, so it is not a connection issue.

  4. Run a query that selects rows:

query = f"select top 1 * from {table}"
res = pl.read_database(query, conn)

This errors out, with the traceback pointing at /io/database.py lines 407, 258, 251 and /convert.py line 614:
ValueError: Must pass schema, or at least one RecordBatch

Log output

No response

Issue description

It seems Polars infers the types from the records returned by the query, and so assumes the table has at least one record. However, if I just want to know how the database table schema maps to a Polars schema, this approach won't work, because the table contains zero records.

Expected behavior

I expect to get an empty Polars DataFrame whose schema matches the database table schema, instead of an error,
since "select * from table" is a valid operation on an empty table in a database.

Installed versions

--------Version info---------
Polars:               0.20.13
Index type:           UInt32
Platform:             Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python:               3.11.4 (main, Aug 24 2023, 11:18:03) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               2023.6.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.7.2
numpy:                1.25.2
openpyxl:             <not installed>
pandas:               2.0.3
pyarrow:              13.0.0
pydantic:             1.10.12
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.20
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
Sage0614 added the bug, needs triage, and python labels on Mar 7, 2024
Collaborator

alexander-beedie commented Mar 7, 2024

> seems polars is inferring the type from output of the database table record

We don't actually do that (it would be very unreliable with null values, etc.); we introspect the resulting cursor description, which is typically more robust. I can confirm that we don't have a generic problem with empty result sets (I just quickly tested on a local SQLite db).

Possibly an unexpected interaction between the ODBC / Arrow / Polars layers 🤔 Midnight here now, so I'll look at this in more detail tomorrow. (I have a SQL Server test database available in a Docker container, so I should be able to reproduce, as you mentioned using MSSQL).
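
To illustrate the point about cursor descriptions: in a Python DB-API driver, the description is available even when the query returns no rows. A minimal sketch using the standard-library sqlite3 module, with a hypothetical table named demo (this shows the general DB-API behaviour, not the Polars internals):

```python
import sqlite3

# In-memory database with an empty, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("create table demo (id integer, name text)")

cur = conn.execute("select * from demo")
rows = cur.fetchall()

# The first element of each description entry is the column name;
# it is populated even though no rows came back.
column_names = [col[0] for col in cur.description]

print(rows, column_names)
```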

alexander-beedie self-assigned this on Mar 7, 2024
alexander-beedie removed the needs triage label on Mar 7, 2024
Author

Sage0614 commented Mar 7, 2024

I see, thanks. To be specific, I encounter this issue with MSSQL and arrow-odbc, using ODBC Driver 17 for the connection.

@alexander-beedie
Collaborator

Got it; was able to reproduce locally.
The issue is specific to ODBC, but the fix is straightforward 👌

@alexander-beedie
Collaborator

@Sage0614: should be fixed - have a look at the just-released 0.20.15.

@Sage0614
Author

> @Sage0614: should be fixed - have a look at the just-released 0.20.15.

Just tried the new version, works correctly for me, thanks!
