ComputeError for read_database suggests increasing infer_schema_length, but no option to do so with SQLAlchemy Connection #11912
Comments
Hmm; I can't see an obvious explanation for this in the standard (default) non-batched call to `read_database`. You haven't got any other parameters involved, such as "schema_overrides", "iter_batches", etc?
No, I'm not using any other parameters; the defaults for those extra parameters are being used. This is for SQL Server, where SQLAlchemy is leveraging pyodbc. SQLAlchemy reads the data types for the 24 table fields as:

```
[<class 'str'>, <class 'str'>, <class 'int'>, <class 'str'>, <class 'int'>, <class 'str'>, <class 'str'>, <class 'int'>, <class 'str'>, <class 'str'>, <class 'int'>, <class 'bool'>, <class 'bool'>, <class 'str'>, <class 'bool'>, <class 'int'>, <class 'str'>, <class 'str'>, <class 'datetime.datetime'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'decimal.Decimal'>, <class 'int'>]
```

I can run it with pandas, and even then convert it to an arrow table without any issue. After converting to an arrow table, the data types for the table fields become:

```
[string, string, int64, string, int64, string, string, int64, string, string, double, bool, bool, string, bool, double, string, string, timestamp[ns], null, double, double, double, double]
```

Perhaps something is happening with the datetime conversion? My backup, currently, is to execute using pandas and then convert to polars, but I would of course like to skip that step entirely.
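For reference, a minimal sketch of that pandas fallback (assuming `uri` and `query` hold the same connection string and SQL discussed in this thread; both names are placeholders):

```python
import pandas as pd
import polars as pl
import sqlalchemy

# `uri` and `query` are placeholders for the real connection string and SQL.
engine = sqlalchemy.create_engine(uri)

with engine.connect() as conn:
    # pandas fetches the rows via pyodbc without polars' schema inference...
    pdf = pd.read_sql(query, conn)

# ...and the resulting frame converts to polars directly.
df = pl.from_pandas(pdf)
```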
Somewhat to my surprise, it looks like there is a free/test instance of SQL Server available for Ubuntu; while I actually run macOS, I do have a virtualised x86 Ubuntu available on my NAS (which is what I used to validate `read_database`). Any special connection settings?
No, no special connection settings really. The SQL Server connection uses username/pw authentication instead of Windows authentication, but I doubt that should really matter. I'm using SQLAlchemy's URL function to create the URI, which is then fed into an engine to connect. (I think under the hood, SQLAlchemy uses pyodbc as the ODBC connection driver.)
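A rough sketch of that connection setup, assuming the standard mssql+pyodbc dialect (the credential, host, database, and ODBC driver values here are all placeholders):

```python
import sqlalchemy
from sqlalchemy.engine import URL

# Placeholder credentials/host; username/password auth rather than
# Windows authentication, as described above.
uri = URL.create(
    "mssql+pyodbc",
    username="user",
    password="secret",
    host="myserver",
    database="mydb",
    query={"driver": "ODBC Driver 17 for SQL Server"},
)

engine = sqlalchemy.create_engine(uri)
conn = engine.connect()
```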
(I am not a native English speaker, so the text may not be appropriate.) I have confirmed that when I run `read_database` in a SQLAlchemy environment, the datetime type (sqlalchemy.DateTime) errors.

This is a tentative response, but it worked as expected per #11246 (comment). A more proper fix would be to implement #11246 (comment), but I have not done so.

Scripts and detailed error logs follow. (Functions are expanded for illustrative purposes, but options, etc. remain unchanged.)

Errored script:

```python
return pl.read_database(
    query=query,  # str query
    connection=sqlalchemy.create_engine(uri).connect(),
)
```

Errored SQLAlchemy column setting:

```python
from sqlalchemy import Column as C
from sqlalchemy import Integer, String, DateTime, Boolean

# ...
postTime = C(DateTime(timezone=True), nullable=False, index=True)
```

Error logs: (collapsed block; contents not captured)

Fixed script:

```python
return pl.read_database(
    query=query,  # str query
    connection=sqlalchemy.create_engine(uri).connect(),
    schema_overrides={'Race.postTime': pl.Datetime},
)
```

Hope this helps you. Thanks.
Hi @alexander-beedie & @stinodego, any update on this issue?
Will revisit this; I had a hell of a time trying to get a working SQL Server up the last time I tried to dig into this... I'll try again 🤔 In the meantime, can you supply the database-side DDL for the table you're reading from, so I can see the exact SQL Server types (rather than the alchemy/pandas/arrow types)?
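If pulling the raw DDL is awkward, something like SQLAlchemy's inspector can also surface the server-side column types (a sketch; the table name is a placeholder, and `uri` is as elsewhere in this thread):

```python
import sqlalchemy

engine = sqlalchemy.create_engine(uri)
inspector = sqlalchemy.inspect(engine)

# Print the reflected SQL Server type for each column of the table
# being queried ("my_table" is a placeholder).
for col in inspector.get_columns("my_table"):
    print(col["name"], col["type"])
```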
Hmm, think I've got something; was able to (finally 😓) get a suitable Docker image up locally and make …. Do you happen to have a column of type …? (Also: you might want to take a look at using ….)
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Log output
No response
Issue description
When executing a parameterized query with the read_database function using a SQLAlchemy connection, I get a compute error that suggests increasing the infer_schema_length. There's no available option to do that with the read_database function at this time.

I'm getting an error that reads:

```
could not append value: "Date_" of type: str to the builder; make sure that all rows have the same schema or consider increasing infer_schema_length
it might also be that a value overflows the data-type's capacity
```

The issue seems to arise when putting the rows into a DataFrame, as I'm able to use SQLAlchemy's exec_driver_sql method to get the output as rows.
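For context, a sketch of the kind of exec_driver_sql call that does return rows cleanly (the query text and parameter value are placeholders, using pyodbc's qmark parameter style):

```python
import sqlalchemy

engine = sqlalchemy.create_engine(uri)  # `uri` is a placeholder

with engine.connect() as conn:
    # exec_driver_sql hands the statement straight to the DBAPI (pyodbc),
    # which uses "?" positional placeholders.
    result = conn.exec_driver_sql(
        "SELECT * FROM my_table WHERE Date_ >= ?",  # placeholder query
        ("2023-01-01",),
    )
    rows = result.fetchall()  # the rows come back without any schema error
```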
Expected behavior
The expected output should be an 8524 x 24 DataFrame.
Installed versions