feat(python): Expose infer_schema_length parameter on read_database #15076
Seems fair enough. Just for my understanding though: why does read_database need to do schema inference at all? Don't databases have a strict schema that we can use? I guess we cannot account for all possible third-party data types, so we look at the data instead?
The query result's cursor "description" property has a ... I do have a "TODO" to improve this on our side, but it's non-trivial. I know this well, because I have written code that does exactly this at work; it's not simple to do the same for Polars, as the dtype-translation architecture I have in place there runs to thousands of lines of code by itself ;) Also, there are backends like SQLite that don't populate the cursor ...
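For readers who want to check the SQLite point themselves, here is a minimal sketch using only Python's standard-library `sqlite3` module. The table `t` and its columns are made up for illustration. Each `cursor.description` entry is a 7-tuple, and `sqlite3` fills in only the column name, so a reader of the result set has nothing to go on but the data:

```python
import sqlite3

# In-memory database with a hypothetical table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, price REAL, label TEXT)")
conn.execute("INSERT INTO t VALUES (1, 9.5, 'a')")

cursor = conn.execute("SELECT id, price, label FROM t")

# Each description entry is a 7-tuple: (name, type_code, ...).
column_names = [col[0] for col in cursor.description]
type_codes = [col[1] for col in cursor.description]

rows = cursor.fetchall()
conn.close()

print(column_names)  # names are available
print(type_codes)    # type codes are not populated by sqlite3
```

Other drivers may populate the type code, but what they put there is driver-specific, which is part of why a general translation layer is a large piece of work.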
5a9fea9 to eb0d76e
Thanks for the explanation!
Closes #15059.
If not using an Arrow-aware driver or the "schema_overrides" parameter, and a column starts with > 100 null values, we need to expose the "infer_schema_length" parameter to allow for more generous dtype inference. (The other options are preferred, but we still need to make this parameter available if they cannot be used.)
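The failure mode described above can be sketched in plain Python. This is not Polars' actual inference code: `infer_column_type` is a hypothetical helper, the default window of 100 rows is assumed from the "> 100 null values" wording, and treating `None` as "scan every row" is likewise an assumption made for the sketch.

```python
from typing import Any, Optional, Sequence


def infer_column_type(
    values: Sequence[Any],
    infer_schema_length: Optional[int] = 100,  # assumed default window
) -> Optional[type]:
    """Return the type of the first non-null value inside the inference window.

    Hypothetical helper: None for infer_schema_length means "look at all rows".
    """
    window = values if infer_schema_length is None else values[:infer_schema_length]
    for value in window:
        if value is not None:
            return type(value)
    return None  # only nulls were seen, so no type could be inferred


# A column whose first 150 values are null, followed by floats.
column = [None] * 150 + [1.5, 2.5]

default_guess = infer_column_type(column)                           # window too short
wider_guess = infer_column_type(column, infer_schema_length=200)    # window reaches the data
full_scan_guess = infer_column_type(column, infer_schema_length=None)

print(default_guess, wider_guess, full_scan_guess)
```

With the parameter exposed, a caller in this situation would pass a larger `infer_schema_length` to `read_database`. The exact signature and accepted values should be checked against the Polars documentation.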