improving performance when converting DuckDB's results to pandas #451
Comments
I was recently comparing timings between using DuckDB directly via the Python API and using it via JupySQL. The idea in #470 to make ResultSets lazy is good. But I also wonder if it would be good to give users more options so they can avoid ResultSets as much as possible. I like how magic_duckdb has done this, by allowing users to specify the result type as any format that DuckDB can export to (Pandas, Arrow, Polars), or to ask for a DuckDB relation back. The option to return a DuckDB relation is nice because it enables workflows that make use of the DuckDB Python relational API. Personally, when using DuckDB in a notebook, of the two, I'm always going to want a DuckDB relation over a ResultSet.
hi @ned2, thanks a lot for your feedback! We inherited ResultSet from ipython-sql, so my default thought was to make it better, but I see your point that in many cases users will convert it to another format anyway. I'll ensure we take your suggestions into account. I'm still unsure what the best API is, so I'll keep you in the loop for feedback!
no worries, glad it's helpful! oh, and another point I just thought of in favour of making the DuckDB relation available as a result type is that it's already lazy
we got some competition 😁: https://github.com/iqmo-org/magic_duckdb/blob/main/notebooks/benchmarking.ipynb
sqlalchemy is adding a lot of overhead when converting DuckDB results to pandas. The fix is simple: we should use DuckDB's native `.df()` method and bypass sqlalchemy.