improving performance when using duckdb #637

Closed
edublancas opened this issue Jun 21, 2023 · 0 comments · Fixed by #725
Labels
stash Label used to categorize issues that will be worked on next

Comments

@edublancas

In #470, we made the ResultSet object lazy, which improves performance when running queries that fetch a lot of data.

However, there's still a bottleneck when converting DuckDB results into pandas data frames. DuckDB offers a .df() method that efficiently converts its results into a pandas data frame. When the autopandas option is turned on, .df() is used (see #469); however, when autopandas is off and users call:

%sql df << SELECT * FROM ...
df.DataFrame()

.DataFrame() won't use DuckDB's .df() method. We need to change that and ensure that .df() is always used whenever we are using DuckDB (as this is a DuckDB-specific feature). Once this is implemented, we can get rid of the logic introduced by #469 (here) and just call .DataFrame().
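
To make the proposal concrete, here is a minimal sketch of the dispatch that .DataFrame() could perform. It is not the project's actual implementation: the helper name to_dataframe and its fallback parameter are hypothetical, and it assumes a DuckDB result can be detected by the presence of a callable .df() attribute.

```python
from typing import Any, Callable


def to_dataframe(native_result: Any, fallback: Callable[[Any], Any]) -> Any:
    """Convert a query result to a data frame, preferring a native fast path.

    Hypothetical helper for illustration only; the name and signature are
    assumptions and do not come from the project's code.
    """
    # DuckDB result objects expose .df(), which builds the pandas
    # data frame directly instead of going through Python rows.
    native_df = getattr(native_result, "df", None)
    if callable(native_df):
        return native_df()

    # Any other driver: use the generic (slower) conversion passed in.
    return fallback(native_result)
```

With a helper along these lines, the autopandas path and an explicit .DataFrame() call could share one code path, so the DuckDB-specific branch added in #469 would no longer be needed.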

@edublancas added the "stash" and "med complexity" labels on Jun 21, 2023