improving performance when using duckdb #637

edublancas · 2023-06-21T21:13:24Z

In #470, we made the ResultSet object lazy, which improves performance when running queries that fetch a lot of data.
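For context, here is a minimal sketch of what "lazy" means for a result set: nothing is fetched when the object is created, and rows are pulled from the cursor in batches only when they are actually requested. The `LazyResultSet` class, its `batch_size` parameter and `head()` method are hypothetical illustrations, not the actual code from #470. The sketch uses the standard-library `sqlite3` driver so it runs anywhere; any DB-API 2.0 cursor with `fetchmany()` should behave the same way.

```python
import sqlite3
from typing import Any, Iterator, List, Tuple


class LazyResultSet:
    """Hypothetical, simplified lazy result set over a DB-API 2.0 cursor.

    Rows are fetched only on demand, in batches of ``batch_size``, and
    cached so that iterating twice does not re-run the query.
    """

    def __init__(self, cursor: Any, batch_size: int = 1000) -> None:
        self._cursor = cursor
        self._batch_size = batch_size
        self._cache: List[Tuple[Any, ...]] = []
        self._exhausted = False

    def _fetch_batch(self) -> bool:
        """Fetch one more batch; return False once the cursor is drained."""
        if self._exhausted:
            return False
        batch = self._cursor.fetchmany(self._batch_size)
        if not batch:
            self._exhausted = True
            return False
        self._cache.extend(batch)
        return True

    def __iter__(self) -> Iterator[Tuple[Any, ...]]:
        i = 0
        while True:
            # Serve rows already in the cache, then pull the next batch.
            while i < len(self._cache):
                yield self._cache[i]
                i += 1
            if not self._fetch_batch():
                return

    def head(self, n: int) -> List[Tuple[Any, ...]]:
        """Return the first n rows, fetching only as many batches as needed."""
        while len(self._cache) < n and self._fetch_batch():
            pass
        return self._cache[:n]

    @property
    def rows_fetched(self) -> int:
        return len(self._cache)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])
rs = LazyResultSet(conn.execute("SELECT x FROM t ORDER BY x"), batch_size=100)
print(rs.rows_fetched)  # 0
print(rs.head(3))       # [(0,), (1,), (2,)]
print(rs.rows_fetched)  # 100
```

Previewing a few rows only costs one batch instead of the full 10,000-row fetch, which is where the speedup for large queries comes from.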

However, there's still a bottleneck when converting DuckDB results into pandas data frames. DuckDB offers a .df() method that efficiently converts its results into a pandas data frame. When the autopandas option is on, .df() is used (see #469); however, when autopandas is off and users call:

%sql df << SELECT * FROM ...
df.DataFrame()

.DataFrame() doesn't use DuckDB's .df() method. We need to change that so .df() is always used whenever the connection is DuckDB (since .df() is a DuckDB-specific feature). Once this is implemented, we can remove the logic introduced in #469 and just call .DataFrame().
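A minimal sketch of the proposed dispatch, assuming the result set knows which dialect it came from: `.DataFrame()` takes DuckDB's native `.df()` path when available and otherwise falls back to the generic "fetch tuples, then build a frame" path. The `ResultSet` class here, its `dialect` and `frame_factory` parameters, and `_pandas_factory` are hypothetical stand-ins, not JupySQL's actual code. pandas is imported lazily inside the default factory so the sketch also runs without pandas when a custom factory is passed.

```python
import sqlite3
from typing import Any, Callable, List, Sequence, Tuple

Rows = List[Tuple[Any, ...]]
FrameFactory = Callable[[Rows, Sequence[str]], Any]


def _pandas_factory(rows: Rows, columns: Sequence[str]) -> Any:
    # Generic (slower) conversion: build the frame from Python tuples.
    import pandas as pd
    return pd.DataFrame(rows, columns=list(columns))


class ResultSet:
    """Hypothetical, simplified stand-in for a JupySQL-style ResultSet."""

    def __init__(self, cursor: Any, dialect: str,
                 frame_factory: FrameFactory = _pandas_factory) -> None:
        self._cursor = cursor
        self._dialect = dialect
        self._frame_factory = frame_factory

    def DataFrame(self) -> Any:
        # Fast path: DuckDB builds the pandas DataFrame natively via .df(),
        # so we never materialize one Python tuple per row.
        if self._dialect == "duckdb" and hasattr(self._cursor, "df"):
            return self._cursor.df()
        # Fallback for every other driver: fetch rows, then convert.
        columns = [d[0] for d in self._cursor.description]
        return self._frame_factory(self._cursor.fetchall(), columns)


# Generic path with sqlite3 and a plain-dict factory (no pandas needed).
conn = sqlite3.connect(":memory:")
rs = ResultSet(
    conn.execute("SELECT 1 AS a, 'x' AS b"),
    dialect="sqlite",
    frame_factory=lambda rows, cols: {"columns": list(cols), "rows": rows},
)
print(rs.DataFrame())  # {'columns': ['a', 'b'], 'rows': [(1, 'x')]}
```

With DuckDB, passing the object returned by `duckdb.connect().execute(query)` together with `dialect="duckdb"` would take the `.df()` branch, since DuckDB's connection object exposes `.df()`. Keeping the check inside `.DataFrame()` means callers no longer need to know about autopandas or the driver at all.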


edublancas mentioned this issue Jun 21, 2023

add some benchmarking tests #638

Closed

edublancas added the stash and med complexity labels Jun 21, 2023

edublancas mentioned this issue Jul 12, 2023

duckdb + polars performance improvements #725

Merged


edublancas closed this as completed in #725 Jul 17, 2023
