refactoring ResultSet #470
Comments
important: see this comment; @ned2 makes a good case for taking an alternative route: rather than making `ResultSet` lazy, we could avoid them altogether.
so to summarize: let's first work on the lazy loading approach, since that will benefit all databases. once that's fixed, we can work on further DuckDB improvements in #536
@yafimvo: re counting the total number of rows. fair point; if we want to, the only way would be to run a …
Interesting topic; I think it depends on the database storage engine.
We probably need to evaluate whether counting the rows is the bottleneck; if so, I agree we might need to remove the message.
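To make the trade-off concrete, here is a minimal sketch (using `sqlite3` for illustration only; jupysql's actual code is not shown here) of why reporting a total row count requires a second query rather than materializing the result:

```python
import sqlite3

# Hypothetical illustration: getting the total number of rows without
# fetching them all means issuing a separate COUNT(*) query, whose cost
# depends on the storage engine.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])

# Preview: fetch only the handful of rows needed for display.
preview = con.execute("SELECT * FROM t LIMIT 5").fetchall()

# Total count: a second query, instead of fetchall() on the full result.
(total,) = con.execute("SELECT COUNT(*) FROM t").fetchone()
print(len(preview), total)  # 5 10000
```

Whether that extra `COUNT(*)` is cheap or expensive varies by engine, which is why it may be worth benchmarking before deciding to drop the message.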
It would be great if there were a magic available to set the fetch size. For example, when doing bulk ETL loads, the fetch size can significantly improve performance.
@rupurt Can you provide more details? (Please open a new issue so we can discuss)
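For context on what a fetch-size option could map to: DB-API 2.0 cursors already expose an `arraysize` attribute that `fetchmany()` uses as its default batch size. A minimal sketch with the stdlib `sqlite3` driver (not jupysql's API; any configuration knob jupysql adds would be a separate design decision):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

cur = con.execute("SELECT * FROM t")
cur.arraysize = 250  # hint used by fetchmany() when no size is passed

# Consume the result in batches of cur.arraysize rows.
batches = 0
while True:
    rows = cur.fetchmany()  # returns up to 250 rows per call
    if not rows:
        break
    batches += 1
print(batches)  # 4
```

A user-facing fetch-size setting could simply forward to `cursor.arraysize` on drivers that honor it.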
we can improve performance by refactoring `ResultSet`:

- in a cell, we should only fetch the rows required to show the preview table. if they do something like `list(result_set)`, then, yes, fetch all results
- when using duckdb: duckdb has native support for exporting results to pandas and polars, so we shouldn't call `fetchall()`, and instead use the native API (`.df()` or `.pl()`). `.pl()` only accepts the `chunk_size` argument
- note that a solution for improving duckdb + pandas performance has been applied here: #469. however, we should move such logic inside `ResultSet`, as #469 only works when autopandas is turned on
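The lazy-loading idea in the first point can be sketched roughly as follows. This is a hypothetical illustration (the class name, `preview_rows` parameter, and batch size are all made up for this example, and `sqlite3` stands in for whichever driver is in use), not the actual `ResultSet` implementation:

```python
import sqlite3

class LazyResultSet:
    """Sketch: fetch only the rows needed for the preview table, and
    fall back to fetching everything only when the user iterates over
    the full result (e.g. list(result_set))."""

    def __init__(self, cursor, preview_rows=10):
        self._cursor = cursor
        self._preview_rows = preview_rows
        self._rows = []     # rows fetched so far
        self._done = False  # True once the cursor is exhausted

    def preview(self):
        # Fetch just enough rows to render the preview table.
        while not self._done and len(self._rows) < self._preview_rows:
            batch = self._cursor.fetchmany(self._preview_rows - len(self._rows))
            if not batch:
                self._done = True
            self._rows.extend(batch)
        return self._rows[: self._preview_rows]

    def __iter__(self):
        # One-pass iteration: replay what we have, then drain the cursor.
        yield from self._rows
        while not self._done:
            batch = self._cursor.fetchmany(1000)
            if not batch:
                self._done = True
            yield from batch

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

rs = LazyResultSet(con.execute("SELECT * FROM t"))
n_preview = len(rs.preview())  # only a small batch fetched so far
n_total = len(list(rs))        # full fetch triggered by iteration
print(n_preview, n_total)  # 10 100
```

The duckdb point would then slot in naturally: inside such a class, the "drain everything" path could dispatch to the driver's native export (e.g. duckdb's `.df()`/`.pl()`) instead of `fetchmany`/`fetchall`, regardless of whether autopandas is on.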