feat: Native CSV file list reading #16180
5498728 to f2da805
CodSpeed Performance Report
Merging #16180 will degrade performances by 22.39%
d0767fd to aca89b8
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #16180      +/-   ##
==========================================
- Coverage   81.03%   80.93%   -0.10%
==========================================
  Files        1392     1393       +1
  Lines      178939   179559     +620
  Branches     2907     2907
==========================================
+ Hits       144997   145323     +326
- Misses      33436    33730     +294
  Partials      506      506
The failing test is the Python import timing one. I don't think this PR has anything to do with it.
}
}

concat_df(out.iter())?
Can we be sure here that the DataFrames have the same Schema? If so, we can use concat_df_unchecked. It isn't unsafe, but it will elide schema checks and duplicate name checks.
Very good catch. I checked and realized that this PR, as it stands, would cause differing schemas to be silently coerced. I will need to do some more fixing.
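To make the trade-off in this thread concrete, here is a minimal, self-contained sketch of a checked versus an unchecked concatenation. It does not use the Polars API: `DType`, `Frame`, `concat_checked`, and `concat_unchecked` are hypothetical toy stand-ins, and a frame is reduced to a schema plus a row count. The idea it illustrates is that the unchecked path is only appropriate once the caller has already verified that all schemas match; otherwise a mismatch goes unnoticed instead of raising an error.

```rust
use std::collections::HashSet;

/// Toy column type (hypothetical; not the Polars `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    Int64,
    Utf8,
}

/// Toy frame (hypothetical; not the Polars `DataFrame`):
/// an ordered schema plus a row count.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub schema: Vec<(String, DType)>,
    pub height: usize,
}

/// Checked concatenation: rejects duplicate column names and any frame
/// whose schema differs from the first one.
pub fn concat_checked(frames: &[Frame]) -> Result<Frame, String> {
    let first = frames
        .first()
        .ok_or_else(|| "no frames to concatenate".to_string())?;

    // Duplicate name check on the reference schema.
    let mut seen = HashSet::new();
    for (name, _) in &first.schema {
        if !seen.insert(name.as_str()) {
            return Err(format!("duplicate column name: {name}"));
        }
    }

    // Schema check: every other frame must match the first exactly.
    for (i, frame) in frames.iter().enumerate().skip(1) {
        if frame.schema != first.schema {
            return Err(format!("schema of frame {i} differs from frame 0"));
        }
    }

    Ok(concat_unchecked(frames))
}

/// Unchecked concatenation: memory-safe, but it skips both checks and
/// simply adopts the first frame's schema, so a mismatch goes unnoticed.
pub fn concat_unchecked(frames: &[Frame]) -> Frame {
    Frame {
        schema: frames.first().map(|f| f.schema.clone()).unwrap_or_default(),
        height: frames.iter().map(|f| f.height).sum(),
    }
}
```

In this sketch, calling `concat_checked` on two frames whose `id` column is `Int64` in one and `Utf8` in the other returns an `Err`, while `concat_unchecked` on the same input returns a frame carrying the first schema. That mirrors the concern raised above: the unchecked variant is a reasonable optimization only when schema equality is guaranteed earlier in the reader.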
Looks great! I have a few minor comments. I suspect this should also perform much better when dealing with many files.
This replaces the existing codepath that dispatches multi-CSV file reads to the Union executor.