-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Optimise read_excel
when using "calamine" engine with the latest fastexcel
#17735
feat(python): Optimise read_excel
when using "calamine" engine with the latest fastexcel
#17735
Conversation
I'm not really sure how to tackle the coverage issue here... Is there an easy way to run the CI with different versions of a dependency ? |
depends on ToucanToco/fastexcel#260 |
read_excel
when underlying fastexcel
>= 0.11
Unfortunately not... looks like this change can't be merged here until ToucanToco/fastexcel#260 is merged (and released) on your side though - unless you want to update this PR's fastexcel version check to additionally detect that the currently-failing parameter is being used, and fall-back if so? Otherwise we can park this in draft until it's good to go. |
05c6856
to
c7786c4
Compare
read_excel
when underlying fastexcel
>= 0.11read_excel
when underlying fastexcel
is >= 0.11
read_excel
when underlying fastexcel
is >= 0.11read_excel
when underlying fastexcel
is >= 0.11.2
c7786c4
to
8f76948
Compare
@alexander-beedie I've updated the PR to check for fastexcel >= 0.11.2. However, I'm not sure on how to proceed to have coverage on both branches. Do you have a suggestion/preference ? |
Only real option is that you locally test the earlier version 😅 Also, looks like we have to wait for the 0.11.4 release for tests to be able to complete:
|
Yup, I've marked the PR as draft until ToucanToco/fastexcel#270 is merged. These macos runner updates are a real pain 😕 All right, I'll run the tests locally with both versions then. I'll let you know when we're all good, thanks! |
Signed-off-by: Luka Peschke <luka.peschke@toucantoco.com>
8f76948
to
5503c17
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17735 +/- ##
=======================================
Coverage 80.49% 80.49%
=======================================
Files 1503 1503
Lines 197027 197031 +4
Branches 2795 2796 +1
=======================================
+ Hits 158589 158598 +9
+ Misses 37918 37912 -6
- Partials 520 521 +1 ☔ View full report in Codecov by Sentry. |
@alexander-beedie macos seem to be finally working! I've ran the tests in Regarding the perf and memory gain, I've ran the following script on this branch with both fastexcel versions: #!/usr/bin/env python3
import polars as pl
from sys import argv
pl.read_excel(argv[1], engine="calamine") |
read_excel
when underlying fastexcel
is >= 0.11.2read_excel
when using "calamine" engine with the latest fastexcel
Good stuff! 👍 |
Since worksheets are directly converted to polars, fastexcel's eager loading functions can be used. This makes fastexcel use borrowed types under the hood, which allows it to run faster and have a slightly lower memory footprint. See ToucanToco/fastexcel#147 (comment)