-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#3884: Fix read_excel() dropping empty rows #4161
FIX-#3884: Fix read_excel() dropping empty rows #4161
Conversation
ac0a547
to
f4ff778
Compare
Codecov Report
@@ Coverage Diff @@
## master #4161 +/- ##
==========================================
+ Coverage 86.71% 89.60% +2.89%
==========================================
Files 201 201
Lines 16798 16800 +2
==========================================
+ Hits 14566 15054 +488
+ Misses 2232 1746 -486
Continue to review full report at Codecov.
|
18098bc
to
4b9a4a5
Compare
Could you please add issue #3884 in the PR description? |
cce0815
to
b32fe9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad we have a fix for this, but I worry about performance impact. Left a comment on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @naren-ponder! Looking at this, there may be cases where the partitions are parsing both values and border-only cells. Can we create a test case that is 50 rows or so with every other row being empty with a border? This implementation might fail that test from my understanding of the code.
Added a test case for this to |
991fd85
to
da1f549
Compare
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
…ctory (modin-project#4214) Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
Signed-off-by: Naren Krishna <naren@ponder.io>
617dddc
to
2495d4a
Compare
Signed-off-by: Naren Krishna <naren@ponder.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @naren-ponder, LGTM!
Signed-off-by: Naren Krishna naren@ponder.io
What do these changes do?
In Pandas 1.3.0, read_excel behavior changes from dropping empty rows or rows with a border in .xlsx files to including them with "NaN" values. This Modin change ensures compatibility with Pandas. Resolves #3884.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date