fix: Fix decompress_impl for csv with n_rows set #17118
crates/polars/tests/it/io/csv.rs
#[test]
#[cfg(any(feature = "decompress", feature = "decompress-fast"))]
fn test_read_compressed() {
    const COMPRESSED_CSV: &str = "../../examples/datasets/compressed.csv.gz";
Can we create the compressed file on the fly (in memory)? That saves adding a file to the repo.
The current compressed.csv.gz is randomly generated, and the n_rows value that leads to a ComputeError was found manually. I'd leave compressed.csv.gz as-is or remove test_read_compressed completely, since it's mostly here to illustrate the initial issue.
Maybe we can create an in-memory random file that illustrates this. Otherwise removing the test and file is also fine. 👍
test_read_compressed and the corresponding csv.gz file have been removed 👍
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #17118 +/- ##
==========================================
+ Coverage 80.86% 80.88% +0.02%
==========================================
Files 1456 1456
Lines 191141 191340 +199
Branches 2728 2739 +11
==========================================
+ Hits 154562 154769 +207
+ Misses 36073 36064 -9
- Partials 506 507 +1
Thanks @Mottl
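To illustrate the idea behind the fix in this PR (cutting a decompressed buffer down to a fixed number of complete rows), here is a minimal sketch using only the Rust standard library. It is not the Polars implementation: the helper name `truncate_to_n_lines`, its signature, and the choice to count the header as an ordinary line are assumptions made for illustration, and it ignores newlines inside quoted fields.

```rust
/// Hypothetical helper (NOT the Polars `decompress_impl` code): keep only the
/// first `n_lines` complete lines of `out` and drop everything after them,
/// including any partially decompressed trailing row.
/// Assumption: every `eol` byte ends a row (quoted newlines are not handled).
fn truncate_to_n_lines(out: &mut Vec<u8>, n_lines: usize, eol: u8) {
    if n_lines == 0 {
        out.clear();
        return;
    }
    let mut seen = 0usize;
    let mut end = None;
    for (i, &byte) in out.iter().enumerate() {
        if byte == eol {
            seen += 1;
            if seen == n_lines {
                // Keep the terminator of the last wanted line.
                end = Some(i + 1);
                break;
            }
        }
    }
    // If fewer than `n_lines` terminators were found, leave the buffer as-is.
    if let Some(end) = end {
        out.truncate(end);
    }
}
```

For example, calling the helper with `n_lines = 2` on a buffer holding `a,b\n1,2\n3,4\n5,` leaves `a,b\n1,2\n`, so the dangling `5,` fragment never reaches the parser.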
This commit truncates the out buffer in decompress_impl to exactly n_rows (if set). Returning the whole out buffer from decompress_impl leads to a corrupted last row and, as a result, to further parse errors like this: