Fix reading of single-row unterminated CSV files #17305

Merged (9 commits) on Nov 14, 2024

@vuule vuule commented Nov 13, 2024

Description

Fixed the logic in the CSV reader that produced empty output for a single-row, unterminated file instead of a table with a single column and one row.
Added tests to make sure the new logic does not cause regressions.
Also did some small cleanup around the fix.
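
To illustrate the case being fixed, here is a minimal sketch of row counting that treats trailing bytes without a line terminator as a row. `count_rows` and `single_row_examples` are hypothetical names for this illustration; this is not the libcudf implementation, only a host-side model of the expected behavior under the assumption that `'\n'` is the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a libcudf function): counts the rows in a CSV
// buffer, treating non-empty trailing bytes without a terminator as a row.
std::size_t count_rows(std::string_view data, char terminator = '\n')
{
  std::size_t rows      = 0;
  std::size_t row_start = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (data[i] == terminator) {
      ++rows;
      row_start = i + 1;
    }
  }
  // Bytes after the last terminator form a final, unterminated row.
  if (row_start < data.size()) { ++rows; }
  return rows;
}

// Usage sketch: both inputs hold exactly one row.
inline void single_row_examples()
{
  assert(count_rows("abc") == 1);    // unterminated single row
  assert(count_rows("abc\n") == 1);  // terminated single row
}
```

A reader that only counts terminators would report zero rows for the first input, which matches the empty-output symptom described above.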

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels Nov 13, 2024
@vuule vuule changed the title Bug-read_csv-single-row Fix reading of single-row unterminated CSV files Nov 13, 2024
@vuule vuule added bug Something isn't working non-breaking Non-breaking change labels Nov 13, 2024
@vuule vuule marked this pull request as ready for review November 13, 2024 20:25
@vuule vuule requested review from a team as code owners November 13, 2024 20:25
@vuule vuule requested review from vyasr and bdice November 13, 2024 20:25

std::vector<char> first_row = header;
// Empty row, return empty column names vector
if (row.empty()) { return {}; }
Review comment from the PR author:
Turns out, the line terminator is cropped when it's the only character(s) on the line, so we don't need to account for that case. The only empty row is an actually empty line (zero characters).
Added tests for this assumption.
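
The assumption in this comment can be modeled with a small sketch: if the terminator is cropped before the row is inspected, a line containing only a terminator becomes an empty row, so a plain emptiness check covers both cases. `crop_terminator` and `is_empty_row` are hypothetical names, and handling both `"\n"` and `"\r\n"` is an assumption made for this illustration, not a description of the reader's actual code.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: drops one trailing terminator ("\n" or "\r\n").
std::string_view crop_terminator(std::string_view row)
{
  if (!row.empty() && row.back() == '\n') {
    row.remove_suffix(1);
    if (!row.empty() && row.back() == '\r') { row.remove_suffix(1); }
  }
  return row;
}

// After cropping, the only empty row is one with zero characters left.
bool is_empty_row(std::string_view raw_row) { return crop_terminator(raw_row).empty(); }

// Usage sketch: terminator-only lines are empty, data lines are not.
inline void empty_row_examples()
{
  assert(is_empty_row("\n"));
  assert(!is_empty_row("abc\n"));
}
```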

@vuule vuule commented Nov 14, 2024

/merge

@rapids-bot rapids-bot bot merged commit a7194f6 into rapidsai:branch-24.12 Nov 14, 2024
105 checks passed
@vuule vuule deleted the bug-read_csv-single-row branch November 14, 2024 19:37
Labels: bug (Something isn't working), libcudf (Affects libcudf (C++/CUDA) code.), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
3 participants