feat: Implement support for multi-character comments in read_csv
#12519
Sorry, I don't want this in polars. I am sorry about this one. This had to be documented better.
Ok, I see this one is only comments. This should be acceptable.
60a8de2 to 8df1694
8df1694 to c1d79fb
Thanks, can you also expose it to python as well?
crates/polars-lazy/src/scan/csv.rs
Outdated
#[must_use]
pub fn with_comment_char(mut self, comment_char: Option<u8>) -> Self {
    self.comment_char = comment_char;
/// Set a single byte character as the comment character.
Should update the comment here (it mentions 'single-byte character')
crates/polars/tests/it/io/csv.rs
Outdated
assert_eq!(df.shape(), (3, 5));

let csv = r"1,2,3,4,5
### this is a comment
This would work equally well with comment_char being a single "#" character; could you update the example to something that can only work with a multi-character comment prefix?
We should also probably rename the
read_csv
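To illustrate the reviewer's point, here is a minimal sketch of why "### this is a comment" does not exercise multi-character support. It is not the Polars parser: `skip_comments` is a hypothetical helper that simply drops lines starting with a given prefix. A test input only distinguishes the two cases when a data row starts with a character that a single-byte comment character would also match.

```rust
/// Hypothetical helper (not part of Polars): keep only the lines
/// that do not start with `prefix`.
fn skip_comments<'a>(csv: &'a str, prefix: &str) -> Vec<&'a str> {
    csv.lines().filter(|line| !line.starts_with(prefix)).collect()
}

fn main() {
    // With "###" comments, a single '#' prefix gives the same result,
    // so this input cannot tell the two implementations apart.
    let weak = "1,2,3\n### this is a comment\n4,5,6";
    assert_eq!(skip_comments(weak, "#"), skip_comments(weak, "###"));

    // Here the second row is data that happens to start with '#',
    // and the real comment uses the two-character prefix "//".
    let strong = "a,b\n#1,2\n// a real comment\n3,4";
    // A single '#' drops the data row and keeps the comment.
    assert_eq!(skip_comments(strong, "#"), vec!["a,b", "// a real comment", "3,4"]);
    // The multi-character prefix keeps the data row and drops the comment.
    assert_eq!(skip_comments(strong, "//"), vec!["a,b", "#1,2", "3,4"]);
}
```

An input along the lines of the second one would fail if only the first byte of the prefix were compared.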
crates/polars-lazy/src/scan/csv.rs
Outdated
if c.is_ascii() {
    Some(CommentChar::Single(c as u8))
} else {
    None // Or handle non-ASCII characters as needed
Shouldn't that fall back to Multi then?
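A rough sketch of the fallback being suggested, under stated assumptions: the `CommentChar` enum below is a hypothetical stand-in modeled on the `Single` and `Multi` variant names in the diff, and the `new` and `is_comment` methods are invented for illustration. They are not the code merged in this PR. The idea is that any prefix that is not exactly one byte, including a single non-ASCII character, becomes `Multi` instead of `None`.

```rust
/// Hypothetical stand-in for the type discussed in the review.
#[derive(Debug, Clone, PartialEq, Eq)]
enum CommentChar {
    /// A one-byte prefix (always ASCII when it comes from a `&str`).
    Single(u8),
    /// Any longer prefix, including a single non-ASCII character.
    Multi(String),
}

impl CommentChar {
    /// Assumed constructor: an empty prefix disables comment handling.
    fn new(prefix: &str) -> Option<Self> {
        match prefix.as_bytes() {
            [] => None,
            [b] => Some(CommentChar::Single(*b)),
            // Fall back to `Multi` instead of returning `None`.
            _ => Some(CommentChar::Multi(prefix.to_string())),
        }
    }

    /// Returns true if `line` starts with the comment prefix.
    fn is_comment(&self, line: &[u8]) -> bool {
        match self {
            CommentChar::Single(b) => line.first() == Some(b),
            CommentChar::Multi(s) => line.starts_with(s.as_bytes()),
        }
    }
}

fn main() {
    assert_eq!(CommentChar::new("#"), Some(CommentChar::Single(b'#')));
    // 'é' is two bytes in UTF-8, so it takes the `Multi` path.
    assert_eq!(CommentChar::new("é"), Some(CommentChar::Multi("é".to_string())));
    let slashes = CommentChar::new("//").unwrap();
    assert!(slashes.is_comment(b"// note"));
    assert!(!slashes.is_comment(b"/1,2"));
}
```

Keeping a separate `Single` variant lets the one-byte case stay a plain byte comparison, while `Multi` covers everything else.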
@alexander-beedie regarding comment char/prefix. I've considered this approach as well. However, these changes will impact the user experience. If the Polars team is in agreement, I am ready to implement it.
c7184d9 to 850a61c
You can use the decorator
850a61c to 22b1f5d
Thank you @dmitrybugakov. Looks great!
This pull request introduces the capability to handle multi-character and single-byte comment identifiers in the read_csv function, addressing the feature request in issue #10583.