Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust,python): add u8/i8/u16/i16 parsers to CSV reader #14241

Merged
merged 8 commits into from
Feb 4, 2024

Conversation

mcrumiller
Copy link
Contributor

@mcrumiller mcrumiller commented Feb 3, 2024

Resolves #14226.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Feb 3, 2024
@mcrumiller mcrumiller marked this pull request as draft February 3, 2024 19:23
@ritchie46
Copy link
Member

@ritchie46 I'm not sure why the parsers weren't implemented for these dtypes--it didn't require anything special, just boilerplate.

Because it needs to compile more code -> leads to larger binaries.

@mcrumiller
Copy link
Contributor Author

@ritchie46 I'm not sure why the parsers weren't implemented for these dtypes--it didn't require anything special, just boilerplate.

Because it needs to compile more code -> leads to larger binaries.

Do we want to leave this out then? CSV parsing is used pretty often, seems like it would be good to support the smaller dtypes directly.

@ritchie46
Copy link
Member

Yes, I am inclined to allow it for those dtypes (feature flagged). However you need to removes those dtypes then in the to_cast function

@mcrumiller
Copy link
Contributor Author

Thanks. I've marked this as draft since it's obviously not ready yet.

@@ -671,6 +767,14 @@ impl Buffer {
missing_is_null,
None,
),
UInt64(buf) => <PrimitiveChunkedBuilder<UInt64Type> as ParsedBuffer>::parse_bytes(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just swapped the u32/u64 order to put u64 after, to be consistent.

@mcrumiller mcrumiller marked this pull request as ready for review February 3, 2024 21:02
@mcrumiller
Copy link
Contributor Author

@ritchie46 all ready I think.

@ritchie46
Copy link
Member

@mcrumiller can you still remove these dtypes from to_cast?

@mcrumiller
Copy link
Contributor Author

mcrumiller commented Feb 4, 2024

@ritchie46 didn't I do that here, or is there somewhere else I need to as well?

@ritchie46
Copy link
Member

@ritchie46 didn't I do that here, or is there somewhere else I need to as well?

Yeap, that's the one. Thanks

@ritchie46 ritchie46 merged commit 46ebef2 into pola-rs:main Feb 4, 2024
22 checks passed
@mcrumiller mcrumiller deleted the csv-int branch February 26, 2024 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_csv cannot read 8 or 16-bit ints
2 participants