Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust!): Rename to CsvParserOptions to CsvReaderOptions, use in CsvReader #15919

Merged
merged 7 commits into from
Apr 27, 2024

Conversation

stinodego
Copy link
Member

@stinodego stinodego commented Apr 26, 2024

Changes

  • Rename CsvParserOptions to CsvReaderOptions
  • Allow passing a CsvReaderOptions to CsvReader.

There are many more places internally where we can use CsvReaderOptions - this will be implemented in followup PRs

@github-actions github-actions bot added breaking rust Change that breaks backwards compatibility for the Rust crate enhancement New feature or an improvement of an existing feature rust Related to Rust Polars labels Apr 26, 2024
} else {
None
CommentPrefix::Multi(prefix.to_string())
}
}
}
Copy link
Member Author

@stinodego stinodego Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the arbitrary 5 character limit here. Not sure why it was there in the first place 🤔 the original PR/issue do not mention anything about the limit: #12519

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API mentions the 5 character limit here, so that should probably updated too.

Copy link
Member Author

@stinodego stinodego Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, updated. Thanks!

@stinodego stinodego marked this pull request as ready for review April 26, 2024 17:30
Copy link

codspeed-hq bot commented Apr 26, 2024

CodSpeed Performance Report

Merging #15919 will improve performances by 22.17%

Comparing csvreader-options (79c3cc8) with main (a9c8b59)

Summary

⚡ 1 improvements
✅ 12 untouched benchmarks

Benchmarks breakdown

Benchmark main csvreader-options Change
test_filter2 2.8 ms 2.3 ms +22.17%

Copy link

codecov bot commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 94.92754% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 81.25%. Comparing base (dee176c) to head (79c3cc8).

Files Patch % Lines
crates/polars-io/src/csv/read/options.rs 54.54% 5 Missing ⚠️
crates/polars-io/src/csv/read/reader.rs 98.38% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15919      +/-   ##
==========================================
+ Coverage   81.24%   81.25%   +0.01%     
==========================================
  Files        1382     1382              
  Lines      176628   176615      -13     
  Branches     3032     3032              
==========================================
+ Hits       143494   143505      +11     
+ Misses      32649    32627      -22     
+ Partials      485      483       -2     
Flag Coverage Δ
python 74.69% <94.92%> (+0.01%) ⬆️
rust 78.37% <94.92%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46 ritchie46 merged commit 3564a77 into main Apr 27, 2024
27 checks passed
@ritchie46 ritchie46 deleted the csvreader-options branch April 27, 2024 08:06
@ritchie46
Copy link
Member

To keep going back and forth, I am more fan of CsvParserOptions as the reader accepts options that are not fit for the parser, e.g. globbing, hive partitioning. They are passed to the reader, but not to the actual parser.

@stinodego
Copy link
Member Author

stinodego commented Apr 27, 2024

We can revert before the next Rust release if you want.

But I don't think CsvParserOptions makes sense if we consider all file formats. Parquet will have an options struct but not really a 'parser' since it's not flat text. So ParquetParserOptions would be wrong.

I think all Parser structs will accept multiple options structs: CloudOptions, PartitioningOptions, perhaps others. And options specific to that file format. Since the options differ between reading and writing, it makes sense to have ParquetReaderOptions and ParquetWriterOptions, same for Csv/Ipc/etc.

I don't think it's confusing for those to exist next to a CloudOptions struct and possibly others. Maybe CsvReadOptions/CsvWriteOptions would be less confusing?

@nameexhaustion
Copy link
Collaborator

@stinodego , @ritchie46 , I'm currently doing some work to improve the way we read lists of CSV files. As part of that I need to do some refactoring around the options structs, mainly so that they're cheaper to clone. Can I take it from here?

@stinodego
Copy link
Member Author

@nameexhaustion go for it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking rust Change that breaks backwards compatibility for the Rust crate enhancement New feature or an improvement of an existing feature rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants