-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Make DataFrame a Vec of Column
instead of Series
#18664
Conversation
a7b5dd6
to
c7a070f
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18664 +/- ##
==========================================
+ Coverage 79.82% 79.85% +0.03%
==========================================
Files 1513 1517 +4
Lines 203828 205542 +1714
Branches 2892 2892
==========================================
+ Hits 162705 164142 +1437
- Misses 40575 40852 +277
Partials 548 548 ☔ View full report in Codecov by Sentry. |
a6897fb
to
4bda43a
Compare
@@ -32,8 +32,8 @@ pub(crate) unsafe fn compare_df_rows2( | |||
join_nulls: bool, | |||
) -> bool { | |||
for (l, r) in left.get_columns().iter().zip(right.get_columns()) { | |||
let l = l.get_unchecked(left_idx); | |||
let r = r.get_unchecked(right_idx); | |||
let l = l.as_materialized_series().get_unchecked(left_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be very expensive. I think we must materialize before and pass &[Series]
to left
and right
.
let a: &ChunkedArray<$T> = a.as_ref().as_ref().as_ref(); | ||
let b: &ChunkedArray<$T> = b.as_ref().as_ref().as_ref(); | ||
let c: &ChunkedArray<$T> = c.as_ref().as_ref().as_ref(); | ||
let a: &ChunkedArray<$T> = a.as_materialized_series().as_ref().as_ref().as_ref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can materialize the scalar to a series
of length 1, we will have very large speed ups here as the arithmetic can deal with broadcasting already.
let a: &ChunkedArray<$T> = a.as_ref().as_ref().as_ref(); | ||
let b: &ChunkedArray<$T> = b.as_ref().as_ref().as_ref(); | ||
let c: &ChunkedArray<$T> = c.as_ref().as_ref().as_ref(); | ||
let a: &ChunkedArray<$T> = a.as_materialized_series().as_ref().as_ref().as_ref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
let a: &ChunkedArray<$T> = a.as_ref().as_ref().as_ref(); | ||
let b: &ChunkedArray<$T> = b.as_ref().as_ref().as_ref(); | ||
let c: &ChunkedArray<$T> = c.as_ref().as_ref().as_ref(); | ||
let a: &ChunkedArray<$T> = a.as_materialized_series().as_ref().as_ref().as_ref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
@@ -141,7 +141,8 @@ impl Container for DataFrame { | |||
} | |||
|
|||
fn chunk_lengths(&self) -> impl Iterator<Item = usize> { | |||
self.get_columns()[0].chunk_lengths() | |||
// @scalar-correctness? | |||
self.columns[0].as_materialized_series().chunk_lengths() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems wasteful to materialize here. I think we should create e special chunk_lengths
iterator for Column
. For Column::Scalar
it returns 1 time len
.
@@ -78,7 +78,8 @@ pub(super) unsafe fn call_plugin( | |||
.get(format!("_polars_plugin_{}", symbol).as_bytes()) | |||
.unwrap(); | |||
|
|||
let input = s.iter().map(export_series).collect::<Vec<_>>(); | |||
// @scalar-correctness? | |||
let input = s.iter().map(export_column).collect::<Vec<_>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should materialize here. 👍
df79e4b
to
4302512
Compare
[skip-ci]
4302512
to
b4b4d52
Compare
let file_path_series = self.include_file_paths.clone().map(|file_path_col| { | ||
StringChunked::full( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from a quick look - IIUC this does mean that (if) we materialize these they won't be shared anymore, but I don't particularly see that as a showstopper
Alright. Great work @coastalwhite. Huge effort in just 4 days. 🙈 Let's quickly get it in, now all is green! :) |
Small typo stemming from pola-rs#18664. Fixes pola-rs#18917.
No description provided.