Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump rust-polars to 0.40.0 #1104

Merged
merged 52 commits into from
May 26, 2024
Merged

Bump rust-polars to 0.40.0 #1104

merged 52 commits into from
May 26, 2024

Conversation

etiennebacher
Copy link
Collaborator

@etiennebacher etiennebacher commented May 21, 2024

https://github.com/pola-rs/polars/releases/tag/rs-0.40.0

@eitsupi I'll have limited time in the next few days to address this (we are not in rush, it's just so you know)

@eitsupi
Copy link
Collaborator

eitsupi commented May 22, 2024

@eitsupi I'll have limited time in the next few days to address this (we are not in rush, it's just so you know)

Thanks for starting the work. Of course I'll take a look at this, but it looks like a pretty tough update and I don't know if I can complete it in a few days.

@eitsupi eitsupi added this to the 0.17 milestone May 22, 2024
@eitsupi
Copy link
Collaborator

eitsupi commented May 23, 2024

It can now be compiled, but the rolling functions will probably need to be rewritten.

@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

@etiennebacher Thanks for your great works!

On my machine, the R session hangs on the expr_name test (so I have renamed the test file to disable the tests for now) and the following part seems to cause a panic.

ul = pl$DataFrame(s)$select(pl$col("")$flatten()$flatten()$flatten())$to_list() |>

I think the hang in the R session may be due to the implementation of Send and Sync at the following location, but I do not know why this has now occurred.
I also do not know why @sorhawell implemented this.

//shamelessly make Robj send + sync
//no crashes so far for the 'data'-SEXPS as Vectors, lists, pairlists
//mainly for debug should not be used in production
//CLOSEXP crashes every time
#[derive(Clone, Debug)]
pub struct ParRObj(pub Robj);
unsafe impl Send for ParRObj {}
unsafe impl Sync for ParRObj {}

@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

Hmmm, it looks like there is a problem with the way the DataFrame is constructed.
It panics when I try to create a DataFrame with columns of nested List types, which is not a problem in Python.

> pl$DataFrame(a = as_polars_series(list(list(1, 2, 3), list(4, 5, 6))))
thread '<unnamed>' panicked at /usr/local/cargo/git/checkouts/polars-b0d90607192fd414/7bc7014/crates/polars-core/src/series/mod.rs:149:37:
called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("`vec_hash` operation not supported for dtype `list[f64]`"))
thread '<unnamed>' panicked at src/rdataframe/mod.rs:89:1:
explicit panic
Error: Execution halted with the following contexts
   0: In R: in $DataFrame():
   0: During function call [pl$DataFrame(a = as_polars_series(list(list(1, 2, 3), list(4, 
          5, 6))))]
   1: user function panicked: select
> pl$select(a = as_polars_series(list(list(1, 2, 3), list(4, 5, 6))))
thread '<unnamed>' panicked at /usr/local/cargo/git/checkouts/polars-b0d90607192fd414/7bc7014/crates/polars-core/src/series/mod.rs:149:37:
called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("`vec_hash` operation not supported for dtype `list[f64]`"))
thread '<unnamed>' panicked at src/rdataframe/mod.rs:89:1:
explicit panic
Error in .pr$DataFrame$select(self, unpack_list(..., .context = "in $select()")) : 
  user function panicked: select

eitsupi
eitsupi previously approved these changes May 26, 2024
Copy link
Collaborator

@eitsupi eitsupi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several unresolved issues with this PR, but I think we can merge and do the follow-up work for now.

@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

map method errors are only on Linux, so may be related to pola-rs/polars#16365
I thought I had made modifications, but they may not have been sufficient.

Edit: It does not seem related.

@eitsupi eitsupi dismissed their stale review May 26, 2024 03:30

I take a look the remain issue.

@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

More related issue: pola-rs/polars#12568

@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

I do not know why $name$map() does not work only on Ubuntu.
I would like to vote to remove the function as it seems almost unnecessary to me.

@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

I don't understand why this process has to be done on the Rust side.
Wouldn't it be enough to receive the function as an argument in rename, process it on the R side, and then pass it as a list to Rust's rename?
(tidypolars should work well doing that with dplyr::rename_with().)
https://github.com/pola-rs/polars/blob/17cf8b7e17974b51c9bdc760422005fe591bc7c5/py-polars/polars/lazyframe/frame.py#L4393-L4445

In any case, I think we can merge them as they are now and fix them later. (I don't think I will have enough time for a while down the road)

@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

This is a side note, but what I have learned is that upstream is not very interested in Rust's breaking changes, and the release notes are of little use.

So in the future, I think it is better to proactively incorporate changes from HEAD without waiting for Rust releases.

@eitsupi eitsupi linked an issue May 26, 2024 that may be closed by this pull request
@etiennebacher
Copy link
Collaborator Author

Thanks a lot, I'll take a look at the remaining issues to see if I can come up with a fix for them. In any case, I'll update the NEWS before merging.

This is a side note, but what I have learned is that upstream is not very interested in Rust's breaking changes, and the release notes are of little use.

The release notes are sometimes useful. They are messy but it can really help going through bugs or changes in implementation. I agree we shouldn't feel forced to wait for an official release but I also think it matches quite well the pace of development (release every ~1.5 month).

There were several trivial bug fixes upstream just after 0.40.0 were released, I think we should bump the version to an unreleased version but we should do it in another PR.

NEWS.md Outdated Show resolved Hide resolved
@eitsupi
Copy link
Collaborator

eitsupi commented May 26, 2024

Follow-up tasks should include the followings:

  • Update polars to newer edge version.
  • Add *_by() methods.
  • Fix pl$select() for nested list types.
  • Remove $name$map() and add a functionality rename with R function to $rename() (or adds $rename_with())

@etiennebacher etiennebacher merged commit e82ef4c into main May 26, 2024
22 of 25 checks passed
@etiennebacher etiennebacher deleted the rust-polars-0.40.0 branch May 26, 2024 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

set JEMALLOC on macos as well
2 participants