Jaage changed the title from "melt() is significantly outperformed by cbind() and stack() in R, 32x slower on a 25 x 10,000 dataframe" to "melt() is significantly outperformed by cbind() and stack() in R, 32x slower on a 25 x 100,000 dataframe" on Feb 17, 2023
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Both the eager and lazy implementations of melt() are substantially slower than an equivalent use of cbind() and stack() in R. For a 25 x 1,000 dataframe, Rust took 3 ms and R took 4 ms. For 10,000 columns, Rust took 127 ms and R took 30 ms. For 100,000 columns, Rust took 16.25 s and R took 500 ms.
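It seems melt() has a bug that gives it significantly worse time complexity in the number of columns, noticeable beyond 1,000 columns. Going from 10,000 to 100,000 columns multiplied the Rust time by about 128x, but the R time by only about 17x.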
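To make the complexity claim concrete, here is a small sketch that estimates the empirical scaling exponent k (time proportional to columns^k) from the timings reported above. The function name `scaling_exponent` is an assumption for illustration. The numbers are single measurements, so treat the result as a rough indication only.

```rust
/// Empirical exponent k such that time ~ columns^k between two measurements.
/// (Hypothetical helper, not part of Polars.)
fn scaling_exponent(cols_a: f64, time_a_ms: f64, cols_b: f64, time_b_ms: f64) -> f64 {
    (time_b_ms / time_a_ms).ln() / (cols_b / cols_a).ln()
}

fn main() {
    // Timings reported in this issue: (columns, Rust ms, R ms).
    let timings = [
        (1_000.0, 3.0, 4.0),
        (10_000.0, 127.0, 30.0),
        (100_000.0, 16_250.0, 500.0),
    ];

    // Compare each consecutive pair of measurements.
    for pair in timings.windows(2) {
        let (c0, rust0, r0) = pair[0];
        let (c1, rust1, r1) = pair[1];
        println!(
            "{} -> {} columns: Rust exponent {:.2}, R exponent {:.2}, Rust/R ratio {:.1}x",
            c0,
            c1,
            scaling_exponent(c0, rust0, c1, rust1),
            scaling_exponent(c0, r0, c1, r1),
            rust1 / r1
        );
    }
}
```

For the last step (10,000 to 100,000 columns) this gives an exponent of roughly 2.1 for Rust versus roughly 1.2 for R. That is consistent with melt() scaling about quadratically in the number of columns while the R version stays close to linear.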
Reproducible example
Here is the eager Rust implementation, which will generate a 25 x n ndarray, convert it to a dataframe, and melt it. I am currently compiling Polars like so:
polars = { version = "0.27.2", features = ["concat_str", "lazy", "rank", "strings", "performant", "cse"] }
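For reference, here is a minimal standard-library sketch of what melting a wide table with no id columns computes. It is neither the Polars implementation nor the benchmark code from this issue. The names `LongRow` and `melt_wide` are assumptions for illustration. It only shows that the operation can be done in a single pass over the cells, i.e. linear in rows x columns.

```rust
/// One row of the long ("melted") table: the source column name and the value.
#[derive(Debug, Clone, PartialEq)]
struct LongRow {
    variable: String,
    value: f64,
}

/// Hypothetical helper: melt a column-major table that has no id columns.
/// Each inner Vec is one column; the output stacks the columns end to end.
fn melt_wide(names: &[String], columns: &[Vec<f64>]) -> Vec<LongRow> {
    assert_eq!(names.len(), columns.len(), "one name per column");
    // Pre-size the output so the single pass does not reallocate.
    let total: usize = columns.iter().map(Vec::len).sum();
    let mut out = Vec::with_capacity(total);
    for (name, col) in names.iter().zip(columns) {
        for &value in col {
            out.push(LongRow { variable: name.clone(), value });
        }
    }
    out
}

fn main() {
    // Same shape as the smallest case in this issue: 25 rows x 1,000 columns.
    let n_rows: usize = 25;
    let n_cols: usize = 1_000;
    let names: Vec<String> = (0..n_cols).map(|j| format!("col_{}", j)).collect();
    let columns: Vec<Vec<f64>> = (0..n_cols)
        .map(|j| (0..n_rows).map(|i| (i * n_cols + j) as f64).collect())
        .collect();

    let long = melt_wide(&names, &columns);
    println!("melted {} columns into {} long rows", n_cols, long.len());
}
```

A real dataframe library would store the variable column more compactly than one String per row, but the amount of work should still grow in proportion to the number of cells.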
Here is the R code which significantly outperforms it:
Expected behavior
I rewrote this in Rust and was very surprised to find that R significantly outperforms it for large N. Ideally my N would be near 100,000, so for now I think I need to call Rust for some parts of the code, switch back to R for this step, return to Rust, and finally finish in R.
Installed versions
features = ["concat_str", "lazy", "rank", "strings", "performant", "cse"]