I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Spearman correlation in Polars is different than the equivalent in SciPy.
The difference comes from the ranking method for ties: Polars uses min whereas SciPy uses average.
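To illustrate the two tie-handling rules, here is a minimal plain-Python sketch. The `rank` helper is hypothetical and written only for this example; it is not the Polars or SciPy implementation. It assigns tied values either the lowest rank in their group ("min") or the mean of the ranks they span ("average").

```python
def rank(values, method="average"):
    """Return 1-based ranks; tied values share the min or the average rank."""
    # Indices of the values in ascending order (stable for ties).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole group of values tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        if method == "min":
            shared = float(start + 1)
        elif method == "average":
            shared = (start + 1 + end + 1) / 2
        else:
            raise ValueError(f"unknown method: {method!r}")
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


a = [1, 1, 1, 2, 3, 7, 4]
min_ranks = rank(a, "min")
avg_ranks = rank(a, "average")
print(min_ranks)  # [1.0, 1.0, 1.0, 4.0, 5.0, 7.0, 6.0]
print(avg_ranks)  # [2.0, 2.0, 2.0, 4.0, 5.0, 7.0, 6.0]
```

Only the three tied 1s are ranked differently: they occupy ranks 1 to 3, so "min" gives each of them 1 while "average" gives each of them 2.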
Is this behavior a deliberate design choice? If so, would it be possible to provide an alternative that behaves the same as SciPy?
Note: Spearman correlation in R also uses average as its ranking method.
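```python
import polars as pl
import scipy.stats  # import the submodule explicitly so scipy.stats is available

df = pl.DataFrame({"a": [1, 1, 1, 2, 3, 7, 4], "b": [4, 3, 2, 2, 4, 3, 1]})

df.select(
    # Built-in Spearman correlation
    pl.corr("a", "b", method="spearman"),
    # Pearson correlation of "min" ranks reproduces the built-in result
    pl.corr(pl.col("a").rank("min"), pl.col("b").rank("min")).alias("a2"),
    # Pearson correlation of default (average) ranks matches SciPy
    pl.corr(pl.col("a").rank(), pl.col("b").rank()).alias("a3"),
)
# a          b          c
# ---        ---        ---
# f64        f64        f64
# -0.172237  -0.172237  -0.190485

scipy.stats.spearmanr([1, 1, 1, 2, 3, 7, 4], [4, 3, 2, 2, 4, 3, 1])
# -0.1904848294
```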
Polars: 0.17.15
Wikipedia states that taking the average is the most common approach. We should probably update the calculation.
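As a rough sketch of what that change means numerically: Spearman's coefficient can be computed as the Pearson correlation of the ranks, so the tie rule feeds directly into the result. The plain-Python helpers below (`rank`, `pearson`, `spearman`) are hypothetical names used only for this illustration and are not the Polars implementation; the data is the example from the issue.

```python
import math


def rank(values, method="average"):
    """1-based ranks; tied values share the min or the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Tied group covers ranks start+1..end+1.
        shared = float(start + 1) if method == "min" else (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y, method="average"):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(rank(x, method), rank(y, method))


a = [1, 1, 1, 2, 3, 7, 4]
b = [4, 3, 2, 2, 4, 3, 1]
rho_average = spearman(a, b, "average")
rho_min = spearman(a, b, "min")
print(round(rho_average, 6))  # about -0.190485, the SciPy value reported in the issue
print(round(rho_min, 6))      # about -0.172237, the Polars 0.17.15 value reported in the issue
```

With average ranks the sketch agrees with the SciPy figure quoted in the issue, and with min ranks it agrees with the Polars figure, which is consistent with the tie rule being the only source of the discrepancy.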
fix(python,rust): Compute Spearman rank correlations using average rank for ties (d513f08). Fixes pola-rs#9407.
Fixed this in #9415. Thank you for the clear code example; I could use it directly as a unit test.