Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python,rust): Compute Spearman rank correlations using average ra… #9415

Merged
merged 3 commits into from
Jun 18, 2023

Conversation

zundertj
Copy link
Collaborator

…nk for ties

Fixes #9407.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Jun 17, 2023
@zundertj zundertj marked this pull request as ready for review June 17, 2023 15:15
@ritchie46 ritchie46 merged commit f8a1ee4 into pola-rs:main Jun 18, 2023
@stinodego
Copy link
Contributor

stinodego commented Jun 20, 2023

@zundertj It seems weird to me that Spearman rank correlation is Float32, while Pearson is Float64. This should probably be fixed?

@zundertj
Copy link
Collaborator Author

zundertj commented Jun 20, 2023

The reason this is now different vs before is because if we pass in all integers, the correlation calculation is a different code path vs floats. Now that we change from Rank + Min to Rank + Average the values after ranking are no longer integers but Float32. I didn't change that, because I thought it was done like that on purpose. I.e., if you pass in Float32's from Python, you get a Float32 out:

>>> df
shape: (2, 2)
┌─────┬─────┐
│ ab   │
│ ------ │
│ f32f32 │
╞═════╪═════╡
│ 1.54.3 │
│ 2.32.1 │
└─────┴─────┘
>>> df.select(pl.corr("a","b"))
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f32  │
╞══════╡
│ -1.0 │
└──────┘

But if you pass in Int32, you get Float64 out:

>>> df
shape: (2, 2)
┌─────┬─────┐
│ ab   │
│ ------ │
│ i32i32 │
╞═════╪═════╡
│ 14   │
│ 22   │
└─────┴─────┘
>>> df.select(pl.corr("a","b"))
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ -1.0 │
└──────┘

Do you mean that we should always return Float64, i.e. cast after the rank + average to Float64?

@qqlearn123
Copy link

What was the behavior previously?

@zundertj
Copy link
Collaborator Author

zundertj commented Jul 1, 2023

Spearman returned Float64 before, as it did a rank, which leads to integers as we took the min. This is in line with directly computing Pearson on Python int (32) currently, and imo it makes sense to keep the consistency here as Spearman = ranking + Pearson. However, given we now average, the averaging of Int32's leads to Float32's, and then it is all Float32 downstream through the Pearson computation.

@qqlearn123
Copy link

Do you think making current behavior the same as before reasonable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spearman correlation gives difference result as Scipy
4 participants