fix(python,rust): Compute Spearman rank correlations using average ra… #9415
@zundertj It seems weird to me that Spearman rank correlation is
The reason this now differs from before is that integer inputs take a different code path through the correlation calculation than floats do. Now that we have changed from Rank + Min to Rank + Average, the values after ranking are no longer integers but Float32. I didn't change that, because I thought it was done like that on purpose. That is, if you pass in Float32 values from Python, you get a Float32 out:
>>> df
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f32 ┆ f32 │
╞═════╪═════╡
│ 1.5 ┆ 4.3 │
│ 2.3 ┆ 2.1 │
└─────┴─────┘
>>> df.select(pl.corr("a","b"))
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f32  │
╞══════╡
│ -1.0 │
└──────┘
But if you pass in Int32, you get Float64 out:
>>> df
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 2   │
└─────┴─────┘
>>> df.select(pl.corr("a","b"))
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ -1.0 │
└──────┘
Do you mean that we should always return
What was the behavior previously?
Spearman returned Float64 before: it ranked first, and because we took the min, the ranks were integers. That is in line with what you currently get when computing Pearson directly on Int32 input from Python, and imo it makes sense to keep that consistency, since Spearman = ranking + Pearson. However, now that we average, ranking Int32 values produces Float32 values, and everything downstream in the Pearson computation stays Float32.
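To make the "Spearman = ranking + Pearson" point concrete, here is a minimal pure-Python sketch of how the choice of tie handling changes the ranks and therefore the result. The helper names `rank`, `pearson`, and `spearman` are hypothetical and are not the Polars API; the sketch also does not model Polars dtypes, it only shows that min ranking keeps whole-number ranks while average ranking can produce fractional ones.

```python
from math import sqrt


def rank(values, method="average"):
    """Return 1-based ranks; tied values share the min or the average position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the run of values equal to the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        if method == "min":
            tied_rank = start + 1  # lowest position in the tie group (an int)
        else:
            tied_rank = (start + 1 + end + 1) / 2  # mean position (can end in .5)
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y, method="average"):
    """Spearman correlation as Pearson applied to the ranks."""
    return pearson(rank(x, method), rank(y, method))


a = [1, 2, 2, 3]  # contains a tie
b = [1, 3, 2, 4]
print(rank(a, "min"))      # [1, 2, 2, 4]
print(rank(a, "average"))  # [1.0, 2.5, 2.5, 4.0]
print(spearman(a, b, "min"), spearman(a, b, "average"))  # the two values differ
```

On input without ties, both methods give the same ranks up to int versus float representation, so the difference only shows up in the correlation value when ties are present.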
Do you think it would be reasonable to make the current behavior the same as before?
…nk for ties
Fixes #9407.