Pairwise correlation #759

Merged

Conversation

cristineguadelupe
Contributor

No description provided.

cristineguadelupe marked this pull request as ready for review December 6, 2023 20:36
names_series = cols |> Shared.from_list(:string) |> Shared.create_series()

from_series([{column_name, names_series} | correlations])
end

Do we need it in the backend? I think the implementation within Explorer.DataFrame was better, no? Or is this faster in any way?

@josevalim we did it in the backend because I wanted to raise for the lazy version, since we cannot implement it for lazy frames right now. Do you think it's worth reverting anyway?

@philss maybe we can implement it with lazy by implementing it with a mutate + select/discard?

This way it works with lazy and within Explorer.DataFrame as well! Something like:

mutate_with(df, fn ldf ->
  Enum.map(columns, ...)
end)
|> select(existing_columns -- new_columns, :discard)

WDYT?

Oh, I'm going to try.

@josevalim I think this won't work the way we want, because mutate_with expects a single lazy series/expression as the value for each column, and we are trying to build each column from a list of lazy series. On the other hand, we could try creating a column for each pair and pivoting the results later. But again, pivoting does not work with lazy frames.

Maybe there is another way to reshape this DF, but I don't know yet.
I'm going to investigate more tomorrow :)
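
To make the shape problem above concrete, here is a minimal sketch in plain Elixir. It does not use Explorer at all: the module name `PairwiseSketch`, its functions, and the sample columns are hypothetical and only illustrate the idea. Computing one value per pair produces a single wide row, and turning that into a correlation matrix needs a separate reshape step, which is the part that has no lazy equivalent according to the comment above.

```elixir
defmodule PairwiseSketch do
  # Pearson correlation of two equally sized, non-constant lists of numbers.
  def pearson(xs, ys) do
    mean_x = Enum.sum(xs) / length(xs)
    mean_y = Enum.sum(ys) / length(ys)

    # Deviations of each value from its column mean.
    dxs = Enum.map(xs, &(&1 - mean_x))
    dys = Enum.map(ys, &(&1 - mean_y))

    covariance = dxs |> Enum.zip(dys) |> Enum.map(fn {dx, dy} -> dx * dy end) |> Enum.sum()
    sum_sq_x = dxs |> Enum.map(&(&1 * &1)) |> Enum.sum()
    sum_sq_y = dys |> Enum.map(&(&1 * &1)) |> Enum.sum()

    covariance / :math.sqrt(sum_sq_x * sum_sq_y)
  end

  # "One column per pair": a flat list of {"a_b", value} entries, i.e. one wide row.
  def wide(columns) do
    for {name_a, xs} <- columns, {name_b, ys} <- columns do
      {"#{name_a}_#{name_b}", pearson(xs, ys)}
    end
  end

  # The reshaped form: one row per column name, holding its correlation
  # with every column, in the same order as the input.
  def matrix(columns) do
    for {name_a, xs} <- columns do
      {name_a, Enum.map(columns, fn {_name_b, ys} -> pearson(xs, ys) end)}
    end
  end
end

# Hypothetical sample data: "b" rises with "a", "c" falls as "a" rises.
columns = [{"a", [1, 2, 3]}, {"b", [2, 4, 6]}, {"c", [3, 2, 1]}]

IO.inspect(PairwiseSketch.wide(columns), label: "wide")
IO.inspect(PairwiseSketch.matrix(columns), label: "matrix")
```

The `wide/1` result has one entry per pair (nine for three columns), while `matrix/1` has one row per column. Going from the first shape to the second is the pivot discussed above.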

Hrm, now that you mention it, I think you are right. Feel free to ship it. :)

Actually, don't ship it yet. I would like Chris' approval on the API first. :D

cigrainger commented Dec 7, 2023

I like the API. It matches R's cor() pretty cleanly. One thing we should note in the docs is that this is the Pearson correlation. We should (in a future PR, I'll raise a ticket) add a method opt and enable Spearman: https://pola-rs.github.io/polars/docs/rust/dev/polars_lazy/dsl/fn.spearman_rank_corr.html. I can't find Kendall in the docs, but Spearman would at least open things up for rank correlation.
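
For reference, the two coefficients mentioned above can be written as follows. This is the standard textbook definition, not something taken from the PR diff. The symbols $x_i$, $y_i$ (paired observations), $\bar{x}$, $\bar{y}$ (column means), and $\operatorname{rank}(\cdot)$ (position of each value in sorted order, with ties typically given their average rank) are notation introduced here for illustration.

```latex
% Pearson correlation between columns x and y with n paired observations
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

% Spearman rank correlation: Pearson correlation applied to the ranks
\rho_{xy} = r_{\operatorname{rank}(x)\,\operatorname{rank}(y)}
```

Pearson measures linear association, while Spearman only depends on the ordering of the values, which is why a `method` option would be the natural place to switch between them.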

cristineguadelupe and others added 3 commits December 7, 2023 21:16
The idea is to make clear that this won't work yet for
lazy frames.

Co-authored-by: Cristine Guadelupe <cristineguadelupe@me.com>

philss commented Dec 8, 2023

@cigrainger makes sense! Thanks! We added a note to the function description, just like we have for the Series counterpart. Please let us know if this is enough.

cigrainger left a comment

Looks great!

philss merged commit df3039d into elixir-explorer:main Dec 8, 2023
3 checks passed
philss deleted the cg-pairwise-correlation branch December 8, 2023 16:37
philss mentioned this pull request Dec 8, 2023
2 tasks