When you shave() and then stretch() a cor_df class object, the result is a three-column tibble in which x and y are character vectors containing the row and column names. As a consequence, when you plot with rplot(), or manually with ggplot(), the resulting plot does not reflect the correlation matrix shown by print.cor_df(). This is especially noticeable when you shave() a correlation matrix, because the resulting plot is not an upper/lower triangular plot.
This is because ggplot()'s default behaviour is to order character vectors along the axes alphabetically. My suggestion is to have stretch() by default convert x and y to factor() variables whose levels follow the column order of the data.frame passed to correlate().
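To make the ordering point concrete, here is a minimal base R sketch. It assumes that a discrete axis built from character values is ordered much like the default levels of factor(), i.e. by sorting the unique values, and that a factor with explicit levels keeps the order it was given. The object names are only for illustration.

```r
# Variable names in the column order of mtcars
vars <- names(mtcars)

# With no explicit levels, factor() uses the sorted unique values,
# which mirrors the alphabetical axis order seen in the plots
alphabetical <- factor(vars)
levels(alphabetical)

# Passing the levels explicitly preserves the column order of the data
in_data_order <- factor(vars, levels = vars)
levels(in_data_order)
```

Comparing the two sets of levels shows why the plotted axes stop matching the printed correlation matrix once x and y are plain character vectors.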
Here is a demonstration of the issue:
library(tidyverse)
library(corrr)
x <- mtcars %>%
  correlate()
#> 
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'
# Note the column and row order, same as mtcars
print(x)
#> # A tibble: 11 x 12
#>    rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs      am
#>    <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 mpg          NA  -0.852  -0.848  -0.776   0.681  -0.868   0.419   0.664   0.600
#>  2 cyl      -0.852      NA   0.902   0.832  -0.700   0.782  -0.591  -0.811  -0.523
#>  3 disp     -0.848   0.902      NA   0.791  -0.710   0.888  -0.434  -0.710  -0.591
#>  4 hp       -0.776   0.832   0.791      NA  -0.449   0.659  -0.708  -0.723  -0.243
#>  5 drat      0.681  -0.700  -0.710  -0.449      NA  -0.712  0.0912   0.440   0.713
#>  6 wt       -0.868   0.782   0.888   0.659  -0.712      NA  -0.175  -0.555  -0.692
#>  7 qsec      0.419  -0.591  -0.434  -0.708  0.0912  -0.175      NA   0.745  -0.230
#>  8 vs        0.664  -0.811  -0.710  -0.723   0.440  -0.555   0.745      NA   0.168
#>  9 am        0.600  -0.523  -0.591  -0.243   0.713  -0.692  -0.230   0.168      NA
#> 10 gear      0.480  -0.493  -0.556  -0.126   0.700  -0.583  -0.213   0.206   0.794
#> 11 carb     -0.551   0.527   0.395   0.750 -0.0908   0.428  -0.656  -0.570  0.0575
#> # … with 2 more variables: gear <dbl>, carb <dbl>
# Note the axis orders, alphabetic
rplot(x)
#> Don't know how to automatically pick scale for object of type noquote. Defaulting to continuous.
# Note the axis orders, alphabetic
x %>%
  stretch() %>%
  ggplot(aes(x, y, fill = r)) +
  geom_tile()
# Unexpected behaviour when shaving
x %>%
  shave() %>%
  stretch() %>%
  ggplot(aes(x, y, fill = r)) +
  geom_tile()
# We can recover our triangular matrix
x %>%
  shave() %>%
  stretch() %>%
  # replace with across() with dplyr 1.0.0
  mutate_at(vars(x, y), factor, levels = x$rowname) %>%
  ggplot(aes(x, y, fill = r)) +
  geom_tile()
x %>%
  shave(upper = FALSE) %>%
  stretch() %>%
  # replace with across() with dplyr 1.0.0
  mutate_at(vars(x, y), factor, levels = x$rowname) %>%
  ggplot(aes(x, y, fill = r)) +
  geom_tile()
My suggestion would be to change the default behaviour of stretch() so that x and y are converted to factor()s, and perhaps to offer an argument in stretch() that reverts to the old behaviour where x and y are character vectors. The argument could be keep_order = TRUE, or something along those lines.
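As a rough sketch of what I have in mind, the wrapper below shows the proposed behaviour without touching corrr itself. The name stretch_ordered() and the keep_order argument are hypothetical and not part of corrr. It also assumes the first column of a cor_df holds the variable names, so that the remaining column names give the original variable order.

```r
library(corrr)

# Hypothetical wrapper, not part of corrr: stretch a cor_df and, when
# keep_order = TRUE, turn x and y into factors whose levels follow the
# column order of the correlation data frame.
stretch_ordered <- function(cor_df, keep_order = TRUE, ...) {
  long <- stretch(cor_df, ...)
  if (keep_order) {
    # Assumption: the first column stores the variable names, so the
    # remaining column names are the variables in their original order
    var_order <- colnames(cor_df)[-1]
    long$x <- factor(long$x, levels = var_order)
    long$y <- factor(long$y, levels = var_order)
  }
  long
}

# Usage: the factor levels now follow the column order of mtcars
cor_mtcars <- correlate(mtcars)
long_ordered <- stretch_ordered(shave(cor_mtcars))
levels(long_ordered$x)
```

Setting keep_order = FALSE in this sketch returns the plain stretch() output with character columns, which would correspond to the old behaviour.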
I look forward to hearing your thoughts and would be happy to submit a PR if you agree that this change should be implemented.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Created on 2020-04-11 by the reprex package (v0.3.0)