Change presentation order of the grouping variable #792

Open
martynagalazka opened this issue Sep 26, 2022 · 2 comments
Labels
bug 🐜 Something isn't working

Comments

@martynagalazka

I am plotting a grouped ggwithinstats plot, and the panels are displayed in alphabetical order (or ascending order, if the group labels are numbers).

In cases where one cannot rename the levels of the grouping variable but a particular order would be clearer, is there a way to control whether a given subgroup is presented on the left or the right?

Thank you.

@IndrajeetPatil
Owner

Hmm, seems like none of the functions respect the original order of the grouping.var column. This definitely shouldn't happen.

library(ggstatsplot)

(df <- dplyr::tibble(
  grp = c(rep("c", 5), rep("a", 5), rep("b", 5)),
  val1 = runif(15),
  val2 = runif(15)
))
#> # A tibble: 15 × 3
#>    grp     val1   val2
#>    <chr>  <dbl>  <dbl>
#>  1 c     0.732  0.754 
#>  2 c     0.825  0.0488
#>  3 c     0.214  0.113 
#>  4 c     0.288  0.609 
#>  5 c     0.574  0.498 
#>  6 a     0.209  0.276 
#>  7 a     0.971  0.0530
#>  8 a     0.227  0.328 
#>  9 a     0.748  0.550 
#> 10 a     0.866  0.734 
#> 11 b     0.472  0.171 
#> 12 b     0.330  0.881 
#> 13 b     0.262  0.604 
#> 14 b     0.956  0.734 
#> 15 b     0.0407 0.0470
  
grouped_ggscatterstats(df, val1, val2, grouping.var = grp)

Created on 2022-09-26 with reprex v2.0.2

@IndrajeetPatil IndrajeetPatil added the bug 🐜 Something isn't working label Sep 26, 2022
@etiennebacher

The problem probably comes from .grouped_list(), and more specifically from split(), which coerces a character grouping column to a factor and therefore sorts the groups. One solution (taken from SO) is to specify the levels manually.

test <- data.frame(
  id = c("b", "c", "a"),
  val = 1:3
)

# reordered
split(test, ~ id)
#> $a
#>   id val
#> 3  a   3
#> 
#> $b
#>   id val
#> 1  b   1
#> 
#> $c
#>   id val
#> 2  c   2

# not reordered
test$id <- factor(test$id, levels=unique(test$id))
split(test, ~ id)
#> $b
#>   id val
#> 1  b   1
#> 
#> $c
#>   id val
#> 2  c   2
#> 
#> $a
#>   id val
#> 3  a   3
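
One caveat with `levels = unique(x)`: if the column is already a factor, it replaces whatever level order the user set with the order of first appearance. A minimal base-R sketch of the same idea that leaves existing factors alone is below. `split_in_order()` and the example data frames are hypothetical names for illustration, not part of ggstatsplot.

```r
# Hypothetical helper (not part of ggstatsplot): split `data` by the column
# named in `group_col`, keeping groups in order of first appearance.
split_in_order <- function(data, group_col) {
  g <- data[[group_col]]
  # Only convert non-factors; an existing factor keeps its own level order.
  if (!is.factor(g)) {
    g <- factor(g, levels = unique(g))
  }
  split(data, g, drop = FALSE)
}

# Character column: groups follow first appearance.
chr_df <- data.frame(id = c("b", "c", "a"), val = 1:3)
chr_order <- names(split_in_order(chr_df, "id"))
chr_order
#> [1] "b" "c" "a"

# Factor column with user-chosen levels: that order is preserved.
fct_df <- data.frame(
  id = factor(c("b", "c", "a"), levels = c("c", "a", "b")),
  val = 1:3
)
fct_order <- names(split_in_order(fct_df, "id"))
fct_order
#> [1] "c" "a" "b"
```
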

Therefore, in .grouped_list(), you can apply this to all grouping variables:

.grouped_list <- function(data, grouping.var = NULL) {
  data <- as_tibble(data)

  if (quo_is_null(enquo(grouping.var))) {
    return(data)
  }

  data %<>%
    mutate(
      across(
        {{ grouping.var }},
        ~ factor(.x, levels = unique(.x))
      )
    )

  data %>% split(f = new_formula(NULL, enquo(grouping.var)), drop = FALSE)
}

And now the order is correct:

library(ggstatsplot)
#> You can cite this package as:
#>      Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
#>      Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167

(df <- dplyr::tibble(
  grp = c(rep("c", 5), rep("a", 5), rep("b", 5)),
  val1 = runif(15),
  val2 = runif(15)
))
#> # A tibble: 15 × 3
#>    grp     val1   val2
#>    <chr>  <dbl>  <dbl>
#>  1 c     0.144  0.700 
#>  2 c     0.210  0.461 
#>  3 c     0.0319 0.875 
#>  4 c     0.609  0.978 
#>  5 c     0.213  0.317 
#>  6 a     0.0133 0.502 
#>  7 a     0.778  0.0390
#>  8 a     0.553  0.979 
#>  9 a     0.0742 0.0475
#> 10 a     0.616  0.339 
#> 11 b     0.869  0.322 
#> 12 b     0.501  0.0567
#> 13 b     0.632  0.347 
#> 14 b     0.758  0.953 
#> 15 b     0.391  0.687

grouped_ggscatterstats(df, val1, val2, grouping.var = grp)
#> Registered S3 method overwritten by 'ggside':
#>   method from   
#>   +.gg   ggplot2
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This should probably work with other grouped plots but I didn't check. @IndrajeetPatil I'm not familiar with this package so I'll let you apply the solution if it is correct.
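
For the original question, a possible workaround on the user side is to turn the grouping column into a factor with explicit levels before plotting, since split() follows factor level order. The sketch below only demonstrates the split() behaviour on made-up data. Whether grouped_ggwithinstats() then draws its panels in that order assumes the grouping is done through split() as described above, which is not checked here.

```r
# Made-up data: group labels that cannot be renamed.
df <- data.frame(
  grp = c(rep("c", 2), rep("a", 2), rep("b", 2)),
  val = 1:6
)

# Character column: split() sorts the groups.
default_order <- names(split(df, df$grp))
default_order
#> [1] "a" "b" "c"

# Factor with explicit levels: split() follows the level order.
df$grp <- factor(df$grp, levels = c("b", "c", "a"))
custom_order <- names(split(df, df$grp))
custom_order
#> [1] "b" "c" "a"
```
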
