Integration with `labelled` package `set_value_labels()` + `haven_labelled` class #488

calebasaraba · 2020-05-04T19:45:09Z

Amazing package -- really love this project. I am trying to use it alongside the labelled package, and when using the set_value_labels() function I get an error:

library(tidyverse)
library(gtsummary)
library(labelled)

mtcars %>%
  select(cyl, mpg) %>%
  set_variable_labels(cyl = "Cylinders",
                      mpg = "Miles per gallon") %>%
  set_value_labels(cyl = c("Four" = 4, "Six" = 6, "Eight" = 8)) %>%
  tbl_summary(by = cyl)

Column(s) ‘cyl’ omitted from output.
Accepted classes are ‘character’, ‘factor’, ‘numeric’, ‘logical’, ‘integer’, or ‘difftime’.
Error in class(data[[by]]) <- setdiff(class(data[[by]]), "labelled") : 
  attempt to set an attribute on NULL

Not sure if this is intentional behavior for the package, or if it would be an easy fix. Using factor labels for the variable levels (like below) works, but it would be great if gtsummary would also accept the haven_labelled class, as I'm seeing it used more and more.

mtcars %>%
  select(cyl, mpg) %>%
  set_variable_labels(cyl = "Cylinders",
                      mpg = "Miles per gallon") %>%
  mutate(cyl = factor(cyl, labels = c("Four","Six","Eight"))) %>%
  tbl_summary(by = cyl)

ddsjoberg · 2020-05-04T23:54:32Z

Hello @calebasaraba ! Thank you for the note!!

I will need to put more thought into whether or not to extend gtsummary to accept other classes. In the case of haven labelled, it was never meant to be a class that was used in analysis or data exploration. Rather, it was created as an in-between when importing data from other languages where the data types don't have a one-to-one relationship with R. This is from a tidyverse article about the haven labelled class of variables (https://haven.tidyverse.org/articles/semantics.html):

The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate datastructure that you can convert into a regular R data frame.

For the time being, I recommend converting the haven labelled variables to factors with as_factor() (it can be run on the entire data frame).
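A minimal sketch of that workaround, reusing the example from the original post. It assumes the haven package is installed and that as_factor() here refers to haven::as_factor(); the object names labelled_cars and factor_cars are only illustrative.

```r
library(dplyr)
library(labelled)
library(haven)
library(gtsummary)

# Rebuild the labelled data from the original post
labelled_cars <- mtcars %>%
  select(cyl, mpg) %>%
  set_variable_labels(cyl = "Cylinders", mpg = "Miles per gallon") %>%
  set_value_labels(cyl = c("Four" = 4, "Six" = 6, "Eight" = 8))

# Called on a data frame, haven::as_factor() converts only the
# haven_labelled columns by default, so mpg stays numeric
factor_cars <- haven::as_factor(labelled_cars)

factor_cars %>%
  tbl_summary(by = cyl)
```

After the conversion, cyl is an ordinary factor whose levels come from the value labels, which is a class tbl_summary() accepts.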

Happy Coding!

calebasaraba · 2020-05-05T00:46:30Z

Got it, thanks for the clarification about intended use of the haven_labelled class @ddsjoberg!

I have been enjoying the way set_variable_labels() and set_value_labels() from labelled fit into my workflow (I receive a lot of original data files from SPSS), but it makes a lot of sense to return to factors using as_factor(). I'll close this issue up.

Thanks for your quick response and all your awesome work :)

karissawhiting · 2020-07-07T15:35:41Z

For now, we are going to add more specific messaging around the haven_labelled class to indicate that it is not an accepted class and that users can convert it with as_factor().

larmarange · 2020-08-28T13:30:40Z

Just a quick comment: labelled vectors are not always intended to be converted into factors. For example, you could have an age variable and add a label to the value 99 to say that 99 represents "99 or more".

This is why it is the responsibility of the user to unclass the variable or convert it into a factor, depending on whether it should be treated as continuous or categorical.

A quick tip is to use labelled::unlabelled(), which performs a conditional conversion. By default, unlabelled() works as follows:

if a column doesn't inherit the haven_labelled class, it will not be affected;
if all observed values have a corresponding value label, the column will be converted into a factor;
otherwise, the column will be unclassed (and converted back to a numeric or character vector).

But these assumptions hold only if users have properly documented the vectors.

More details at https://larmarange.github.io/labelled/articles/intro_labelled.html#conditionnal-conversion-to-factors-1
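The three rules above can be sketched on a small made-up data frame. The column names and values below are hypothetical and chosen only to trigger each rule; the behaviour described in the comments follows the defaults listed above.

```r
library(labelled)

# Hypothetical data: age has a single labelled value, cyl is fully labelled,
# and id carries no value labels at all
survey <- data.frame(
  id  = 1:3,
  age = c(25, 40, 99),
  cyl = c(4, 6, 8)
)
survey <- set_value_labels(
  survey,
  age = c("99 or more" = 99),
  cyl = c("Four" = 4, "Six" = 6, "Eight" = 8)
)

converted <- unlabelled(survey)

class(converted$id)   # not haven_labelled, so left untouched
class(converted$age)  # 25 and 40 have no label, so the column is unclassed
class(converted$cyl)  # every observed value has a label, so it becomes a factor
```

Here age stays continuous because only one of its observed values is labelled, while cyl is treated as categorical.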

muminbayoumi · 2021-01-22T20:16:57Z

I'm having a similar issue.
I use expss::apply_labels(v1=label1,...) to set labels on my variables (not their values). It seems tbl_summary is unable to pick up the labels for factor variables, i.e. ones I set explicitly to factor using as_factor(). All other variable types and their labels are being picked up very nicely.

ddsjoberg · 2021-01-22T20:21:48Z

@muminbayoumi can you post an example I can run on my machine, i.e. a reprex?

muminbayoumi · 2021-01-23T14:19:45Z

I'll have to apologise - it seems the underlying issue is with base R and the droplevels() function. However, this reprex illustrates how factor levels which are empty are still printed. Are you planning on adding an option to exclude those?

library(expss)
library(tidyverse)
library(forcats)
library(gtsummary)
library(sjmisc)

data <- tibble(.rows = 200)
data$CatColumAsFactor <- as_factor(sample(c('Apple', 'Banana', 'Cherry'), 200, replace = TRUE))
data$CatColumAsCharacter <- sample(c('Apple', 'Banana', 'Cherry'), 200, replace = TRUE)

## Attach variable labels (not value labels) with expss
data <- apply_labels(data,
                     CatColumAsCharacter = 'Character Column',
                     CatColumAsFactor = 'Factor Column')

## Without dropping the filtered levels: all levels are printed for the factor column,
## but only observed values are printed for the character column
data %>%
  filter(CatColumAsFactor != 'Cherry', CatColumAsCharacter != 'Apple') %>%
  tbl_summary()

## After droplevels(): the label attribute is lost and therefore not picked up by gtsummary
data %>%
  filter(CatColumAsFactor != 'Cherry', CatColumAsCharacter != 'Apple') %>%
  droplevels() %>%
  tbl_summary()

## to_label() preserves the labels
data %>%
  filter(CatColumAsFactor != 'Cherry') %>%
  sjmisc::to_label(drop.levels = TRUE) %>%
  tbl_summary()

The tables aren't rendering very well with the reprex() function, so I took them out.
Again, I am sorry; it isn't an issue with gtsummary.
And thanks for this wonderful package.

ddsjoberg · 2021-01-23T14:37:09Z

Thank you for showing me this package! I hadn't heard of expss before, and it's such a popular package! @muminbayoumi

Showing the unobserved factor levels is a feature I think is useful. If you want unobserved levels removed, you can drop them before passing the data frame to tbl_summary(). There is likely a nice function in forcats to do this, but it can also be done with factor().

data %>%
  filter(CatColumAsFactor != 'Cherry', CatColumAsCharacter != 'Apple') %>%
  mutate_if(is.factor, factor) %>% # removes unobserved levels
  tbl_summary()
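For the forcats route mentioned above, one candidate is forcats::fct_drop(), which drops unused levels from a factor. The following is a sketch on a small made-up tibble (the fruit column is hypothetical), not necessarily the function the comment had in mind.

```r
library(dplyr)
library(forcats)

# Hypothetical data with three factor levels
fruit_data <- tibble(fruit = factor(c("Apple", "Banana", "Cherry", "Apple")))

trimmed <- fruit_data %>%
  filter(fruit != "Cherry") %>%
  # fct_drop() removes levels that no longer occur in the data
  mutate(across(where(is.factor), fct_drop))

levels(trimmed$fruit)
```

The result can then be passed to tbl_summary() in the same way as the factor() version.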

calebasaraba closed this as completed May 5, 2020

karissawhiting reopened this Jul 7, 2020

ddsjoberg mentioned this issue Jul 8, 2020

tbl_summary() update for ordered factors and other class handling #569

Closed

4 tasks

ddsjoberg mentioned this issue Aug 2, 2020

allowing for any class to be passed to tbl_summary #603

Closed

15 tasks

ddsjoberg added this to the 1.3.5 milestone Sep 5, 2020

ddsjoberg modified the milestones: 1.3.5, 1.3.6 Oct 1, 2020

ddsjoberg modified the milestones: 1.3.6, 1.4.0 Jan 8, 2021

ddsjoberg mentioned this issue Feb 20, 2021

tbl_summary() will accept columns of any class #794

Merged

14 tasks

ddsjoberg closed this as completed in #794 Mar 2, 2021
