mis_val(,with_labels = TRUE) incompatible with mutate() #68

Closed
bhaney22 opened this issue Jul 17, 2020 · 6 comments

@bhaney22

The "with_labels = TRUE" option in the "mis_val()" command changes the class of the variable to "labelled". The variable can then no longer be used as input to mutate() commands.

See example below:

I read in the GSS.sav file

mygss <- as.data.frame(read_sav("GSS2018.sav")) %>%
    select(YEAR,ID,WRKSTAT,SEX,RACE,RINCOME,INCOME,WEALTH,REGION,
           AGE,MARITAL,CHILDS,EDUC,DEGREE,
           PARTYID,PRES16,NATEDUC,COURTS,TAX,NATFARE,NATENRGY,PARTTEAM,LIFE)

Here is the structure of the variable before recoding with mis_val()

str(mygss$NATENRGY)
dbl+lbl [1:2348] 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, NA, 1, N...
@ label : chr "Developing alternative energy sources"
@ format.spss : chr "F1.0"
@ display_width: int 10
@ labels : Named num [1:6] 0 1 2 3 8 9
..- attr(*, "names")= chr [1:6] "IAP" "Too little" "About right" "Too much" …

mis_val does not change type if with_labels option is not used

mygss$NATENRGY <- mis_val(mygss$NATENRGY, c(0,8,9))

str(mygss$NATENRGY)
dbl+lbl [1:2348] 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, NA, 1, N...
@ label : chr "Developing alternative energy sources"
@ format.spss : chr "F1.0"
@ display_width: int 10
@ labels : Named num [1:6] 0 1 2 3 8 9
..- attr(*, "names")= chr [1:6] "IAP" "Too little" "About right" "Too much" …

The recoded variable works fine with mutate:

mygss <- mygss %>%
    mutate(renew = case_when(NATENRGY == 1 ~ 1,
                             NATENRGY != 1 ~ 0))

calc_cro(mygss,NATENRGY,renew)

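|                                       |              | renew |      |
|                                       |              |     0 |    1 |
| ------------------------------------- | ------------ | ----- | ---- |
| Developing alternative energy sources | IAP          |       |      |
|                                       | Too little   |       | 1277 |
|                                       | About right  |   788 |      |
|                                       | Too much     |   169 |      |
|                                       | DON'T KNOW   |       |      |
|                                       | No answer    |       |      |
|                                       | #Total cases |   957 | 1277 |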

But the labels of the values that were set to missing are still left in there, so I use the with_labels = TRUE option

mis_val using the with_labels = TRUE option changes the class of the variable

mygss$NATENRGY <- mis_val(mygss$NATENRGY, c(0,8,9),with_labels = TRUE)

Now the variable is of the 'labelled' class

str(mygss$NATENRGY)
Class 'labelled' dbl+lbl [1:2348] 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, NA, 1, N...
@ format.spss : chr "F1.0"
@ display_width: int 10
.. .. LABEL: Developing alternative energy sources
.. .. VALUE LABELS [1:3]: 1=Too little, 2=About right, 3=Too much

labelled class variables cannot be used in the mutate() commands

mygss <- mygss %>%
    mutate(renew = case_when(NATENRGY == 1 ~ 1,
                             NATENRGY != 1 ~ 0))

Error: Problem with mutate() input renew.
x Can't combine ..1 <labelled> and ..2 .
ℹ Input renew is case_when(NATENRGY == 1 ~ 1, NATENRGY != 1 ~ 0).

@gdemin
Owner

gdemin commented Jul 18, 2020

Hi,
Thank you for the report.
I can't reproduce your problem with the latest dplyr. Could you provide the output of sessionInfo()?

@bhaney22
Author

Thank you for your quick response! Here is the result of sessionInfo() just after running the command that causes the error:

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] plm_2.2-3 dynlm_0.3-6 visualize_4.3.0 effects_4.1-3
[5] AER_1.2-7 survival_3.2-3 sandwich_2.5-1 lmtest_0.9-37
[9] zoo_1.8-8 car_3.0-8 carData_3.0-4 expss_0.10.5
[13] ggformula_0.9.4.9002 ggridges_0.5.2 scales_1.1.1 ggstance_0.3.4
[17] ggplot2_3.3.2 stargazer_5.2.2 DataExplorer_0.8.1 wooldridge_1.3.1
[21] dplyr_1.0.0 haven_2.3.1 readr_1.3.1

loaded via a namespace (and not attached):
[1] nlme_3.1-148 matrixStats_0.56.0 tools_3.6.3 backports_1.1.8
[5] R6_2.4.1 DBI_1.1.0 lazyeval_0.2.2 colorspace_1.4-1
[9] nnet_7.3-14 withr_2.2.0 tidyselect_1.1.0 gridExtra_2.3
[13] curl_4.3 compiler_3.6.3 cli_2.0.2 htmlTable_2.0.1
[17] mosaicCore_0.6.0 checkmate_2.0.0 stringr_1.4.0 digest_0.6.25
[21] foreign_0.8-76 minqa_1.2.4 rmarkdown_2.3 rio_0.5.16
[25] pkgconfig_2.0.3 htmltools_0.5.0 bibtex_0.4.2.2 lme4_1.1-23
[29] htmlwidgets_1.5.1 rlang_0.4.7 readxl_1.3.1 rstudioapi_0.11
[33] farver_2.0.3 generics_0.0.2 zip_2.0.4 magrittr_1.5
[37] Formula_1.2-3 Matrix_1.2-18 fansi_0.4.1 Rcpp_1.0.5
[41] munsell_0.5.0 abind_1.4-5 lifecycle_0.2.0 stringi_1.4.6
[45] gbRd_0.4-11 MASS_7.3-51.6 plyr_1.8.6 grid_3.6.3
[49] parallel_3.6.3 bdsmatrix_1.3-3 forcats_0.5.0 crayon_1.3.4
[53] lattice_0.20-41 splines_3.6.3 hms_0.5.3 knitr_1.29
[57] pillar_1.4.6 igraph_1.2.5 boot_1.3-25 glue_1.4.1
[61] packrat_0.5.0-25 evaluate_0.14 mitools_2.4 data.table_1.12.8
[65] vctrs_0.3.2 nloptr_1.2.2.2 tweenr_1.0.1 Rdpack_0.11-0
[69] miscTools_0.6-26 networkD3_0.4 cellranger_1.1.0 gtable_0.3.0
[73] purrr_0.3.4 polyclip_1.10-0 tidyr_1.1.0 assertthat_0.2.1
[77] xfun_0.15 ggforce_0.3.2 openxlsx_4.1.5 survey_3.36
[81] tibble_3.0.3 maxLik_1.3-8 statmod_1.4.34 ellipsis_0.3.1

@bhaney22
Author

UPDATE:

Here is a minimum working example to illustrate the error:
library(dplyr)
library(haven)
library(expss)

var1 <- haven::labelled(c(0, 1, 9, 0, 0), c(No = 0, Yes = 1, DK = 9))
var2 <- haven::labelled(c(0, 1, 2, 3, 1), c(DK = 0, first = 1, second = 2, third = 3))

mytest <- tibble(var1, var2)

mytest$var1 <- expss::mis_val(mytest$var1, c(9))
mytest$var2 <- expss::mis_val(mytest$var2, c(0), with_labels = TRUE)

# This mutate command works fine
mytest <- mytest %>%
    mutate(newvar1 = case_when(var1 == 1 ~ 1,
                               var1 != 1 ~ 0))

# This mutate command causes the error
mytest <- mytest %>%
    mutate(newvar2 = case_when(var2 == 1 ~ 1,
                               var2 != 1 ~ 0))

Here is the error I get:

Error: Problem with mutate() input newvar2.
x Can't combine ..1 <labelled> and ..2 .
ℹ Input newvar2 is case_when(var2 == 1 ~ 1, var2 != 1 ~ 0).

Here is the sessionInfo() results:

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] haven_2.3.1 expss_0.10.5 dplyr_1.0.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 knitr_1.29 magrittr_1.5
[5] hms_0.5.3 tidyselect_1.1.0 R6_2.4.1 rlang_0.4.7
[9] fansi_0.4.1 stringr_1.4.0 tools_3.6.3 packrat_0.5.0-25
[13] htmlTable_2.0.1 checkmate_2.0.0 data.table_1.12.8 xfun_0.15
[17] utf8_1.1.4 cli_2.0.2 matrixStats_0.56.0 htmltools_0.5.0
[21] ellipsis_0.3.1 assertthat_0.2.1 digest_0.6.25 tibble_3.0.3
[25] lifecycle_0.2.0 crayon_1.3.4 readr_1.3.1 purrr_0.3.4
[29] htmlwidgets_1.5.1 vctrs_0.3.2 glue_1.4.1 stringi_1.4.6
[33] compiler_3.6.3 pillar_1.4.6 forcats_0.5.0 generics_0.0.2
[37] backports_1.1.8 foreign_0.8-76 pkgconfig_2.0.3

@bhaney22
Author

Since this is a combination of packages causing the error, I also submitted the following request to dplyr:

tidyverse/dplyr#5424 (comment)

@gdemin
Owner

gdemin commented Jul 19, 2020

Thanks, now I can reproduce the issue. I don't yet understand why it happens. It is not about the labelled or haven_labelled class. It seems to be connected with how dplyr handles the vctrs_vctr class. There is no error when I remove this class:

library(dplyr)
library(haven)
library(expss)

var2 <- haven::labelled(c(0, 1, 2, 3, 1), c(DK = 0, first = 1, second = 2, third = 3))

mytest <- tibble(var2)

mytest$var2 <- expss::mis_val(mytest$var2, c(0), with_labels = TRUE)

class(mytest$var2)
# remove vctrs_vctr class
class(mytest$var2) = setdiff(class(mytest$var2), "vctrs_vctr")

# works as expected
mytest <- mytest %>%
    mutate(newvar2 = case_when(var2 == 1 ~ 1,
                               var2 != 1 ~ 0))
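
The class check and removal above can be wrapped into two small base-R helpers. This is only a sketch: `has_class` and `drop_class` are hypothetical names, not functions from expss, haven, or dplyr, and the hand-built toy column merely imitates a class vector like the one on the affected variable (the exact class vector is an assumption).

```r
# Hypothetical helpers (not part of expss, haven, or dplyr).

# For each column of a data frame (or list), report whether it inherits from `cls`.
has_class <- function(cols, cls) {
    vapply(cols, inherits, logical(1), what = cls)
}

# Remove one or more class names from an object, keeping its other attributes.
drop_class <- function(x, cls) {
    class(x) <- setdiff(class(x), cls)
    x
}

# Toy columns built by hand, so no packages are needed to run this.
# The class vector of `var2` is an assumption made for illustration.
cols <- list(
    var2  = structure(c(NA, 1, 2, 3, 1),
                      class = c("labelled", "haven_labelled", "vctrs_vctr")),
    plain = c(0, 1, 1, 0, 1)
)

# Find the affected columns and strip only the vctrs_vctr class from them.
affected <- has_class(cols, "vctrs_vctr")
cols[affected] <- lapply(cols[affected], drop_class, cls = "vctrs_vctr")
```

Because a data frame is a list of columns, the same two calls should apply to a data frame read with haven, although that has not been checked against the package versions reported in this thread.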

I will try to make a workaround for this issue in the next version.
As a quick fix, you can use read_spss from expss:

library(dplyr)
library(haven)
library(expss)

mygss <- expss::read_spss("~/GSS2018.sav") %>%
    select(YEAR,ID,WRKSTAT,SEX,RACE,RINCOME,INCOME,WEALTH,REGION,
           AGE,MARITAL,CHILDS,EDUC,DEGREE,
           PARTYID,PRES16,NATEDUC,COURTS,TAX,NATFARE,NATENRGY,PARTTEAM,LIFE)


mygss$NATENRGY <- mis_val(mygss$NATENRGY, c(0,8,9),with_labels = TRUE)

mygss = mutate(mygss, renew = case_when(NATENRGY == 1 ~ 1,
                                        NATENRGY != 1 ~ 0)) 

calc_cro(mygss,NATENRGY,renew)

or, if you want to go with haven:

library(dplyr)
library(haven)
library(expss)

mygss <- haven::read_spss("~/GSS2018.sav") %>%
    select(YEAR,ID,WRKSTAT,SEX,RACE,RINCOME,INCOME,WEALTH,REGION,
           AGE,MARITAL,CHILDS,EDUC,DEGREE,
           PARTYID,PRES16,NATEDUC,COURTS,TAX,NATFARE,NATENRGY,PARTTEAM,LIFE)

mygss = add_labelled_class(mygss, remove_classes = c("haven_labelled", "vctrs_vctr" ))

mygss$NATENRGY <- mis_val(mygss$NATENRGY, c(0,8,9),with_labels = TRUE)

mygss = mutate(mygss, renew = case_when(NATENRGY == 1 ~ 1,
                                        NATENRGY != 1 ~ 0)) 

calc_cro(mygss,NATENRGY,renew)
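
Another possible workaround, if converting the whole data set is not convenient, is to compare the underlying numeric values instead of the classed vector. This is an assumption rather than something confirmed in this thread: it presumes the failure is triggered when the comparison sees the vctrs_vctr class. The hand-built vector below only imitates such a variable, and the sketch uses base R only.

```r
# Hand-built stand-in for a recoded variable; the class vector is an assumption.
x <- structure(c(NA, 1, 2, 3, 1),
               class = c("labelled", "haven_labelled", "vctrs_vctr"))

# unclass() drops the class attribute, so `==` falls back to the
# default numeric comparison and no class-specific method is involved.
is_first <- unclass(x) == 1

# Base-R counterpart of the case_when() recode used in this thread:
# 1 where the value is 1, 0 for other values, NA where the value is missing.
renew <- ifelse(is_first, 1, 0)
```

The labels are not used in the comparison, so the recode depends only on the stored codes.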

@bhaney22
Author

Thank you for your quick response and helpful feedback! I agree that this seems like an error caused by different approaches to labels by haven in the tidyverse and in expss. As I mentioned in a previous comment, I did submit this issue to dplyr. I try to work within the tidyverse as much as possible, which is why I use haven to read in the SPSS file. But, there is no tidyverse package for creating the wide variety of human-readable tables that expss provides. Thank you again for providing the expss package.
