Random error 'set_val_lab' - duplicated values in labels: #107

Open
bkerwick opened this issue Mar 22, 2023 · 15 comments

@bkerwick

bkerwick commented Mar 22, 2023

We have started using expss to create a lot of tabs and workbooks, and we are getting a random error in the workflow.
Replicating a random error is hard, but below is my attempt, with screenshots showing where in the loop the error occurred.

I would like to know if there is a way to avoid this, or if I am missing something obvious. Thank you for your time.




library(expss)  # provides var_lab(), the tab_*() functions, mrset(), %to% and %>%

transport <- sample(c("Car", "Bike"), 10000, replace = TRUE)
age <- sample(c("Under 30", "Over 30"), 10000, replace = TRUE)
gender <- sample(c("M", "F"), 10000, replace = TRUE)
education <- sample(c("High School", "Bachelor's Degree", "Master's Degree", "PhD"), 10000, replace = TRUE)
occupation <- sample(c("Teacher", "Engineer", "Software Developer", "Lawyer", "Nurse", "Professor", "Salesperson", "Doctor", "Marketing Manager", "CEO"), 10000, replace = TRUE)
income <- sample(c("Under 60k", "Over 60k"), 10000, replace = TRUE)
products.Held.Banking..Transaction.Cheque.Current.account <- sample(c('Transaction / Cheque / Current account', ''),10000, replace=TRUE)
products.Held.Banking..Savings.Passbook.Call.account <- sample(c('Savings / Passbook / Call account',''), 10000, replace=TRUE)
products.Held.Banking..Bonus.Bonds <- sample(c( 'Bonus Bonds', ''), 10000, replace=TRUE)
products.Held.Banking..Term.Deposit.Term.Investment <- sample(c( 'Term Deposit / Term Investment', ''), 10000, replace=TRUE)
products.Held.Banking..Unit.Trust.or.Managed.Fund <- sample(c( 'Unit Trust or Managed Fund', ''), 10000, replace=TRUE)
products.Held.Banking..Personal.Retirement.Savings.Superannuation <- sample(c('Personal Retirement Savings / Superannuation', ''), 10000, replace=TRUE)
products.Held.Banking..KiwiSaver <- sample(c( 'KiwiSaver',  ''), 10000, replace=TRUE)
products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in <- sample(c( 'Mortgage or Loan on the home you live in',  ''), 10000, replace=TRUE)
products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own <- sample(c( 'Mortgage or Loan on other properties you own',  ''), 10000, replace=TRUE)
products.Held.Banking..Personal.Loan <- sample(c( 'Personal Loan', ''), 10000, replace=TRUE)
products.Held.Banking..Credit.Card <- sample(c( 'Credit Card', ''), 10000, replace=TRUE)
products.Held.Banking..Debit.Card <- sample(c( 'Debit Card',""), 10000, replace=TRUE)
employment.new <- sample(c('Self-employed - own your own business', 'Self-employed - own your own farm', 'Work in full-time paid employment (i.e. 30 hours or more per week)', 'Work in part-time paid employment (i.e. less than 30 hours per week)', 'Full-time Home Executive', 'Student - Full time', 'Student - Part time', 'Not working at the moment', 'Retired and not working at all', 'Retired, but working occasionally', 'Other', "I'd prefer not to say"), 10000, replace=TRUE)
home.Ownership <- sample(c('The owner of your house', 'Renting or leasing your house', 'A boarder at your house', 'Living with your parents or other relatives', 'Other'), 10000, replace=TRUE)
household.Situation <- sample(c('Single person living alone', 'Single parent living with child / children', 'Single person - have children but they have all left home', "Couple - don't have any children", 'Couple - have child / children living at home', 'Couple - have children, but they have all left home', 'Share household (i.e. adults sharing a house / flatting together)', 'Live with parents', 'Extended family household (i.e. more than two generations living together)', 'Other household arrangement', 'Prefer not to say'), 10000, replace=TRUE)

month.Wave <- sample(c('2020-Jan','2020-Feb','2020-Mar','2020-Apr','2020-May', '2020-Jun', '2020-Jul', '2020-Aug', '2020-Sep', '2020-Oct', '2020-Nov', '2020-Dec','2021-Jan','2021-Feb','2021-Mar','2021-Apr','2021-May', '2021-Jun', '2021-Jul', '2021-Aug', '2021-Sep', '2021-Oct', '2021-Nov', '2021-Dec','2022-Jan','2022-Feb','2022-Mar','2022-Apr','2022-May','2022-Jun','2022-Jul','2022-Aug','2022-Sep','2022-Oct','2022-Nov','2022-Dec','2023-Jan','2023-Feb','2023-Mar','2023-Apr'), 10000, replace=TRUE)




weight2 <- runif(10000, min = 0.2, max = 2.0)
duration <- runif(10000, min = 5, max = 15)
nps <- runif(10000, min = -100, max = 100)
expense <- runif(10000, min = 50, max = 100000)
recommend  <- runif(10000, min = 1, max = 10)



df <- data.frame(
  Transport = transport,
  Age = age,
  Gender = gender,
  Education = education,
  Occupation = occupation,
  Income = income,
  Weightinput = weight2,
  Products.Held.Banking..Transaction.Cheque.Current.account = products.Held.Banking..Transaction.Cheque.Current.account,
  Products.Held.Banking..Savings.Passbook.Call.account = products.Held.Banking..Savings.Passbook.Call.account,
  Products.Held.Banking..Bonus.Bonds = products.Held.Banking..Bonus.Bonds,
  Products.Held.Banking..Term.Deposit.Term.Investment = products.Held.Banking..Term.Deposit.Term.Investment,
  Products.Held.Banking..Unit.Trust.or.Managed.Fund = products.Held.Banking..Unit.Trust.or.Managed.Fund,
  Products.Held.Banking..Personal.Retirement.Savings.Superannuation = products.Held.Banking..Personal.Retirement.Savings.Superannuation,
  Products.Held.Banking..KiwiSaver = products.Held.Banking..KiwiSaver,
  Products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in = products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in,
  Products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own = products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own,
  Products.Held.Banking..Personal.Loan = products.Held.Banking..Personal.Loan,
  Products.Held.Banking..Credit.Card = products.Held.Banking..Credit.Card,
  Products.Held.Banking..Debit.Card = products.Held.Banking..Debit.Card,
  Employment.new = employment.new,
  Home.Ownership = home.Ownership,
  Household.Situation = household.Situation,
  Month.Wave = month.Wave,
  Duration = duration,
  NPS = nps,
  Expense = expense,
  Recommend = recommend
)





df$Transport <- factor(df$Transport, levels=c('Car', 'Bike'), ordered=TRUE)
df$Age <- factor(df$Age, levels=c('Under 30', 'Over 30'), ordered=TRUE)
df$Gender <- factor(df$Gender, levels=c('M', 'F'), ordered=TRUE)
df$Education <- factor(df$Education, levels=c('High School', "Bachelor's Degree", "Master's Degree", "PhD"), ordered=TRUE)
df$Occupation <- factor(df$Occupation, levels=c('Teacher', 'Engineer', 'Software Developer', 'Lawyer', 'Nurse', 'Professor', 'Salesperson', 'Doctor', 'Marketing Manager', 'CEO'), ordered=TRUE)
df$Income <- factor(df$Income, levels=c('Under 60k', 'Over 60k'), ordered=TRUE)
df$Products.Held.Banking..Transaction.Cheque.Current.account <- factor(df$Products.Held.Banking..Transaction.Cheque.Current.account, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Savings.Passbook.Call.account <- factor(df$Products.Held.Banking..Savings.Passbook.Call.account, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Bonus.Bonds <- factor(df$Products.Held.Banking..Bonus.Bonds, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Term.Deposit.Term.Investment <- factor(df$Products.Held.Banking..Term.Deposit.Term.Investment, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Unit.Trust.or.Managed.Fund <- factor(df$Products.Held.Banking..Unit.Trust.or.Managed.Fund, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Personal.Retirement.Savings.Superannuation <- factor(df$Products.Held.Banking..Personal.Retirement.Savings.Superannuation, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..KiwiSaver <- factor(df$Products.Held.Banking..KiwiSaver, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in <- factor(df$Products.Held.Banking..Mortgage.or.Loan.on.the.home.you.live.in, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own <- factor(df$Products.Held.Banking..Mortgage.or.Loan.on.other.properties.you.own, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Personal.Loan <- factor(df$Products.Held.Banking..Personal.Loan, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Credit.Card <- factor(df$Products.Held.Banking..Credit.Card, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Products.Held.Banking..Debit.Card <- factor(df$Products.Held.Banking..Debit.Card, levels=c('Transaction / Cheque / Current account', 'Savings / Passbook / Call account', 'Bonus Bonds', 'Term Deposit / Term Investment', 'Unit Trust or Managed Fund', 'Personal Retirement Savings / Superannuation', 'KiwiSaver', 'Mortgage or Loan on the home you live in', 'Mortgage or Loan on other properties you own', 'Personal Loan', 'Credit Card', 'Debit Card'), ordered=TRUE)
df$Employment.new <- factor(df$Employment.new, levels=c('Self-employed - own your own business', 'Self-employed - own your own farm', 'Work in full-time paid employment (i.e. 30 hours or more per week)', 'Work in part-time paid employment (i.e. less than 30 hours per week)', 'Full-time Home Executive', 'Student - Full time', 'Student - Part time', 'Not working at the moment', 'Retired and not working at all', 'Retired, but working occasionally', 'Other', "I'd prefer not to say"), ordered=TRUE)
df$Home.Ownership <- factor(df$Home.Ownership, levels=c('The owner of your house', 'Renting or leasing your house', 'A boarder at your house', 'Living with your parents or other relatives', 'Other'), ordered=TRUE)
df$Household.Situation <- factor(df$Household.Situation, levels=c('Single person living alone', 'Single parent living with child / children', 'Single person - have children but they have all left home', "Couple - don't have any children", 'Couple - have child / children living at home', 'Couple - have children, but they have all left home', 'Share household (i.e. adults sharing a house / flatting together)', 'Live with parents', 'Extended family household (i.e. more than two generations living together)', 'Other household arrangement', 'Prefer not to say'), ordered=TRUE)

df$Month.Wave <- factor(df$Month.Wave, levels=c('2020-Jan','2020-Feb','2020-Mar','2020-Apr','2020-May', '2020-Jun', '2020-Jul', '2020-Aug', '2020-Sep', '2020-Oct', '2020-Nov', '2020-Dec','2021-Jan','2021-Feb','2021-Mar','2021-Apr','2021-May', '2021-Jun', '2021-Jul', '2021-Aug', '2021-Sep', '2021-Oct', '2021-Nov', '2021-Dec','2022-Jan','2022-Feb','2022-Mar','2022-Apr','2022-May','2022-Jun','2022-Jul','2022-Aug','2022-Sep','2022-Oct','2022-Nov','2022-Dec','2023-Jan','2023-Feb','2023-Mar','2023-Apr'), ordered=TRUE)

df$total <- 'Base'

df$Weight <-   df$Weightinput
df$Weight <- as.numeric(df$Weight)
##default Labels:
for (var in names(df)) {
  if (is.null(var_lab(df[[var]]))) {
    var_lab(df[[var]]) = var
  }
}


tableCaption  <- "show set_val_lab error"



var_lab(df$total) = ""



for (i in 1:3000) {
  print(i)


first_table = df %>%
  tab_significance_options(compare_type = "adjusted_first_column",min_base = 30,subtable_marks = "both",sig_labels_first_column = c("Batman+", "Joker-"),mode = c("replace")) %>%
  tab_cols(
    eval( expression(list(
      total(),
      df$Month.Wave
      
      
    )))
  ) %>%
  tab_cells(list(df$total)) %>%
  tab_stat_cases(total_row_position = "none", label = "row %",total_statistic = "u_responses") %>%
  tab_stat_cases(total_row_position = "none", label = "Unweighted") %>%
  tab_weight(df$Weight) %>%
  tab_stat_cases(total_row_position = "none", label = "Weighted") %>%
  tab_stat_cases(total_row_position = "none", label = "row %",total_statistic = "u_responses") %>%
  tab_cells(
    #add custom variables and rtable.txt
    eval(expression(list(
      
      df$Transport,
      df$Age,
      df$Gender,
      df$Education,
      df$Occupation,
      df$Income,
      mrset(Products.Held.Banking..Transaction.Cheque.Current.account %to% Products.Held.Banking..Debit.Card, label = 'Products Held Banking'), #3 months rolling
      df$Employment.new,
      df$Home.Ownership,
      df$Household.Situation
      
      
      
      
    )))
  ) %>%
  #tab_stat_cases(total_row_position = "above") %>%
  tab_stat_cpct(total_row_position = "above",total_label = c("row %","Unweighted", "Weighted"),total_statistic = c("u_responses","u_cases", "w_cases")) %>%
  tab_last_sig_cpct(mode = "replace") %>%
  tab_row_label("#Mean Statistics") %>%
  tab_cells(
    #means go here
    eval(expression(list(
      df$Weight,
      df$Duration,
      df$NPS,
      df$Expense,
      df$Recommend
      
    )))
  ) %>%
  tab_stat_mean_sd_n(weighted_valid_n = TRUE) %>%
  tab_last_sig_means(mode = "replace") %>%
  tab_pivot(stat_position = "inside_rows")%>%
  set_caption(tableCaption)

}

[Five screenshots, 2023-03-22, 1:12 PM to 2:58 PM: loop iterations at which the error occurred]
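
As an alternative to screenshots, the failing iterations of a loop like the one above can be logged as text. The following is a minimal base-R sketch, not part of the original report: `build_table()` is a hypothetical stand-in for the real pipeline, and it fails on two fixed iterations only so that the example is deterministic.

```r
# Hypothetical stand-in for the table-building pipeline.
# It fails on fixed iterations so the sketch is deterministic.
build_table <- function(i) {
  if (i %in% c(7, 42)) {
    stop("'set_val_lab' - duplicated values in labels: (simulated)")
  }
  paste("table", i)
}

failed_iterations <- integer(0)
failed_messages <- character(0)

for (i in 1:50) {
  # Return the condition object instead of aborting the loop
  result <- tryCatch(build_table(i), error = function(e) e)
  if (inherits(result, "error")) {
    failed_iterations <- c(failed_iterations, i)
    failed_messages <- c(failed_messages, conditionMessage(result))
  }
}

# One row per failed iteration, easy to paste into a bug report
failures <- data.frame(iteration = failed_iterations,
                       message = failed_messages,
                       stringsAsFactors = FALSE)
print(failures)
```

In a real run, `build_table(i)` would be replaced by the full `df %>% ... %>% tab_pivot()` chain, and the resulting `failures` table shows how often and at which iterations the error appears.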

@gdemin
Owner

gdemin commented Mar 22, 2023

Thank you for the detailed description. Could you provide the full result of the sessionInfo()? I need to see the list of attached packages. There is no such information in your screenshot.

@bkerwick
Author

Sorry about that, here you go:

[Screenshot, 2023-03-23, 11:06 AM]

@gdemin
Owner

gdemin commented Mar 27, 2023

I have run your code several times and didn't see any errors. I also tried with dplyr loaded, with the same result. As far as I can see under "attached packages", you load other packages besides expss. Could you give me all the code, including the library() calls, that you execute before running the code above?
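
For reference, the package information requested here can be collected as text rather than as a screenshot. This is a small sketch using only `sessionInfo()` from base R's utils package; the variable names are illustrative.

```r
# Capture the session details once
si <- sessionInfo()

# Packages attached via library()/require(), beyond the base set
attached <- names(si$otherPkgs)
if (is.null(attached)) attached <- character(0)

# Packages whose namespace is loaded as a dependency but not attached
loaded_only <- names(si$loadedOnly)
if (is.null(loaded_only)) loaded_only <- character(0)

cat("R version:  ", si$R.version$version.string, "\n")
cat("Platform:   ", si$platform, "\n")
cat("Attached:   ", paste(attached, collapse = ", "), "\n")
cat("Loaded only:", paste(loaded_only, collapse = ", "), "\n")
```

Pasting this output into the issue gives the maintainer the R version, the platform, and the attached packages in one place.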

@bkerwick
Author

bkerwick commented Mar 28, 2023

I use macOS, but today I tried a fresh install of R on Windows and installed expss and its dependencies. I ran the code without doing anything else and the error below occurred. I hope this helps.

[Screenshot: envir]

@bkerwick
Author

bkerwick commented May 3, 2023

Was this what you were looking for?
Were you not able to recreate the error?

@gdemin
Owner

gdemin commented May 6, 2023

@bkerwick I have reproduced this issue. It seems this bug is Windows specific. Currently I don't know why this happens. Will investigate further.

@quicly

quicly commented Jul 13, 2023

I also encountered this issue today. I couldn't figure out what might be causing it, but it seems to be somehow related to unexpected behavior of the "/" character in value labels. Removing them has prevented this error, but I haven't tested it further yet, maybe it is just a coincidence, as this error was pretty random for me.

@JB0207

JB0207 commented Sep 26, 2023

I also recently ran into this error when trying to create a crosstab with 208 variables in tab_cells and 16 in tab_cols. Removing the "/" character did not solve the problem for me. I have observed that the error becomes more likely the more variables you include.

I wrote some code with tryCatch as a temporary solution, so that I don't have to keep re-running the code by hand until it works. For the above scenario it took 7 tries until I got the table. Maybe it helps someone:

library(expss)

attempt <- 0 # retry counter (renamed from `c` to avoid confusion with base::c())

repeat {

    error <- FALSE
    print(attempt)

    # datasetSAV and X1..X4 are placeholders for your own data and variables
    tryCatch(my_table <- datasetSAV %>%
               tab_cells(X1,
                         X2
               ) %>%
               tab_cols(total(),
                        X3,
                        X4
               ) %>%
               tab_stat_cases(label = "N", total_row_position = "above") %>%
               tab_stat_cpct(label = "%", total_statistic = "w_cpct", total_label = "#Total cases") %>%
               tab_pivot(stat_position = "inside_columns") %>%
               drop_empty_rows() %>%
               drop_empty_columns(),
             error = function(e) { error <<- TRUE })

    if (!error) { break }         # the table was built, stop retrying

    if (attempt == 10) { break }  # give up after the 11th failed attempt

    attempt <- attempt + 1
    print("Error")

}
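
The same retry idea can be wrapped in a small reusable function. This is a sketch in base R only, not something from the expss package: `retry()` and `flaky_table()` are hypothetical names, and `flaky_table()` simulates a call that fails twice before succeeding.

```r
# Hypothetical helper: call `build` until it succeeds or `max_tries` is reached.
retry <- function(build, max_tries = 10) {
  for (attempt in seq_len(max_tries)) {
    # Return the condition object instead of aborting
    result <- tryCatch(build(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(value = result, attempts = attempt))
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
  }
  # Unlike a silent break, fail loudly once all attempts are used up
  stop("All ", max_tries, " attempts failed; last error: ",
       conditionMessage(result))
}

# Stand-in for a flaky table call: fails twice, then succeeds.
calls <- 0
flaky_table <- function() {
  calls <<- calls + 1
  if (calls < 3) {
    stop("'set_val_lab' - duplicated values in labels: (simulated)")
  }
  "table built"
}

out <- retry(flaky_table)
print(out$attempts)
print(out$value)
```

With real data, the pipeline would be passed as `retry(function() datasetSAV %>% ... %>% tab_pivot())`. Stopping with an error after the last attempt avoids continuing with a missing `my_table`.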

@gdemin Many thanks for your efforts and the great package!

@wck01

wck01 commented Oct 5, 2023

Thank you so much, @JB0207, for your valuable comment. Your suggestion to use the "TryCatch function with Repeat" worked perfectly for me. I appreciate your time and effort in helping me with this issue.

@Waschoi

Waschoi commented Feb 2, 2024

> I also encountered this issue today. I couldn't figure out what might be causing it, but it seems to be somehow related to unexpected behavior of the "/" character in value labels. Removing them has prevented this error, but I haven't tested it further yet, maybe it is just a coincidence, as this error was pretty random for me.

I tried this approach and had no success. The problem seems to be somewhere else.

Waschoi added a commit to Waschoi/expss that referenced this issue Feb 2, 2024
fix for gdemin#107
duplicated values in labels:
Waschoi added a commit to Waschoi/expss that referenced this issue Feb 2, 2024
fix for gdemin#107
duplicated values in labels:
@Waschoi

Waschoi commented Feb 2, 2024

I forked your package and changed these 2 lines:
master...Waschoi:expss:master
Maybe this could be done as an option, but I am not clever enough to figure this out.

This works well for me.

@gdemin
Owner

gdemin commented Feb 12, 2024

@Waschoi
Thank you for your investigation but I can't use this workaround in the CRAN version.

You removed the check for label code duplication. And duplicated codes in value labels can produce unpredictable bugs in further processing, such as table creation.
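
To illustrate why duplicated codes are a problem, here is a base-R sketch. It is a hypothetical check, not the actual expss implementation; it only assumes the usual representation of value labels as a named vector, with the label text as names and the codes as values.

```r
# Hypothetical check, not expss's actual code: value labels are modelled
# as a named numeric vector (label text as names, codes as values).
find_duplicated_codes <- function(labels) {
  codes <- unname(labels)
  unique(codes[duplicated(codes)])
}

good <- c("Car" = 1, "Bike" = 2)
bad  <- c("Car" = 1, "Bike" = 2, "Bicycle" = 2)  # code 2 has two labels

find_duplicated_codes(good)  # no duplicated codes
find_duplicated_codes(bad)   # code 2 is duplicated

# With duplicates, looking up the label for a code is ambiguous
names(bad)[bad == 2]
```

When one code maps to two labels, a table row for that code has no single correct name, which is the kind of downstream ambiguity the check guards against.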

@Waschoi

Waschoi commented Feb 14, 2024

This was not intended to be a permanent solution, but rather a good workaround for me. As soon as there is a real fix, I would use your version again. The bug really drove us crazy because it is not reproducible.

@Waschoi

Waschoi commented Sep 19, 2024

Since R 4.4.1 the problem is gone 😁
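
A script that still carries a workaround could check the running R version before applying it. This is a minimal sketch that takes the 4.4.1 threshold from the comment above as an assumption; it is one user's report, not a confirmed fix.

```r
# Assumption: the error no longer occurs from R 4.4.1 onwards (user report)
r_version <- getRversion()
needs_workaround <- r_version < "4.4.1"

if (needs_workaround) {
  message("R ", as.character(r_version),
          " detected: keeping the retry workaround enabled.")
} else {
  message("R ", as.character(r_version),
          " detected: the workaround may no longer be needed.")
}
```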

@bkerwick
Author

That's great news. Thank you for your workaround as well.
