Closes #142 issue_142_updated to account for DT, DTM, TM variables #145

Merged
merged 18 commits into devel from issue_142_dates
Jun 15, 2023

Conversation

adchan11
Collaborator

Thank you for your Pull Request!

We have developed a Pull Request template to aid you and our reviewers. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the xportr codebase remains robust and consistent.

The scope of {xportr}

{xportr}'s scope is to enable R users to write out submission compliant xpt files that can be delivered to a Health Authority or to downstream validation software programs. We see labels, lengths, types, ordering and formats from a dataset specification object (SDTM and ADaM) as being our primary focus. We also see messaging and warnings to users around applying information from the specification file as a primary focus. Please make sure your Pull Request meets this scope of {xportr}. If your Pull Request moves beyond this scope, please get in touch with the {xportr} team on slack or create an issue to discuss.

Please check off each task box as an acknowledgment that you completed the task. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

Changes Description

#142

Do not convert DT, DTM, and TM variables that have a format specified in the metacore specs (e.g. date9., datetime20.) to numeric. Converting them causes a 10-year difference when the file is read back with read_xpt.
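
For reference, a minimal base-R sketch of where a 10-year shift can come from. It assumes the cause is the difference between the SAS date origin (1960-01-01) and the R origin (1970-01-01); the variable names are illustrative and not taken from the xportr code.

```r
# SAS counts dates from 1960-01-01; R counts Date values from 1970-01-01.
sas_origin <- as.Date("1960-01-01")
r_origin <- as.Date("1970-01-01")

# Gap between the two origins, in days (ten years including three leap days).
origin_gap_days <- as.numeric(r_origin - sas_origin)

# A Date stripped to its underlying number keeps the R day count...
rficdt <- as.Date("2017-03-30")
r_day_count <- as.numeric(rficdt)

# ...and if that bare number is later interpreted against the SAS origin,
# the result lands ten years too early.
misread <- as.Date(r_day_count, origin = sas_origin)

print(origin_gap_days)
print(format(misread))
```

Keeping the Date/POSIXct class on the column lets the writer apply the origin shift itself, which is the behaviour this PR aims for.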

Task List

  • The spirit of xportr is met in your Pull Request
  • Place Closes #<insert_issue_number> into the beginning of your Pull Request Title (Use Edit button in top-right if you need to update)
  • Summary of changes filled out in the above Changes Description. Can be removed or left blank if changes are minor/self-explanatory.
  • Check that your Pull Request is targeting the devel branch, Pull Requests to main should use the Release Pull Request Template
  • Code is formatted according to the tidyverse style guide. Use styler package and functions to style files accordingly.
  • Updated relevant unit tests or have written new unit tests. See our Wiki for conventions used in this package.
  • Creation/updated relevant roxygen headers and examples. See our Wiki for conventions used in this package.
  • Run devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
  • Run pkgdown::build_site() and check that all affected examples are displayed correctly and that all new/updated functions occur on the "Reference" page.
  • Update NEWS.md if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers)
  • Address any updates needed for vignettes and/or templates
  • Link the issue Development Panel so that it closes after successful merging.
  • Fix merge conflicts
  • Pat yourself on the back for a job well done! Much love to your accomplishment!

@adchan11 adchan11 requested review from kaz462 and cpiraux May 31, 2023 03:41
R/type.R Outdated
attributes(.df[[i]]) <<- orig_attributes
} else {
attributes(.df[[i]]) <- NULL
}
Collaborator

The data are still converted to the numeric R class. The attribute should be numeric, but the R class should stay date/dttm/time. The numeric date variables should not go through ".df[[i]] <<- as.numeric(.df[[i]])".
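
A small base-R illustration of the point above, as a sketch only: both as.numeric() and clearing the attributes turn a Date column into a bare double. The object names are made up for the example.

```r
rficdt <- as.Date("2017-03-30")

# as.numeric() drops the Date class and returns a bare double.
as_num <- as.numeric(rficdt)

# Removing all attributes has the same effect on the class.
no_attrs <- rficdt
attributes(no_attrs) <- NULL

print(class(rficdt))
print(class(as_num))
print(class(no_attrs))
```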

Collaborator Author

updated

Collaborator Author

Thanks for your updates. Numeric dates are still coerced to dbl, it should not be the case.

@cpiraux can you give an example dataframe where this happens and what the expected result should be?

Collaborator

It seems that it is actually not coerced but the message is still present. I used df and metadata in test-type for the review:
image
image
image

Collaborator Author

This should be fixed now @cpiraux

@elimillera can you take a look too? I've implemented your suggestion in today's meeting. Thanks.

Collaborator

I didn't use a metacore object but a dataframe for the metadata

metadata <- data.frame(
  dataset = c("adsl", "adsl", "adsl", "adsl"),
  variable = c("USUBJID", "DMDTC", "RFICDT", "RFICDTM"),
  type = c("text", "date", "integer", "integer"),
  format = c(NA, NA, "date9.", "datetime15.")
)

Collaborator

Jump in the conversation

Collaborator Author

Went through and everything looked ok from the code side. I would rely on others to make sure it behaves as expected. I'll be sure to update this in the type.R documentation

I updated the testthat for this function and I see a new issue where in line 205 of test-type.R, the original df and xpt file have different timezones.

── Failure (test-type.R:208:3): xportr_type: date variables are not converted to numeric ──
df$RFICDTM (actual) not equal to df_xpt$RFICDTM (expected).

actual: 1490824800
expected: 1490832000

image

I checked the documentation and it comes from this parameter in the write_xpt function of haven:

image

Is this something to be concerned about? @cpiraux @elimillera @kaz462

Collaborator

Thanks for looking into this! Timezone should be ignored. As the default value in haven is adjust_tz = TRUE, I think it is okay. I opened dfdates.xpt in SAS and the dates are okay. Is it possible to ignore the timezone when you compare the dates in testthat?

Collaborator Author

@cpiraux test updated to expect that the as_character() of each datetime are equal so that timezone is ignored.
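
To make the timezone discussion concrete, here is a hedged base-R sketch. It assumes the original data frame was created in a UTC+2 zone (Europe/Brussels is used purely as an example, consistent with the 7200-second gap in the failure above) and that the value read back is in UTC. It is not the actual test code from test-type.R.

```r
# Same wall-clock datetime, two different time zones (zone choice is an assumption).
local_dttm <- as.POSIXct("2017-03-30", tz = "Europe/Brussels")
utc_dttm <- as.POSIXct("2017-03-30", tz = "UTC")

# The underlying second counts differ by the UTC offset.
gap_seconds <- as.numeric(utc_dttm) - as.numeric(local_dttm)

# Comparing the formatted wall-clock strings ignores the time zone.
fmt <- "%Y-%m-%d %H:%M:%S"
same_wall_clock <- identical(format(local_dttm, fmt), format(utc_dttm, fmt))

print(gap_seconds)
print(same_wall_clock)
```

Comparing character representations only checks the displayed wall-clock time, so it would not catch a genuine shift that happens to be hidden by a zone change; that trade-off seems acceptable here since the timezone is meant to be ignored.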

@cpiraux
Collaborator

cpiraux commented Jun 1, 2023

Could you add a test for numeric date variables in test-type.R?

@adchan11
Collaborator Author

adchan11 commented Jun 1, 2023

@cpiraux testthat updated

Chan, Adrian {MDBT~Mississauga} added 2 commits June 1, 2023 19:14
Collaborator

@cpiraux cpiraux left a comment

Thanks for your updates. Numeric dates are still coerced to dbl, it should not be the case.

@adchan11 adchan11 requested a review from elimillera June 6, 2023 15:56
@cpiraux
Collaborator

cpiraux commented Jun 7, 2023

I did a test using a data frame instead of a metacore object as metadata. The numeric dates have been coerced to character without giving a message:

metadata <- data.frame(
  dataset = c("adsl", "adsl", "adsl", "adsl"),
  variable = c("USUBJID", "DMDTC", "RFICDT", "RFICDTM"),
  type = c("text", "date", "integer", "integer"),
  format = c(NA, NA, "date9.", "datetime15.")
)

adsl_original <- tibble::tribble(
  ~USUBJID, ~DMDTC, ~RFICDT, ~RFICDTM,
  "test1", "2017-03-30", "2017-03-30", "2017-03-30",
  "test2", "2017-01-08", "2017-01-08", "2017-01-08"
)

adsl_original$RFICDT <- as.Date(adsl_original$RFICDT)

adsl_original$RFICDTM <- as.POSIXct(adsl_original$RFICDTM)

adsl_xpt2 <- adsl_original %>%
  xportr_type(metadata) # Coerce variable type to match spec - stop if any mismatch

image

image

metacore does not contain any rows:

image

image

and then type.x is _character
image

@cpiraux
Collaborator

cpiraux commented Jun 7, 2023

Another issue that @kaz462 pointed out is when a date in the data frame is numeric, but the type in the metadata belongs to the character type. In this case, the date is not coerced to a character, and no message is provided to inform the user that the types in the metadata and data are different.

df <- data.frame(RFDTC = as.Date('2017-03-30'), RFICDT = as.Date('2017-03-30'), RFICDTM = as.POSIXct('2017-03-30'))


metacore_meta <- suppressWarnings(
  metacore::metacore(
    var_spec = data.frame(
      variable = c("RFDTC", "RFICDT", "RFICDTM"),
      type = c("date", "integer", "integer"),
      label = c("RFDTC Label", "RFICDT Label", "RFICDTM Label"),
      length = c(20, 8, 8),
      common = NA_character_,
      format = c(NA, "date9.", "datetime20.")
    )
  )
)

processed_df <- xportr_type(df, metacore_meta)

processed_df
image

In type.R, meta_ordered:

image

@adchan11
Collaborator Author

adchan11 commented Jun 8, 2023

@cpiraux Pushed new changes. Please let me know your feedback.

@elimillera could you take a look when you get a chance? I'm finding the function more complicated than I thought, and I feel like I'm just adding 'workarounds' to ensure that Celine's issues are fixed while still passing test-type.R. E.g. the meta_ordered logic is getting longer and longer to account for the discrepancies that Celine is finding.

@cpiraux cpiraux mentioned this pull request Jun 12, 2023
14 tasks
Member

@elimillera elimillera left a comment

Went through and everything looked ok from the code side. I would rely on others to make sure it behaves as expected. I'll be sure to update this in the type.R documentation

Collaborator

@cpiraux cpiraux left a comment

Great, thanks for the updates. It seems to work pretty well. The only issue I see is a coerced message when no coercion is done. Could you check this?

adsl_original <- tibble::tribble(
  ~USUBJID, ~DMDTC, ~RFICDT, ~RFICDTM,
  "test1", "2017-03-30", "2017-03-30", "2017-03-30",
  "test2", "2017-01-08", "2017-01-08", "2017-01-08"
)

adsl_original$DMDTC <- as.Date(adsl_original$DMDTC)
adsl_original$RFICDT <- as.Date(adsl_original$RFICDT)

adsl_original$RFICDTM <- as.POSIXct(adsl_original$RFICDTM)

adsl_xpt <- adsl_original %>%
  xportr_type(metacore) 

image

@adchan11
Collaborator Author

@cpiraux When I run this code, I get one coercion that seems to make sense:

adsl_original <- tibble::tribble(
  ~USUBJID, ~DMDTC, ~RFICDT, ~RFICDTM,
  "test1", "2017-03-30", "2017-03-30", "2017-03-30",
  "test2", "2017-01-08", "2017-01-08", "2017-01-08"
)

metadata <- data.frame(
  dataset = c("adsl", "adsl", "adsl", "adsl"),
  variable = c("USUBJID", "DMDTC", "RFICDT", "RFICDTM"),
  type = c("text", "date", "integer", "integer"),
  format = c(NA, NA, "date9.", "datetime15.")
)

adsl_original$DMDTC <- as.Date(adsl_original$DMDTC)
adsl_original$RFICDT <- as.Date(adsl_original$RFICDT)

adsl_original$RFICDTM <- as.POSIXct(adsl_original$RFICDTM)

adsl_xpt <- adsl_original %>%
  xportr_type(metadata) 
── Variable type mismatches found. ──

✔ 1 variables coerced
> View(metadata)
> View(adsl_original)
> lapply(adsl_original, class)
$USUBJID
[1] "character"

$DMDTC
[1] "Date"

$RFICDT
[1] "Date"

$RFICDTM
[1] "POSIXct" "POSIXt" 

> lapply(adsl_xpt, class)
$USUBJID
[1] "character"

$DMDTC
[1] "character"

$RFICDT
[1] "Date"

$RFICDTM
[1] "POSIXct" "POSIXt" 

Can you provide the code for your metacore object that gives you this issue where coercion shouldn't happen and the expected outcome?

@cpiraux
Collaborator

cpiraux commented Jun 14, 2023

@cpiraux When I run this code, I get one coercion that seems to make sense:

Can you provide the code for your metacore object that gives you this issue where coercion shouldn't happen and the expected outcome?

Sorry I did not give the code with an issue. Please use this one:

metadata <- data.frame(
  dataset = c("adsl", "adsl", "adsl", "adsl"),
  variable = c("USUBJID", "DMDTC", "RFICDT", "RFICDTM"),
  type = c("text", "date", "integer", "integer"),
  format = c(NA, NA, "date9.", "datetime15.")
)

adsl_original <- tibble::tribble(
  ~USUBJID, ~DMDTC, ~RFICDT, ~RFICDTM,
  "test1", "2017-03-30", "2017-03-30", "2017-03-30",
  "test2", "2017-01-08", "2017-01-08", "2017-01-08"
)


adsl_original$RFICDT <- as.Date(adsl_original$RFICDT)
adsl_original$RFICDTM <- as.POSIXct(adsl_original$RFICDTM)

adsl_xpt2 <- adsl_original %>%
  xportr_type(metadata)

image

@adchan11
Collaborator Author

adchan11 commented Jun 14, 2023

@cpiraux I fixed this issue but now we get the other issue where if a DTC variable is a date class, it is not coerced to character. But is it necessary for DTC variables to be coerced to character? When I use xportr_write and read in the xpt file, the date value is still the same so I feel like it is ok to leave it as it is.

Please let me know your thoughts. If you have any suggestions for improving the logic in xportr_type, that would be great, as there are a lot of different date/datetime scenarios, which makes this quite confusing.

df <- data.frame(RFDTC = as.Date('2017-03-30'), RFICDT = as.Date('2017-03-30'), RFICDTM = as.POSIXct('2017-03-30'))


metacore_meta <- suppressWarnings(
  metacore::metacore(
    var_spec = data.frame(
      variable = c("RFDTC", "RFICDT", "RFICDTM"),
      type = c("date", "integer", "integer"),
      label = c("RFDTC Label", "RFICDT Label", "RFICDTM Label"),
      length = c(20, 8, 8),
      common = NA_character_,
      format = c(NA, "date9.", "datetime20.")
    )
  )
)

processed_df <- xportr_type(df, metacore_meta)

xportr_write(processed_df, file.path(system.file("extdata", package="xportr"), "dfdates2.xpt"))
df_xpt <- read_xpt(file.path(system.file("extdata", package="xportr"), "dfdates2.xpt"))

image

@cpiraux
Collaborator

cpiraux commented Jun 14, 2023

@cpiraux I fixed this issue but now we get the other issue where if a DTC variable is a date class, it is not coerced to character. But is it necessary for DTC variables to be coerced to character? When I use xportr_write and read in the xpt file, the date value is still the same so I feel like it is ok to leave it as it is.

Please let me know your thoughts. If you have any suggestions for improved logic in the code for xportr_type, that would be great as there is a lot of different scenarios with dates/datetimes that is making this quite confusing.

It would be great to have both: --DTC variables coerced to chr when they have another type, and no message when there is no coercion.

If it is not possible to have it for this release, I suggest to have no message when no coercion and add an issue for --DTC variables to be done for the next release. What do you think?
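
The two behaviours requested in this comment can be written down as one decision rule. The sketch below is hypothetical and is not the xportr implementation: needs_coercion() is an invented helper, and the set of metadata types treated as numeric is an assumption made for the example.

```r
# Hypothetical helper (not part of xportr): TRUE when the column's class
# disagrees with the metadata type, so a coercion and a message are warranted.
needs_coercion <- function(x, meta_type, meta_format = NA_character_) {
  numeric_types <- c("integer", "numeric", "float") # assumed list of numeric types
  is_date_like <- inherits(x, c("Date", "POSIXct", "difftime"))
  has_date_format <- !is.na(meta_format) &&
    grepl("^(date|datetime|time)", tolower(meta_format))

  # Numeric dates/datetimes/times with a date-like format stay as they are:
  # no coercion and therefore no message.
  if (is_date_like && meta_type %in% numeric_types && has_date_format) {
    return(FALSE)
  }

  # Otherwise compare "should be character" against "is character".
  wants_character <- !(meta_type %in% numeric_types)
  wants_character != is.character(x)
}

# RFICDT: Date class, integer type, date9. format -> left alone.
print(needs_coercion(as.Date("2017-03-30"), "integer", "date9."))
# --DTC stored as Date but specified as a character-like type -> coerce.
print(needs_coercion(as.Date("2017-03-30"), "date"))
# --DTC already character -> nothing to do.
print(needs_coercion("2017-03-30", "date"))
```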

Collaborator

@cpiraux cpiraux left a comment

We can keep it like this and resolve the coercion of --DTC variables in the next release. Could you add an issue for this?

@adchan11
Collaborator Author

It would be great to have both. --DTC variables coerced to chr when they have an other type and no message when there is no coercion.

Done! Thanks for your review.

@bms63 bms63 linked an issue Jun 15, 2023 that may be closed by this pull request
@bms63 bms63 changed the title issue_142_updated to account for DT, DTM, TM variables Closes #142 issue_142_updated to account for DT, DTM, TM variables Jun 15, 2023
@bms63
Collaborator

bms63 commented Jun 15, 2023

LGTM!

@bms63 bms63 merged commit 79550f9 into devel Jun 15, 2023
@bms63 bms63 deleted the issue_142_dates branch June 15, 2023 17:07
Development

Successfully merging this pull request may close these issues.

Dates are not correctly derived (SAS vs R)
4 participants