Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features and fixes #28

Merged
merged 12 commits into from
May 13, 2021
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ Imports: Rcpp
LinkingTo: Rcpp
URL: https://github.com/metrumresearchgroup/lastdose
BugReports: https://github.com/metrumresearchgroup/lastdose/issues
Suggests: testthat
Suggests: testthat, withr
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Expand Down
10 changes: 10 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,15 @@
# lastdose (development version)

- Change default value for `include_tafd` to FALSE (#29)
- Add option `lastdose.include_tafd` (#29)
- Add option `lastdose.id_col` (#27)
- Add option `lastdose.time_units` (#27)
- Search column names for candidate ID columns using `find_id_col()`;
this function is unexported (#25)
- Search column names for candidate time columns using `find_time_col()`;
this function is unexported (#25)
- Handle missing values (`NA`) in the time column (#30)

# lastdose 0.3.2

- Fix bug where `II` column was not properly detected resulting in incorrect
Expand Down
169 changes: 135 additions & 34 deletions R/lastdose.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,15 @@ NULL
#' @param time_col character name for the `TIME` column; this could be time after
#' first dose or time after first record or time relative to any origin; input
#' may be `numeric` or `POSIXct` (e.g. `DATETIME`); if `POSIXct`, a numeric
#' value will be calculated based on the value of `time_units`
#' value will be calculated based on the value of `time_units`. The data frame
#' will be searched for the first matching candidate time column using
#' [find_time_col()].
#' @param time_units for calculating time when the time column inherits
#' `POSIXct`; you may use any value that is valid for [difftime()]
#' @param id_col character name for the subject `ID` column; may be numeric
#' or character; if character, a numeric value is derived
#' or character; if character, a numeric value is derived. The data frame
#' will be searched for the first matching candidate `ID` column using
#' [find_id_col()].
#' @param back_calc if `TRUE`, then the time before the first dose
#' is calculated for records prior to the first dosing record when
#' at least one dosing record is found in the data set. Records before
Expand All @@ -41,13 +45,31 @@ NULL
#' @param include_tafd `logical`; if `FALSE`, then `TAFD` data is not appended
#' to the data set. Only used for the [lastdose()] function.
#'
#' @section Options:
#'
#' These are options that can be set to customize `lastdose` behavior
#' for the current context. See `?options` for how to set an option.
#'
#' - `lastdose.time_units`: sets the default time unit that is used to calculate
#' relative times when the time column is represented as date-time data
#' (`POSIXct`)
#' - `lastdose.id_col`: sets the default value for the `id_col` argument
#' to last dose; this identifies the column that is to be used to distinguish
#' individuals; the data in this column may be numeric or character
#' - `lastdose.include_tafd`: sets default value for `include_tafd`; if `TRUE`
#' then the time since the first dose record (EVID 1 or EVID 4) in the data
#' set will be automatically appended to the output data frame when
#' calling `lastdose()`; `tafd` is always included when calling
#' `lastdose_df()` and `lastdose_list()`
#'
#' @details
#'
#' When calling [lastdose()] to modify the data frame, three columns will be
#' When calling [lastdose()] to modify the data frame, two columns will be
#' added (by default): `TAD` indicating the time after the most-recent dose,
#' `TAFD` indicating the time after the first dose and `LDOS` indicating the
#' amount of the most recent dose. This default behavior can be modified with
#' the `include_ldos` and `include_tafd` arguments.
#' and `LDOS` indicating the amount of the most recent dose. `TAFD` indicating
#' the time after the first dose record (EVID 1 or 4) can be added via the
#' `include_tafd` argument and users can opt out from adding `LDOS` with the
#' `include_ldos` argument.
#'
#' When calling [lastdose_list()] or [lastdose_df()], the respective items are
#' accessible with `tad`, `tafd`, and `ldos` (note the lower case form here to
Expand Down Expand Up @@ -102,8 +124,9 @@ NULL
#'
#'
#' @export
lastdose <- function(data,..., include_ldos = TRUE, include_tafd = TRUE) {
ans <- lastdose_list(data,...)
lastdose <- function(data, ..., include_ldos = TRUE,
include_tafd = getOption("lastdose.include_tafd", FALSE)) {
ans <- lastdose_list(data, ...)
data[["TAD"]] <- ans[["tad"]]
if(include_tafd) data[["TAFD"]] <- ans[["tafd"]]
if(include_ldos) data[["LDOS"]] <- ans[["ldos"]]
Expand All @@ -113,40 +136,38 @@ lastdose <- function(data,..., include_ldos = TRUE, include_tafd = TRUE) {
#' @rdname lastdose
#' @export
lastdose_list <- function(data,
time_col = "TIME",
time_units = NULL,
id_col = "ID",
time_col = find_time_col(data),
time_units = getOption("lastdose.time_units", NULL),
id_col = find_id_col(data),
fill = -99,
back_calc = TRUE,
addl_ties = c("obs_first", "dose_first"),
comments = find_comments(data)) {
if(length(comments)==1) {

if(length(comments) == 1) {
comments <- rep(comments,nrow(data))
}
if(!length(comments)==nrow(data)) {
stop("'comments' must be have length equal to the number of rows in 'data'",call.=FALSE)
if(!length(comments) == nrow(data)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this:

  if(!length(comments) == nrow(data)) {

identical to:

  if(length(comments) != nrow(data)) {

When I run !length(comments) by itself I get FALSE, so wanted to confirm the functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemed to work as designed, just hadn't seen that syntax before
image

stop(
"'comments' must be have length equal to the number of rows in 'data'",
call. = FALSE
)
}
addl_ties <- match.arg(addl_ties)
sort1 <- addl_ties == "obs_first"
lower_names <- tolower(names(data))
wid <- match(id_col, names(data))
if(is.na(wid)) {
stop("did not find id column `", id_col, "` in `data`", call.=FALSE)
}
col_id <- data[[wid]]
if(is.character(col_id)) {
col_id <- match(col_id, unique(col_id))
}
if(!is.numeric(col_id)) {
stop("id column is required to be numeric", call.=FALSE)
}
wtime <- match(time_col, names(data))
if(is.na(wtime)) {
stop("did not find time column `", time_col, "` in `data`", call.=FALSE)
stop("did not find time column `", time_col, "` in `data`", call. = FALSE)
}
has_na_time <- anyNA(data[[wtime]])
if(has_na_time) {
na_time <- is.na(data[[wtime]])
data <- data[!na_time,, drop = FALSE]
}
col_time <- data[[wtime]]
if(inherits(col_time, "POSIXct")) {
if(missing(time_units)) {
if(is.null(time_units)) {
stop(
"`time_units` is required when time column inherits `POSIXct`",
call.=FALSE
Expand All @@ -164,6 +185,17 @@ lastdose_list <- function(data,
if(!is.numeric(col_time)) {
stop("time column is required to be numeric", call.=FALSE)
}
wid <- match(id_col, names(data))
if(is.na(wid)) {
stop("did not find id column `", id_col, "` in `data`", call.=FALSE)
}
col_id <- data[[wid]]
if(is.character(col_id)) {
col_id <- match(col_id, unique(col_id))
}
if(!is.numeric(col_id)) {
stop("id column is required to be numeric", call.=FALSE)
}
wamt <- match("amt", lower_names)
if(is.na(wamt)) {
stop("column AMT or amt is required in the data set", call.=FALSE)
Expand Down Expand Up @@ -214,19 +246,25 @@ lastdose_list <- function(data,
sort1,
comments
)
if(has_na_time) {
re_order <- order(c(which(!na_time), which(na_time)))
for(j in seq_along(ans)) {
ans[[j]] <- ans[[j]][re_order]
}
}
ans
}

#' @rdname lastdose
#' @export
lastdose_df <- function(data,...) {
ans <- lastdose_list(data,...)
lastdose_df <- function(data, ...) {
ans <- lastdose_list(data, ...)
data.frame(
tad = ans[["tad"]],
tafd = ans[["tafd"]],
ldos = ans[["ldos"]],
stringsAsFactors=FALSE,check.names=FALSE,
fix.empty.names=FALSE, row.names=NULL
stringsAsFactors = FALSE, check.names = FALSE,
fix.empty.names = FALSE, row.names = NULL
)
}

Expand Down Expand Up @@ -260,11 +298,11 @@ lastdose_df <- function(data,...) {
#'
#'
#' @export
find_comments <- function(x,...) UseMethod("find_comments")
find_comments <- function(x, ...) UseMethod("find_comments")
#' @rdname find_comments
#'
#' @export
find_comments.data.frame <- function(x,...) {
find_comments.data.frame <- function(x, ...) {
if(!inherits(x[["C"]], c("logical", "character"))) {
if(exists("C", x)) {
warning(
Expand All @@ -279,7 +317,7 @@ find_comments.data.frame <- function(x,...) {

#' @rdname find_comments
#' @export
find_comments.character <- function(x,...) {
find_comments.character <- function(x, ...) {
!(is.na(x)|x=='.')
}

Expand All @@ -288,3 +326,66 @@ find_comments.character <- function(x,...) {
find_comments.logical <- function(x, ...) {
x & !is.na(x)
}

#' Find TIME column
#'
#' Search data frame names for the first matching candidate TIME column name.
#' See `details`.
#'
#' @param data a data.frame to search
#' @details
#' Column names will be searched against the following candidates
#'
#' - `TIME`
#' - `DATETIME`
#'
#' The first the first candidate to be matched will be returned. If there
#' are no matches, an error is generated.
#'
#' @examples
#' data <- data.frame(A = 1, DATETIME = 2, TIME = 3, Z = 99)
#' lastdose:::find_time_col(data)
#'
find_time_col <- function(data) {
stopifnot(is.data.frame(data))
ans <- intersect(c("TIME", "DATETIME"), names(data))
if(length(ans)==0) {
stop("could not find a TIME column in `data`", call. = FALSE)
}
ans[1]
}

#' Find ID column
#'
#' Search data frame names for the first matching candidate ID column name.
#' See `details`.
#'
#' @param data a data.frame to search
#' @details
#' Column names will be searched against the following candidates
#'
#' - `getOption("lastdose.id_col")`
#' - `ID`
#' - `USUBJID`
#' - `SUBJID`
#' - `PTNO`
#' - `SUBJ`
#'
#' The first the first candidate to be matched will be returned. If there
#' are no matches, an error is generated.
#'
#' @examples
#' data <- data.frame(A = 1, B = 2, PTNO = 3, ID = 4, Z = 99)
#' lastdose:::find_id_col(data)
#'
find_id_col <- function(data) {
stopifnot(is.data.frame(data))
op <- getOption("lastdose.id_col", NULL)
can <- c(op, "ID", "USUBJID", "SUBJID", "PTNO", "SUBJ")
ans <- intersect(can, names(data))
if(length(ans)==0) {
stop("could not find a subject identifier column in `data`", call. = FALSE)
}
ans[1]
}

34 changes: 34 additions & 0 deletions man/find_id_col.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

30 changes: 30 additions & 0 deletions man/find_time_col.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading