Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #91 length attribute from max data length #213

Merged
merged 43 commits into from
Feb 11, 2024
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
a669657
Add a function for max length
cpiraux Dec 19, 2023
ea1201a
Update to use getOption for df name
cpiraux Dec 19, 2023
df65c51
Add message for data length
cpiraux Dec 20, 2023
6693b8e
Add argument to function
cpiraux Dec 22, 2023
878fa51
Add length from max data length to 'width' attribute
cpiraux Jan 3, 2024
edc977a
Update function description
cpiraux Jan 3, 2024
20f4853
add test for length argument
cpiraux Jan 3, 2024
8e01730
Update nchar_gt_200 to take into account NA values
cpiraux Jan 9, 2024
56f02d7
Add test for variable length < 200 when the variable contains NAs
cpiraux Jan 9, 2024
4d3973e
Uncomment code
cpiraux Jan 9, 2024
7fdcfc8
add "domain =" in function for test
cpiraux Jan 10, 2024
43c0a2d
Update style
cpiraux Jan 10, 2024
2bd889d
Update documentation
cpiraux Jan 10, 2024
2cd4a6f
Update style
cpiraux Jan 10, 2024
e796810
Update style
cpiraux Jan 10, 2024
5a39042
Merge branch 'main' into 91-max-length
cpiraux Jan 10, 2024
11eceab
Merge branch 'main' into 91-max-length
cpiraux Jan 22, 2024
2ec667f
update style
cpiraux Jan 22, 2024
e4961d5
Merge branch 'main' into 91-max-length
cpiraux Jan 25, 2024
2d6b3bb
add match.args
cpiraux Jan 26, 2024
53c9cef
add assertion on parameter
cpiraux Jan 26, 2024
86c604f
add assertion on parameters
cpiraux Jan 26, 2024
cd01549
remove blank line
cpiraux Jan 26, 2024
da6f80e
run devtools::document()
cpiraux Jan 26, 2024
9c129cc
Remove blank line
cpiraux Jan 26, 2024
7bbfca8
fix: #91 typo in match.arg function
bms63 Jan 28, 2024
4023016
fix: #91 resolve merge conflicts from #199
bms63 Jan 29, 2024
9ac78b0
docs: #91 including data option. updating vignettes
bms63 Jan 29, 2024
d02431e
fix: #91 global bindings and arguments
bms63 Jan 29, 2024
b846999
chore: #91 nolint commented code
bms63 Jan 29, 2024
5488216
fix: use match.arg for xportr_lenght lenght
averissimo Jan 30, 2024
89f421a
fix: correct order of parameters on vignettes
averissimo Jan 30, 2024
fb808a3
Merge branch 'main' into 91-max-length
cpiraux Feb 7, 2024
d41682b
change argument name length to length_source
cpiraux Feb 7, 2024
be56799
change order of argument
cpiraux Feb 7, 2024
13dd326
Added description in NEWS.md
cpiraux Feb 7, 2024
d811cb7
Update documentation
cpiraux Feb 7, 2024
7bf3770
Change argument name to source_length in test-length
cpiraux Feb 11, 2024
758846a
change argument name to length_source
cpiraux Feb 11, 2024
cc9d0df
Update NEWS.md
cpiraux Feb 11, 2024
c7a410b
Reduce line length less than 120 characters
cpiraux Feb 11, 2024
e74e7c0
Merge branch '91-max-length' of github.com:atorus-research/xportr int…
cpiraux Feb 11, 2024
98c075f
lint:reduce lenght of line
cpiraux Feb 11, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ export(length_log)
export(type_log)
export(var_names_log)
export(var_ord_msg)
export(variable_max_length)
export(xportr_df_label)
export(xportr_format)
export(xportr_label)
Expand Down
53 changes: 40 additions & 13 deletions R/length.R
Original file line number Diff line number Diff line change
@@ -1,16 +1,22 @@
#' Assign SAS Length
#'
#' Assigns SAS length from a metadata object to a given data frame. If a
#' length isn't present for a variable the length value is set to 200 for
#' character columns, and 8 for non-character columns. This value is stored in
#' the 'width' attribute of the column.
#' Assigns the SAS length to a specified data frame, either from a metadata object
#' or based on the calculated maximum data length. If a length isn't present for
#' a variable the length value is set to 200 for character columns, and 8
#' for non-character columns. This value is stored in the 'width' attribute of the column.
#'
#' @param .df A data frame of CDISC standard.
#' @param metadata A data frame containing variable level metadata. See
#' 'Metadata' section for details.
#' @param domain Appropriate CDSIC dataset name, e.g. ADAE, DM. Used to subset
#' the metadata object. If none is passed, then [xportr_metadata()] must be
#' called before hand to set the domain as an attribute of `.df`.
#' the metadata object. If none is passed, then name of the dataset passed as
#' .df will be used.
#' @param length Choose the assigned length from either metadata or data.
#'
#' If `"metadata"` is specified, the assigned length is from the metadata length.
#' If `"data"` is specified, the assigned length is determined by the calculated maximum data length.
#'
#' *Permitted Values*: `"metadata"`, `"data"`
#' @param verbose The action this function takes when an action is taken on the
#' dataset or function validation finds an issue. See 'Messaging' section for
#' details. Options are 'stop', 'warn', 'message', and 'none'
Expand Down Expand Up @@ -66,6 +72,7 @@
xportr_length <- function(.df,
metadata = NULL,
domain = NULL,
length = "metadata",
verbose = getOption("xportr.length_verbose", "none"),
metacore = deprecated()) {
if (!missing(metacore)) {
cpiraux marked this conversation as resolved.
Show resolved Hide resolved
Expand Down Expand Up @@ -109,17 +116,37 @@ xportr_length <- function(.df,

length_log(miss_vars, verbose)

length <- metadata[[variable_length]]
names(length) <- metadata[[variable_name]]
if (length == "metadata") {
length_metadata <- metadata[[variable_length]]
names(length_metadata) <- metadata[[variable_name]]

for (i in names(.df)) {
if (i %in% miss_vars) {
attr(.df[[i]], "width") <- impute_length(.df[[i]])
} else {
attr(.df[[i]], "width") <- length_metadata[[i]]
}
}
}

# Assign length from data
if (length == "data") {
var_length_max <- variable_max_length(.df)

length_data <- var_length_max[[variable_length]]
names(length_data) <- var_length_max[[variable_name]]

for (i in names(.df)) {
if (i %in% miss_vars) {
attr(.df[[i]], "width") <- impute_length(.df[[i]])
} else {
attr(.df[[i]], "width") <- length[[i]]
for (i in names(.df)) {
attr(.df[[i]], "width") <- length_data[[i]]
}

length_msg <- left_join(var_length_max, metadata[, c(variable_name, variable_length)], by = variable_name) %>%
filter(length.x < length.y)

max_length_msg(length_msg, verbose)
}


.df
}

Expand Down
28 changes: 28 additions & 0 deletions R/messages.R
Original file line number Diff line number Diff line change
Expand Up @@ -181,3 +181,31 @@ var_ord_msg <- function(reordered_vars, moved_vars, verbose) {
cli_h2("All variables in dataset are ordered")
}
}

#' Utility for data Lengths
#'
#' @param max_length Dataframe with data and metadata length
#' @param verbose Provides additional messaging for user
#'
#' @return Output to Console

max_length_msg <- function(max_length, verbose) {
if (nrow(max_length) > 0) {
cpiraux marked this conversation as resolved.
Show resolved Hide resolved
cli_h2("Variable length is shorter than the length specified in the metadata.")

xportr_logger(
glue(
"Update length in metadata to trim the variables:"
),
type = verbose
)

xportr_logger(
glue(
"{format(max_length[[1]], width = 8)} has a length of {format(as.character(max_length[[2]]), width = 3)}",
" and a length of {format(as.character(max_length[[3]]), width = 3)} in metadata"
),
type = verbose
)
}
}
39 changes: 38 additions & 1 deletion R/utils-xportr.R
Original file line number Diff line number Diff line change
Expand Up @@ -307,7 +307,7 @@

# 4.0 max length of Character variables <= 200 bytes
max_nchar <- data %>%
summarize(across(where(is.character), ~ max(nchar(., type = "bytes"))))
summarize(across(where(is.character), ~ max(0L, nchar(., type = "bytes"), na.rm = TRUE)))
nchar_gt_200 <- max_nchar[which(max_nchar > 200)]
if (length(nchar_gt_200) > 0) {
err_cnd <- c(
Expand Down Expand Up @@ -372,6 +372,42 @@
}
}


#' Calculate the maximum length of variables
#'
#' Function to calculate the maximum length of variables in a given dataframe
#'
#' @inheritParams xportr_length
#'
#' @return Returns a dataframe with variables and their maximum length
#'
#' @export

variable_max_length <- function(.df) {
variable_length <- getOption("xportr.length")
cpiraux marked this conversation as resolved.
Show resolved Hide resolved
variable_name <- getOption("xportr.variable_name")

max_nchar <- .df %>%
summarize(across(where(is.character), ~ max(0L, nchar(., type = "bytes"), na.rm = TRUE)))


xport_max_length <- data.frame()
col <- 0
for (var in names(.df)) {
col <- col + 1

xport_max_length[col, variable_name] <- var

if (is.character(.df[[var]])) {
xport_max_length[col, variable_length] <- max_nchar[var]
} else {
xport_max_length[col, variable_length] <- 8
}
}

return(xport_max_length)
}

#' Custom check for metadata object
#'
#' Improvement on the message clarity over the default assert(...) messages.
Expand Down Expand Up @@ -421,3 +457,4 @@
#' Internal choices for verbose option
#' @noRd
.internal_verbose_choices <- c("none", "warn", "message", "stop")

Check warning on line 460 in R/utils-xportr.R

View workflow job for this annotation

GitHub Actions / lint

file=R/utils-xportr.R,line=460,col=1,[trailing_blank_lines_linter] Trailing blank lines are superfluous.
cpiraux marked this conversation as resolved.
Show resolved Hide resolved
19 changes: 19 additions & 0 deletions man/max_length_msg.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 17 additions & 0 deletions man/variable_max_length.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 12 additions & 4 deletions man/xportr_length.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 26 additions & 0 deletions tests/testthat/test-length.R
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these will need to have the argument updated to length_source

Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,32 @@ test_that("xportr_length: Gets warning when metadata has multiple rows with same
multiple_vars_in_spec_helper2(xportr_length)
})

meta_example <- data.frame(
dataset = "df",
variable = c("USUBJID", "WEIGHT"),
length = c(10, 8)
)

df <- data.frame(
USUBJID = c("1", "12", "123"),
WEIGHT = c(85, 45, 121)
)

test_that("xportr_length: length assigned as expected from metadata or data", {
result <- df %>%
xportr_length(meta_example, domain = "df", length = "metadata") %>%
expect_attr_width(c(10, 8))

result <- df %>%
xportr_length(meta_example, domain = "df", length = "data") %>%
expect_attr_width(c(3, 8))
})

test_that("xportr_length: Gets message when length in metadata longer than data length", {
result <- df %>%
xportr_length(meta_example, domain = "df", length = "data") %>%
expect_message()
})

test_that("xportr_length: Works as expected with only one domain in metadata", {
adsl <- data.frame(
Expand Down
8 changes: 8 additions & 0 deletions tests/testthat/test-utils-xportr.R
Original file line number Diff line number Diff line change
Expand Up @@ -127,3 +127,11 @@ test_that("xpt_validate: Get error message when the length of a non-ASCII charac
"Length of A must be 200 bytes or less."
)
})

test_that("xpt_validate: Get error message when the length of a character variable is > 200 bytes and contains NAs", {
df <- data.frame(A = c(paste(rep("A", 201), collapse = ""), NA_character_))
expect_equal(
xpt_validate(df),
"Length of A must be 200 bytes or less."
)
})
Loading