Deprecate pattern in favour of select in data_rename() #568

Merged: 20 commits, merged Dec 2, 2024

2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: datawizard
Title: Easy Data Wrangling and Statistical Transformations
Version: 0.13.0.15
Version: 0.13.0.16
Authors@R: c(
person("Indrajeet", "Patil", , "patilindrajeet.science@gmail.com", role = "aut",
comment = c(ORCID = "0000-0003-1995-6531")),
17 changes: 13 additions & 4 deletions NEWS.md
@@ -1,9 +1,18 @@
# datawizard (development)

BREAKING CHANGES

* Argument `drop_na` in `data_match()` is deprecated now. Please use `remove_na`
instead.
BREAKING CHANGES AND DEPRECATIONS

* Argument `drop_na` in `data_match()` is deprecated now. Please use
`remove_na` instead.

* In `data_rename()` (#567):
- argument `pattern` is deprecated. Use `select` instead.
- argument `safe` is deprecated. The function now errors when `select`
contains unknown column names.
- when `replacement` is `NULL`, an error is now thrown (previously, column
indices were used as new names).
- if `select` (previously `pattern`) is a named vector, then all elements
must be named, e.g. `c(length = "Sepal.Length", "Sepal.Width")` errors.
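The rules in the bullets above can be illustrated with a small base-R sketch. `rename_sketch()` is a hypothetical helper written only for this illustration: it is not datawizard's code, it accepts plain column names rather than the richer selection syntax `select` supports in datawizard, it skips glue-style replacement strings, and its messages are made up.

```r
# Hypothetical base-R sketch of the data_rename() rules listed above.
# Not datawizard's implementation; names and messages are illustrative.
rename_sketch <- function(data, select = NULL, replacement = NULL,
                          pattern = NULL) {
  # Deprecated `pattern`: in rename_sketch(df, pattern = "a", "b"), "b" is
  # matched to `select` by position but is meant as the replacement.
  if (!is.null(pattern)) {
    warning("`pattern` is deprecated; use `select` instead.", call. = FALSE)
    if (!is.null(select)) replacement <- select
    select <- pattern
  }
  # NULL `select` means "all columns"
  if (is.null(select)) select <- names(data)
  # Named `select`: every element must be named; the names become new names
  if (!is.null(names(select))) {
    if (any(!nzchar(names(select)))) {
      stop("When `select` is a named vector, all elements must be named.",
           call. = FALSE)
    }
    replacement <- names(select)
  }
  # Unknown column names always error (there is no `safe` fallback any more)
  unknown <- setdiff(select, names(data))
  if (length(unknown) > 0) {
    stop("Unknown column(s) in `select`: ", toString(unknown), call. = FALSE)
  }
  # `replacement = NULL` no longer falls back to column indices, and a
  # length mismatch is an error rather than an alert
  if (length(replacement) != length(select)) {
    stop("`select` and `replacement` must have the same length.",
         call. = FALSE)
  }
  names(data)[match(select, names(data))] <- replacement
  data
}
```

For example, `rename_sketch(iris, c(length = "Sepal.Length", width = "Sepal.Width"))` renames the first two columns, while `rename_sketch(iris, c(length = "Sepal.Length", "Sepal.Width"))`, `rename_sketch(iris, "FakeCol", "length")` and `rename_sketch(iris, "Sepal.Length")` all stop with an error.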

CHANGES

12 changes: 10 additions & 2 deletions R/data_addprefix.R
@@ -1,5 +1,13 @@
#' @rdname data_rename
#' Add a prefix or suffix to column names
#'
#' @rdname data_prefix_suffix
#' @inheritParams extract_column_names
#' @param pattern A character string, which will be added as prefix or suffix
#' to the column names.
#' @param ... Other arguments passed to or from other functions.
#'
#' @seealso
#' [data_rename()] for more fine-grained column renaming.
#' @examples
#' # Add prefix / suffix to all columns
#' head(data_addprefix(iris, "NEW_"))
@@ -29,7 +37,7 @@ data_addprefix <- function(data,
}


#' @rdname data_rename
#' @rdname data_prefix_suffix
#' @export
data_addsuffix <- function(data,
pattern,
143 changes: 70 additions & 73 deletions R/data_rename.R
@@ -1,36 +1,27 @@
#' @title Rename columns and variable names
#' @name data_rename
#'
#' @description Safe and intuitive functions to rename variables or rows in
#' data frames. `data_rename()` will rename column names, i.e. it facilitates
#' renaming variables `data_addprefix()` or `data_addsuffix()` add prefixes
#' or suffixes to column names. `data_rename_rows()` is a convenient shortcut
#' renaming variables. `data_rename_rows()` is a convenient shortcut
#' to add or rename row names of a data frame, but unlike `row.names()`, its
#' input and output is a data frame, thus, integrating smoothly into a possible
#' pipe-workflow.
#' input and output is a data frame, thus, integrating smoothly into a
#' possible pipe-workflow.
#'
#' @inheritParams extract_column_names
#' @param data A data frame, or an object that can be coerced to a data frame.
#' @param pattern Character vector.
#' - For `data_addprefix()` or `data_addsuffix()`, a character string, which
#' will be added as prefix or suffix to the column names.
#' - For `data_rename()`, indicates columns that should be selected for
#' renaming. Can be `NULL` (in which case all columns are selected).
#' `pattern` can also be a named vector. In this case, names are used as
#' values for the `replacement` argument (i.e. `pattern` can be a character
#' vector using `<new name> = "<old name>"` and argument `replacement` will
#' be ignored then).
#' @param replacement Character vector. Can be one of the following:
#' - A character vector that indicates the new names of the columns selected
#' in `pattern`. `pattern` and `replacement` must be of the same length.
#' - `NULL`, in which case columns are numbered in sequential order.
#' - A string (i.e. character vector of length 1) with a "glue" styled pattern.
#' Currently supported tokens are:
#' in `select`. `select` and `replacement` must be of the same length.
#' - A string (i.e. character vector of length 1) with a "glue" styled
#' pattern. Currently supported tokens are:
#' - `{col}` which will be replaced by the column name, i.e. the
#' corresponding value in `pattern`.
#' corresponding value in `select`.
#' - `{n}` will be replaced by the number of the variable that is replaced.
#' - `{letter}` will be replaced by alphabetical letters in sequential order.
#' - `{letter}` will be replaced by alphabetical letters in sequential
#' order.
#' If more than 26 letters are required, letters are repeated, but have
#' sequential numeric indices (e.g., `a1` to `z1`, followed by `a2` to `z2`).
#' sequential numeric indices (e.g., `a1` to `z1`, followed by `a2` to
#' `z2`).
#' - Finally, the name of a user-defined object that is available in the
#' environment can be used. Note that the object's name is not allowed to
#' be one of the pre-defined tokens, `"col"`, `"n"` and `"letter"`.
@@ -39,35 +30,32 @@
#' ```r
#' data_rename(
#' mtcars,
#' pattern = c("am", "vs"),
#' select = c("am", "vs"),
#' replacement = "new_name_from_{col}"
#' )
#' ```
#' ... which would return new column names `new_name_from_am` and
#' `new_name_from_vs`. See 'Examples'.
#'
#' If `pattern` is a named vector, `replacement` is ignored.
#' If `select` is a named vector, `replacement` is ignored.
#' @param rows Vector of row names.
#' @param safe Do not throw error if for instance the variable to be
#' renamed/removed doesn't exist.
#' @param verbose Toggle warnings and messages.
#' @param safe Deprecated. Passing unknown column names now always errors.
#' @param pattern Deprecated. Use `select` instead.
#' @param ... Other arguments passed to or from other functions.
#'
#' @details
#' `select` can also be a named character vector. In this case, the names are
#' used to rename the columns in the output data frame. See 'Examples'.
#'
#' @return A modified data frame.
#'
#' @examples
#' # Rename columns
#' head(data_rename(iris, "Sepal.Length", "length"))
#' # data_rename(iris, "FakeCol", "length", safe=FALSE) # This fails
#' head(data_rename(iris, "FakeCol", "length")) # This doesn't
#' head(data_rename(iris, c("Sepal.Length", "Sepal.Width"), c("length", "width")))
#'
#' # use named vector to rename
#' head(data_rename(iris, c(length = "Sepal.Length", width = "Sepal.Width")))
#'
#' # Reset names
#' head(data_rename(iris, NULL))
#'
#' # Change all
#' head(data_rename(iris, replacement = paste0("Var", 1:5)))
#'
@@ -80,8 +68,7 @@
#' x <- c("hi", "there", "!")
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "col_{x}"))
#' @seealso
#' - Functions to rename stuff: [data_rename()], [data_rename_rows()],
#' [data_addprefix()], [data_addsuffix()]
#' - Add a prefix or suffix to column names: [data_addprefix()], [data_addsuffix()]
#' - Functions to reorder or remove columns: [data_reorder()], [data_relocate()],
#' [data_remove()]
#' - Functions to reshape, pivot or rotate data frames: [data_to_long()],
@@ -96,28 +83,48 @@
#'
#' @export
data_rename <- function(data,
pattern = NULL,
select = NULL,
replacement = NULL,
safe = TRUE,
verbose = TRUE,
pattern = NULL,
...) {
# change all names if no pattern specified
if (is.null(pattern)) {
pattern <- names(data)
# If the user does data_rename(iris, pattern = "Sepal.Length", "length"),
# then "length" is matched to select by position while it's the replacement
# => do the switch manually
if (!is.null(pattern)) {
.is_deprecated("pattern", "select")
if (!is.null(select)) {
replacement <- select
}
select <- pattern
}

if (!is.character(pattern)) {
insight::format_error("Argument `pattern` must be of type character.")
if (isFALSE(safe)) {
insight::format_warning("In `data_rename()`, argument `safe` is no longer used and will be removed in a future release.") # nolint
}

# check if `pattern` has names, and if so, use as "replacement"
if (!is.null(names(pattern))) {
replacement <- names(pattern)
# change all names if no pattern specified
select <- .select_nse(
select,
data,
exclude = NULL,
ignore_case = NULL,
regex = NULL,
allow_rename = TRUE,
verbose = verbose,
ifnotfound = "error"
)

# Forbid partially named "select",
# Ex: if select = c("foo" = "Species", "Sepal.Length") then the 2nd name and
# 2nd value are "Sepal.Length"
if (!is.null(names(select)) && any(names(select) == select)) {
insight::format_error("When `select` is a named vector, all elements must be named.")
}

# name columns 1, 2, 3 etc. if no replacement
if (is.null(replacement)) {
replacement <- paste0(seq_along(pattern))
# check if `select` has names, and if so, use as "replacement"
if (!is.null(names(select))) {
replacement <- names(select)
}

# coerce to character
@@ -126,22 +133,22 @@ data_rename <- function(data,
# check if `replacement` has no empty strings and no NA values
invalid_replacement <- is.na(replacement) | !nzchar(replacement)
if (any(invalid_replacement)) {
if (is.null(names(pattern))) {
# when user did not match `pattern` with `replacement`
if (is.null(names(select))) {
# when user did not match `select` with `replacement`
msg <- c(
"`replacement` is not allowed to have `NA` or empty strings.",
sprintf(
"Following values in `pattern` have no match in `replacement`: %s",
toString(pattern[invalid_replacement])
"Following values in `select` have no match in `replacement`: %s",
toString(select[invalid_replacement])
)
)
} else {
# when user did not name all elements of `pattern`
# when user did not name all elements of `select`
msg <- c(
"Either name all elements of `pattern` or use `replacement`.",
"Either name all elements of `select` or use `replacement`.",
sprintf(
"Following values in `pattern` were not named: %s",
toString(pattern[invalid_replacement])
"Following values in `select` were not named: %s",
toString(select[invalid_replacement])
)
)
}
@@ -163,30 +170,20 @@ data_rename <- function(data,
# check if we have "glue" styled replacement-string
glue_style <- length(replacement) == 1 && grepl("{", replacement, fixed = TRUE)

if (length(replacement) > length(pattern) && verbose) {
insight::format_alert(
paste0(
"There are more names in `replacement` than in `pattern`. The last ",
length(replacement) - length(pattern), " names of `replacement` are not used."
)
)
} else if (length(replacement) < length(pattern) && verbose && !glue_style) {
insight::format_alert(
paste0(
"There are more names in `pattern` than in `replacement`. The last ",
length(pattern) - length(replacement), " names of `pattern` are not modified."
)
)
if (length(replacement) > length(select)) {
insight::format_error("There are more names in `replacement` than in `select`.")
} else if (length(replacement) < length(select) && !glue_style) {
insight::format_error("There are more names in `select` than in `replacement`.")
}

# if we have glue-styled replacement-string, create replacement pattern now
# if we have glue-styled replacement-string, create replacement names now
if (glue_style) {
replacement <- .glue_replacement(pattern, replacement)
replacement <- .glue_replacement(select, replacement)
}

for (i in seq_along(pattern)) {
for (i in seq_along(select)) {
if (!is.na(replacement[i])) {
data <- .data_rename(data, pattern[i], replacement[i], safe, verbose)
data <- .data_rename(data, select[i], replacement[i], safe, verbose)
}
}

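The glue-style tokens documented for `replacement` (`{col}`, `{n}`, `{letter}`) can be sketched in base R as below. `glue_replacement_sketch()` is a hypothetical stand-in for the internal `.glue_replacement()` helper, whose body is not part of this diff; it follows only the documented behaviour and does not resolve user-defined objects such as `{x}`.

```r
# Hypothetical sketch of the documented glue-style tokens; not datawizard's
# internal .glue_replacement(). User-defined objects (e.g. "{x}") are not
# handled here.
glue_replacement_sketch <- function(select, replacement) {
  idx <- seq_along(select)
  # {letter}: plain letters for up to 26 columns; beyond that every letter
  # gets a numeric suffix (a1 ... z1, a2 ... z2, ...)
  if (length(select) <= 26) {
    lett <- letters[idx]
  } else {
    lett <- paste0(letters[(idx - 1) %% 26 + 1], (idx - 1) %/% 26 + 1)
  }
  # Substitute the tokens literally (fixed = TRUE) for each selected column
  vapply(idx, function(i) {
    out <- gsub("{col}", select[i], replacement, fixed = TRUE)
    out <- gsub("{n}", as.character(i), out, fixed = TRUE)
    gsub("{letter}", lett[i], out, fixed = TRUE)
  }, character(1))
}

glue_replacement_sketch(c("am", "vs"), "new_name_from_{col}")
# "new_name_from_am" "new_name_from_vs"
```

This mirrors the `mtcars` example in the roxygen docs for `replacement`; the sketch deliberately processes one column at a time with `vapply()` because `gsub()` does not vectorize over its `replacement` argument.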