Implement data_seek() #458

Merged (19 commits) on Sep 12, 2023

@strengejacke (Member) commented Sep 12, 2023

@easystats/maintainers What do you think about this function? I find it useful when I have the questionnaire for a data set and am looking for particular variables where I know the pattern appears in the value or variable labels.

One problem is the name: this function tries to find variables, but we already have data_find() (and find_columns() as an alias), which work differently and also return the extracted variables. seek_variables(), in contrast, returns a data frame with a summary of the matches found. find_variables() is already taken in insight.

data_seek() is currently an alias.

@strengejacke (Member, Author)

btw, this is a slightly modified version of sjmisc::find_in_data()

@strengejacke (Member, Author) commented Sep 12, 2023

Examples:

library(datawizard)

# seek variables with "Length" in variable name or labels
seek_variables(iris, "Length")
#> index |       column |       labels
#> -----------------------------------
#>     1 | Sepal.Length | Sepal.Length
#>     3 | Petal.Length | Petal.Length

# seek variables with "dependency" in names or labels
# column "e42dep" has a label-attribute "elder's dependency"
data(efc)
seek_variables(efc, "dependency")
#> index | column |             labels
#> -----------------------------------
#>     3 | e42dep | elder's dependency

# "female" only appears as value label attribute - default search is in
# variable names and labels only, so no match
seek_variables(efc, "female")
#> Can't export table to 
#>   text
#>   , data frame is empty.

# when we seek in all sources, we find the variable "e16sex"
seek_variables(efc, "female", seek = "all")
#> index | column |         labels
#> -------------------------------
#>     2 | e16sex | elder's gender

# typo, no match
seek_variables(iris, "Lenght")
#> Can't export table to 
#>   text
#>   , data frame is empty.

# typo, fuzzy match
seek_variables(iris, "Lenght", fuzzy = TRUE)
#> index |       column |       labels
#> -----------------------------------
#>     1 | Sepal.Length | Sepal.Length
#>     3 | Petal.Length | Petal.Length

# multiple patterns
seek_variables(efc, c("female", "dependency"), seek = "all")
#> index | column |             labels
#> -----------------------------------
#>     2 | e16sex |     elder's gender
#>     3 | e42dep | elder's dependency

Created on 2023-09-12 with reprex v2.0.2
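
For readers who want to see the idea behind the examples above, here is a minimal base-R sketch of this kind of search. It is not the datawizard implementation: `seek_sketch()` and its internals are hypothetical names, it handles a single pattern only, and it assumes that variable labels live in a "label" attribute, that value labels are the names of a "labels" attribute, and that factor levels count as values. Fuzzy matching is approximated with `agrepl()`, which may behave differently from the merged function.

```r
# Hypothetical helper for illustration only; not the datawizard code.
seek_sketch <- function(data, pattern, seek = c("names", "labels"), fuzzy = FALSE) {
  # "all" expands to every source this sketch knows about
  if ("all" %in% seek) {
    seek <- c("names", "labels", "values")
  }

  # TRUE if any element of x contains the pattern
  # (approximately, when fuzzy = TRUE)
  matches <- function(x) {
    if (length(x) == 0L) {
      return(FALSE)
    }
    hit <- if (fuzzy) {
      agrepl(pattern, x, ignore.case = TRUE)
    } else {
      grepl(pattern, x, fixed = TRUE)
    }
    any(hit)
  }

  # Variable label of each column; fall back to the column name
  labels <- vapply(seq_along(data), function(i) {
    label <- attr(data[[i]], "label", exact = TRUE)
    if (is.null(label)) names(data)[i] else as.character(label)[1]
  }, character(1))

  # Check each column against the requested sources
  found <- vapply(seq_along(data), function(i) {
    column <- data[[i]]
    in_names <- "names" %in% seek && matches(names(data)[i])
    in_labels <- "labels" %in% seek &&
      matches(attr(column, "label", exact = TRUE))
    # assumption: value labels are the names of the "labels" attribute
    in_values <- "values" %in% seek &&
      (matches(names(attr(column, "labels", exact = TRUE))) ||
        matches(levels(column)))
    in_names || in_labels || in_values
  }, logical(1))

  # Summary of matches, one row per matching column
  data.frame(
    index = which(found),
    column = names(data)[found],
    labels = labels[found],
    stringsAsFactors = FALSE
  )
}

seek_sketch(iris, "Length")
seek_sketch(iris, "setosa", seek = "all")
```

Returning a summary data frame rather than the matching columns mirrors the distinction drawn above between this function and data_find().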

@rempsyc (Member) commented Sep 12, 2023

I like it, especially the integration with labels :)

@DominiqueMakowski (Member)

As someone that doesn't use labels, I don't expect to use this function over data_find(), but I can see the benefits

@etiennebacher (Member)

I like this, it would have been useful for me before. How about data_search() for the name?

strengejacke and others added 2 commits September 12, 2023 19:51
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
@etiennebacher etiennebacher changed the title Draft seek_variables Implement data_seek() Sep 12, 2023
@strengejacke (Member, Author)

"As someone that doesn't use labels"

You could at least use it for factor levels :-)

datawizard::data_seek(iris, "setosa", seek = "all")
#> index |  column |  labels
#> -------------------------
#>     5 | Species | Species

Created on 2023-09-12 with reprex v2.0.2

@etiennebacher (Member) left a comment

Thank you @strengejacke !

@etiennebacher etiennebacher merged commit 02969e0 into main Sep 12, 2023
25 of 26 checks passed
@etiennebacher etiennebacher deleted the seek_variables branch September 12, 2023 20:48

4 participants