Create function update_schema() to edit field properties #70

Open
peterdesmet opened this issue Jan 3, 2022 · 3 comments
Labels: complexity:high (Likely complex to implement), enhancement (New feature or request)

@peterdesmet (Member) commented Jan 3, 2022

A schema created with create_schema() only has the field properties name, type and (sometimes) constraints. I think it is fairly common to add more properties, such as description, or constraints such as required. That is possible with purrr, but it isn't very straightforward, so a dedicated function might be useful.

Create schema:

library(frictionless)
iris_schema <- create_schema(iris)
str(iris_schema)
#> List of 1
#>  $ fields:List of 5
#>   ..$ :List of 2
#>   .. ..$ name: chr "Sepal.Length"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 2
#>   .. ..$ name: chr "Sepal.Width"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 2
#>   .. ..$ name: chr "Petal.Length"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 2
#>   .. ..$ name: chr "Petal.Width"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 3
#>   .. ..$ name       : chr "Species"
#>   .. ..$ type       : chr "string"
#>   .. ..$ constraints:List of 1
#>   .. .. ..$ enum: chr [1:3] "setosa" "versicolor" "virginica"

Atomic function

iris_schema <- edit_field_property(iris_schema, "Sepal.Width", "description", "Sepal width in cm.")
# Same as: iris_schema$fields[[2]]$description <- "Sepal width in cm."

Not sure this is super useful, but it is very clear what field you are setting.
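A minimal base-R sketch of how the atomic edit_field_property() could work, assuming schemas are plain nested lists like those returned by create_schema(). edit_field_property() is hypothetical (it is not part of frictionless), and the two-field schema is hand-written to mimic part of the create_schema(iris) output.

```r
# Hypothetical sketch: edit_field_property() is not part of frictionless.
edit_field_property <- function(schema, field_name, property, value) {
  # Look up the position of the field by its name
  all_names <- vapply(schema$fields, function(field) field$name, character(1))
  index <- match(field_name, all_names)
  if (is.na(index)) {
    stop(sprintf("Can't find field \"%s\" in schema.", field_name), call. = FALSE)
  }
  # Assigning NULL here would remove the property instead of setting it
  schema$fields[[index]][[property]] <- value
  schema
}

# Hand-written schema mimicking part of create_schema(iris)
iris_schema <- list(fields = list(
  list(name = "Sepal.Length", type = "number"),
  list(name = "Sepal.Width", type = "number")
))
iris_schema <- edit_field_property(
  iris_schema, "Sepal.Width", "description", "Sepal width in cm."
)
iris_schema$fields[[2]]$description
#> [1] "Sepal width in cm."
```

Looking the field up by name, rather than by position as in iris_schema$fields[[2]], is what makes this variant explicit about which field is being set.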

Loop function

iris_schema <- edit_fields(
  iris_schema,
  "description",
  c("Sepal length in cm.", "Sepal width in cm.", "Petal length in cm.", "Petal width in cm.", NA_character_)
)
# If value is NA or NULL, don't set property

Faster, but there is a disconnect between the field name and the value you want to set.
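The loop variant could be sketched as follows, again as a hypothetical base-R edit_fields() on a hand-written schema. Values are matched to fields purely by position and NA or NULL entries are skipped, as proposed above; supplying a list instead of a character vector is assumed to be the way to pass NULL.

```r
# Hypothetical sketch: this edit_fields() is not part of frictionless.
edit_fields <- function(schema, property, values) {
  if (length(values) > length(schema$fields)) {
    stop("There are more values than fields.", call. = FALSE)
  }
  for (i in seq_along(values)) {
    value <- values[[i]]
    # NA or NULL: leave the property of this field unset
    if (is.null(value) || (length(value) == 1 && is.na(value))) next
    schema$fields[[i]][[property]] <- value
  }
  schema
}

iris_schema <- list(fields = list(
  list(name = "Sepal.Length", type = "number"),
  list(name = "Sepal.Width", type = "number"),
  list(name = "Species", type = "string")
))
iris_schema <- edit_fields(
  iris_schema,
  "description",
  c("Sepal length in cm.", "Sepal width in cm.", NA_character_)
)
```

Nothing in the call ties "Sepal width in cm." to Sepal.Width, so swapping two values would go unnoticed. That is the disconnect described above.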

Recode-like function

iris_schema <- edit_fields(
  iris_schema,
  "description",
  "Sepal.Length" = "Sepal length in cm.",
  "Sepal.Width" = "Sepal width in cm.",
  "Species" = NA_character_
)
# If field is not listed, don't set property
# If field is listed but NA or NULL, remove it
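A possible base-R sketch of the recode-like variant, with field names passed through `...`. The function is hypothetical; following the comments above, unlisted fields are left alone and listed fields given NA or NULL have the property removed. Rejecting unknown field names, so that a typo fails loudly instead of being ignored, is an added assumption.

```r
# Hypothetical sketch of a recode-like edit_fields(); not part of frictionless.
edit_fields <- function(schema, property, ...) {
  values <- list(...)  # list() keeps NULL entries, e.g. Species = NULL
  if (length(values) > 0 && (is.null(names(values)) || any(names(values) == ""))) {
    stop("All values must be named by field.", call. = FALSE)
  }
  all_names <- vapply(schema$fields, function(field) field$name, character(1))
  unknown <- setdiff(names(values), all_names)
  if (length(unknown) > 0) {
    stop("Unknown field(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  for (field_name in names(values)) {
    i <- match(field_name, all_names)
    value <- values[[field_name]]
    if (is.null(value) || (length(value) == 1 && is.na(value))) {
      schema$fields[[i]][[property]] <- NULL  # listed as NA/NULL: remove property
    } else {
      schema$fields[[i]][[property]] <- value
    }
  }
  schema
}

iris_schema <- list(fields = list(
  list(name = "Sepal.Length", type = "number"),
  list(name = "Sepal.Width", type = "number"),
  list(name = "Species", type = "string", description = "Old description.")
))
iris_schema <- edit_fields(
  iris_schema,
  "description",
  "Sepal.Length" = "Sepal length in cm.",
  "Species" = NA_character_
)
```

One drawback of routing field names through `...`: a field named like one of the function's own arguments (e.g. schema or property) would be captured by that argument instead.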

Note that it should also work for nested properties:

iris_schema <- edit_fields(
  iris_schema,
  "constraints$required",
  "Sepal.Length" = TRUE
)
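Nested properties such as "constraints$required" could be supported by splitting the path on "$" and descending into the field recursively, creating intermediate lists when they don't exist yet. set_nested() and set_field_property() are hypothetical names used only for this sketch.

```r
# Hypothetical helpers; not part of frictionless.
set_nested <- function(x, path, value) {
  if (length(path) == 1) {
    x[[path]] <- value
    return(x)
  }
  child <- x[[path[1]]]
  if (is.null(child)) child <- list()  # e.g. create constraints if missing
  x[[path[1]]] <- set_nested(child, path[-1], value)
  x
}

set_field_property <- function(schema, field_name, property, value) {
  path <- strsplit(property, "$", fixed = TRUE)[[1]]
  all_names <- vapply(schema$fields, function(field) field$name, character(1))
  i <- match(field_name, all_names)
  if (is.na(i)) {
    stop(sprintf("Can't find field \"%s\" in schema.", field_name), call. = FALSE)
  }
  schema$fields[[i]] <- set_nested(schema$fields[[i]], path, value)
  schema
}

iris_schema <- list(fields = list(
  list(name = "Sepal.Length", type = "number"),
  list(name = "Species", type = "string",
       constraints = list(enum = c("setosa", "versicolor", "virginica")))
))
iris_schema <- set_field_property(iris_schema, "Sepal.Length", "constraints$required", TRUE)
iris_schema <- set_field_property(iris_schema, "Species", "constraints$required", TRUE)
```

For Species, the existing enum constraint is kept and required is added next to it.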
@peterdesmet peterdesmet added the enhancement New feature or request label Jan 3, 2022
@damianooldoni (Contributor)

After our short chat, I completely agree that such a function would be beneficial in this package, to cover basic and quite typical steps of handling data packages. Some thoughts:

  1. I think it will still be important to show in the documentation how the purrr function imap() is used within edit_fields(). That way, users can be inspired to write their own custom functions for cases too specific to be included in the package. Sooner or later, that will happen.
  2. I like the loop approach, but it's true: there is no link between the field name and the value to set, so bad mistakes can occur! Unless you use named vectors! 👍 See below for an example.
  3. The recode-like function is nice and easy to use, as it is very tidyverse-like. Its only drawback is its verbosity when it has to be applied to many fields: typos can creep in, because users have to write a lot within the same function call. This is the reason why I seldom use recode() in my daily life 😄

I think we should go for the loop option. Below is a simple way to solve its drawback by using a named vector:

library(purrr)

# get field names
field_names <- map_chr(iris_schema$fields, ~ .$name)

field_names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

# define values as a named vector
values <- c("Sepal length in cm.", "Sepal width in cm.", "Petal length in cm.", "Petal width in cm.", NA_character_)
names(values) <- field_names
values

iris_schema <- edit_fields(
  iris_schema,
  "description",
  values
)

So if the user provides an unnamed vector, the order of the fields is used, perhaps with a message stating the order the function will use. Otherwise, values are assigned to fields based on the names of the vector.
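The name-or-order rule described above could be isolated in a small helper that returns the values aligned to the schema's field order, so a function like edit_fields() only ever has to loop by position. align_values() is a hypothetical name; emitting a message() for unnamed input follows the suggestion above, while rejecting unknown or partially missing names is an added assumption.

```r
# Hypothetical helper; not part of frictionless.
align_values <- function(schema, values) {
  all_names <- vapply(schema$fields, function(field) field$name, character(1))
  if (length(values) > length(all_names)) {
    stop("There are more values than fields.", call. = FALSE)
  }
  if (is.null(names(values))) {
    # Unnamed: use field order and tell the user which order that is
    message("Assigning values in field order: ", paste(all_names, collapse = ", "))
    length(values) <- length(all_names)  # pad missing trailing values with NA
  } else {
    if (any(names(values) == "")) {
      stop("Either name all values or none.", call. = FALSE)
    }
    unknown <- setdiff(names(values), all_names)
    if (length(unknown) > 0) {
      stop("Unknown field(s): ", paste(unknown, collapse = ", "), call. = FALSE)
    }
    values <- values[all_names]  # reorder; unlisted fields become NA
  }
  names(values) <- all_names
  values
}

iris_schema <- list(fields = list(
  list(name = "Sepal.Length", type = "number"),
  list(name = "Sepal.Width", type = "number"),
  list(name = "Species", type = "string")
))
named <- align_values(
  iris_schema,
  c(Species = "Iris species.", Sepal.Length = "Sepal length in cm.")
)
unnamed <- align_values(iris_schema, c("Sepal length in cm.", "Sepal width in cm."))
#> Assigning values in field order: Sepal.Length, Sepal.Width, Species
```

Named input may list fields in any order; the helper restores schema order and fills gaps with NA, which a loop can then skip.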

@peterdesmet: this way, I think the loop function will meet all our expectations. What do you think?

@peterdesmet (Member, Author)

Also suggested by @beatrizmilz in ropensci/software-review#495 (comment):

Adding the descriptions to the schema does not seem trivial. There is an example with the purrr package, but the example might not be simple to understand for someone who is not used to the purrr package.

I'm talking about this piece of code:

library(frictionless)
iris_schema <- create_schema(iris)

# Remove description for first field
iris_schema$fields[[1]]$description <- NULL

# Set descriptions for all fields
descriptions <- c(
  "Sepal length in cm.",
  "Sepal width in cm.",
  "Petal length in cm.",
  "Petal width in cm.",
  "Iris species."
)
iris_schema$fields <- purrr::imap(
  iris_schema$fields,
  ~ c(.x, description = descriptions[.y])
)

Do the authors think it is possible to create a function to add descriptions to the schema, in a way that is used similarly to the other functions of the package? Example of the idea:

iris_schema <- create_schema(iris) |>
  add_description(
    c(
      "Sepal length in cm.",
      "Sepal width in cm.",
      "Petal length in cm.",
      "Petal width in cm.",
      "Iris species."
    )
  )
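For readers unfamiliar with purrr, the purrr::imap() call in the quoted snippet can be written with base R's Map(), which walks over the fields and the descriptions in parallel. This is a sketch on a hand-written two-field schema, not code from the package documentation.

```r
# Hand-written schema mimicking part of create_schema(iris)
iris_schema <- list(fields = list(
  list(name = "Sepal.Length", type = "number"),
  list(name = "Sepal.Width", type = "number")
))
descriptions <- c("Sepal length in cm.", "Sepal width in cm.")

# Fail early: Map() would otherwise recycle a shorter vector
stopifnot(length(descriptions) == length(iris_schema$fields))

# Pair each field with its description and append it as a new property
iris_schema$fields <- Map(
  function(field, description) c(field, description = description),
  iris_schema$fields,
  descriptions
)
```

The length check matters because, like imap() indexing into descriptions, Map() pairs elements purely by position.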

@peterdesmet (Member, Author)

Finally got some time to think about this.

Workflow

  1. Create a schema first:
schema <-
  PlantGrowth %>%
  create_schema()
str(schema)
#> List of 1
#>  $ fields:List of 2
#>   ..$ :List of 2
#>   .. ..$ name: chr "weight"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 3
#>   .. ..$ name       : chr "group"
#>   .. ..$ type       : chr "string"
#>   .. ..$ constraints:List of 1
#>   .. .. ..$ enum: chr [1:3] "ctrl" "trt1" "trt2"
  2. Properties can be added to each field in the schema by providing an unnamed vector to update_schema() (cf. what @damianooldoni suggested above). Properties are added based on field order. Here we only provide a vector of length 1, so only the first field gets the property.
schema <-
  schema %>%
  update_schema(
    property = "unit",
    values = c("g")
  )
str(schema)
#> List of 1
#>  $ fields:List of 2
#>   ..$ :List of 3
#>   .. ..$ name: chr "weight"
#>   .. ..$ type: chr "number"
#>   .. ..$ unit: chr "g" <--------
#>   ..$ :List of 3
#>   .. ..$ name       : chr "group"
#>   .. ..$ type       : chr "string"
#>   .. ..$ constraints:List of 1
#>   .. .. ..$ enum: chr [1:3] "ctrl" "trt1" "trt2"
  3. Properties can also be added by providing a named vector to update_schema(). The convenience function field_names() (New function fields_names() #196) is used to name the vector:
descriptions <- c("Weight of the plant", "Group the plant is in")
names(descriptions) <- field_names(schema)
descriptions
#>                  weight                   group 
#>   "Weight of the plant" "Group the plant is in"
schema <-
  schema %>%
  update_schema(
    property = "description",
    values = descriptions
  )
str(schema)
#> List of 1
#>  $ fields:List of 2
#>   ..$ :List of 4
#>   .. ..$ name       : chr "weight"
#>   .. ..$ type       : chr "number"
#>   .. ..$ unit       : chr "g"
#>   .. ..$ description: chr "Weight of the plant" <--------
#>   ..$ :List of 4
#>   .. ..$ name       : chr "group"
#>   .. ..$ type       : chr "string"
#>   .. ..$ description: chr "Group the plant is in" <--------
#>   .. ..$ constraints:List of 1
#>   .. .. ..$ enum: chr [1:3] "ctrl" "trt1" "trt2"
  4. You can't update reserved properties:
schema <-
  schema %>%
  update_schema(
    property = "name",
    values = c("foo")
  )
#> Error: "name" is a reserved field property.
  5. The resource with the custom-made schema can then be added to a package:
package <-
  create_package() %>%
  add_resource("plant-growth", PlantGrowth, schema = schema)
  6. If you want to update the schema of an already added resource (not advised), you can assign it directly:
package$resources[[1]]$schema <- schema
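The workflow above could be backed by something like the following base-R sketch of update_schema() and field_names(). Both functions are hypothetical here (field_names() is tracked separately in #196), the schema is hand-written to match the PlantGrowth output shown earlier, and treating only name as reserved is an assumption based on the reserved-property example above.

```r
# Hypothetical sketches; neither function is part of frictionless.
field_names <- function(schema) {
  vapply(schema$fields, function(field) field$name, character(1))
}

update_schema <- function(schema, property, values) {
  reserved <- c("name")  # assumption: the actual reserved set is undecided
  if (property %in% reserved) {
    stop(sprintf("\"%s\" is a reserved field property.", property), call. = FALSE)
  }
  fields <- field_names(schema)
  if (is.null(names(values))) {
    # Unnamed: assign by field order; fields beyond length(values) are untouched
    if (length(values) > length(fields)) {
      stop("There are more values than fields.", call. = FALSE)
    }
    names(values) <- fields[seq_along(values)]
  }
  unknown <- setdiff(names(values), fields)
  if (length(unknown) > 0) {
    stop("Unknown field(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  for (field_name in names(values)) {
    schema$fields[[match(field_name, fields)]][[property]] <- values[[field_name]]
  }
  schema
}

schema <- list(fields = list(
  list(name = "weight", type = "number"),
  list(name = "group", type = "string",
       constraints = list(enum = c("ctrl", "trt1", "trt2")))
))
schema <- update_schema(schema, property = "unit", values = c("g"))
descriptions <- c("Weight of the plant", "Group the plant is in")
names(descriptions) <- field_names(schema)
schema <- update_schema(schema, property = "description", values = descriptions)
```

Unlike the str() mock-up above, this sketch appends new properties at the end of each field, so description ends up after constraints for group. JSON object members are unordered, so this should not affect the resulting Table Schema.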

Function name

I'm tempted to go for update_schema() rather than edit_fields(). update_fields() would be a valid alternative, but update_schema() makes it clear that the function returns a schema (not fields).

get_schema(package, resource_name) => schema
create_schema(df) => schema
update_schema(schema) => schema <-----
field_names(schema) => vector

@damianooldoni @PietrH @nepito what do you think?

@peterdesmet peterdesmet changed the title Create function to edit field properties Create function update_schema() to edit field properties Mar 26, 2024
@peterdesmet peterdesmet added this to the 1.2.0 milestone Mar 27, 2024
@peterdesmet peterdesmet added the complexity:high Likely complex to implement label Jul 3, 2024
@peterdesmet peterdesmet modified the milestones: 1.2.0, 1.3.0 Aug 26, 2024