Commit: Updated site with new folder structure - allows subpages, all in one place
Showing 69 changed files with 12,617 additions and 492 deletions.
19 changes: 19 additions & 0 deletions
...ze/code-collection/2022-08-21-packages-for-summary-tables/index/execute-results/html.json
Large diffs are not rendered by default.
15 changes: 15 additions & 0 deletions
...ollection/2022-10-26-summarising-across-columns-tidyverse/index/execute-results/html.json
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below.
@@ -0,0 +1,15 @@
{
"hash": "b5fefe95a1a7481550d1964adb30c287",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: 'Summarising across columns'\nauthor: 'Chitra M Saraswati'\ndate: '2022-10-26'\nslug: summarising-across-columns-tidyverse\ncategories:\n - R\n - tidyverse\ndraft: yes\n---\n\n\nSomething commonly done in analyses is summarising across multiple columns: for example, you might want to calculate the mean for all variables in your dataset. I'll explain how to do this using the tidyverse tools. So let's load the tidyverse package:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n:::\n\n\nFor our first example, let's calculate the mean for each column in the `airquality` dataset, grouped by month.\n\nThe main function here is `across()`, which lets you apply a function, or multiple functions, across multiple columns. The important arguments here are `.cols`, `.fns`, and `.names`. Here's the relevant section of the documentation:\n\n- `.cols`: Columns to transform (tidy-select). Because `across()` is used within functions like `summarise()` and `mutate()`, you can't select or compute upon grouping variables.\n- `.fns`: Functions to apply to each of the selected columns. Possible values are: a function, e.g. `mean`; a purrr-style lambda, e.g. `~ mean(.x, na.rm = TRUE)`; a list of functions/lambdas, e.g. `list(mean = mean, n_miss = ~ sum(is.na(.x)))`; or `NULL` (the default), which returns the selected columns in a data frame without applying a transformation. This is useful when you want to use a function that takes a data frame. Within these functions you can use `cur_column()` and `cur_group()` to access the current column and grouping keys respectively.\n- `...`: Additional arguments for the function calls in `.fns`. Using `...` is strongly discouraged because of issues with the timing of evaluation.\n- `.names`: A glue specification that describes how to name the output columns. This can use `{.col}` to stand for the selected column name, and `{.fn}` to stand for the name of the function being applied. The default (`NULL`) is equivalent to `\"{.col}\"` for the single-function case and `\"{.col}_{.fn}\"` when a list is used for `.fns`.\n\nIn this instance, we use `.fns` to define the function we want to run. Additional arguments to `mean`--in this instance, `na.rm`--are passed after it, separated by a comma. (As the warning below notes, passing arguments this way is deprecated as of dplyr 1.1.0 in favour of an anonymous function.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nairquality %>% \n group_by(Month) %>% \n summarise(across(\n .cols = -Day,\n .fns = mean,\n na.rm = TRUE\n ))\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWarning: There was 1 warning in `summarise()`.\nℹ In argument: `across(.cols = -Day, .fns = mean, na.rm = TRUE)`.\nℹ In group 1: `Month = 5`.\nCaused by warning:\n! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.\nSupply arguments directly to `.fns` through an anonymous function instead.\n\n # Previously\n across(a:b, mean, na.rm = TRUE)\n\n # Now\n across(a:b, \\(x) mean(x, na.rm = TRUE))\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 5\n Month Ozone Solar.R Wind Temp\n <int> <dbl> <dbl> <dbl> <dbl>\n1 5 23.6 181. 11.6 65.5\n2 6 29.4 190. 10.3 79.1\n3 7 59.1 216. 8.94 83.9\n4 8 60.0 172. 8.79 84.0\n5 9 31.4 167. 10.2 76.9\n```\n\n\n:::\n:::\n\n\nHere's how to count missing values for each column, grouped by month. 
In this instance we supply a purrr-style lambda, written with `~` instead of `.fns`, and refer to the current column as `.x`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nairquality %>% \n group_by(Month) %>% \n summarise(across(\n .cols = -Day,\n ~sum(is.na(.x))\n ))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 5\n Month Ozone Solar.R Wind Temp\n <int> <int> <int> <int> <int>\n1 5 5 4 0 0\n2 6 21 0 0 0\n3 7 5 0 0 0\n4 8 5 3 0 0\n5 9 1 0 0 0\n```\n\n\n:::\n:::\n\n\nLet's check these counts using the `summary()` function for month five:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nairquality %>% \n filter(Month == 5) %>% select(Ozone, Solar.R) %>% \n summary()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n Ozone Solar.R \n Min. : 1.00 Min. : 8.0 \n 1st Qu.: 11.00 1st Qu.: 72.0 \n Median : 18.00 Median :194.0 \n Mean : 23.62 Mean :181.3 \n 3rd Qu.: 31.50 3rd Qu.:284.5 \n Max. :115.00 Max. :334.0 \n NA's :5 NA's :4 \n```\n\n\n:::\n:::\n\n\n# Renaming across columns\n\nFinally, `rename_with()` applies a function across column names--here, adding an `allergy_` prefix to every column except `or_matgate_nbr`:\n\n```r\ndat_allergy <- dat_allergy %>%\n rename_with(.cols = -or_matgate_nbr,\n ~ paste0(\"allergy_\", .x))\n```\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
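The deprecation warning captured in the rendered output above points at the modern `across()` idiom: pass an anonymous function instead of extra `...` arguments. A minimal sketch of that idiom, not part of the committed post (the `.names` spec is an illustrative addition):

```r
library(dplyr)

# Modern across() idiom (dplyr >= 1.1.0): wrap mean() in an anonymous
# function rather than passing na.rm through `...`, and use a glue spec
# in .names to label the output columns.
monthly_means <- airquality %>%
  group_by(Month) %>%
  summarise(across(
    .cols  = -Day,
    .fns   = \(x) mean(x, na.rm = TRUE),
    .names = "{.col}_mean"
  ))

monthly_means
```

With a single function, `.names` defaults to `"{.col}"`; the explicit spec here yields columns such as `Ozone_mean`.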
15 changes: 15 additions & 0 deletions
...de-collection/2024-04-11-understanding-a-targets-workflow/index/execute-results/html.json
@@ -0,0 +1,15 @@
{
"hash": "d37b4b20133be66acf6fd7cec3618ec6",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Understanding a targets workflow: An introduction\"\nauthor: \"Chitra M Saraswati\"\ndescription: \"`{targets}`: why use it, how to use it, and some resources to get you started. (Hint: reduce the mental burden of figuring out which step of the analysis you're on.)\"\ndate: \"2024-04-11\"\ndate-modified: \"2024-04-23\"\ncategories:\n - R\n - workflow\n---\n\n\nI was recently introduced to the R package targets as part of learning how to develop software. As I continue to use it, I've been blown away by how much easier my workflow has become\\--even outside of software development\\--and I regret not having known of it sooner. It would have been nice to use this workflow when I was doing my Master's thesis, which involved a large amount of complex and repeated analyses. So here's my notes on the targets workflow so far: why use it, how to use it (especially if you've never used it before), and the resources I found most useful.\n\n# Why use a targets workflow?\n\nMost of us doing statistical analysis work on complex projects with moving parts. What I mean by this is: say I have a workflow that involves loading the data, some exploratory data analysis, fitting the data to a model, visualising it, and then creating a summary table of the data. It's nothing too complicated, but it *is* complex: if one part changes, the rest of the project changes too.\n\nFor example, perhaps the data is updated and we want to re-run the whole analysis from start to finish. Or perhaps we decide that the model isn't a good fit, so we run some other models and see which one is better; in this instance, only the model and visualisation would change, but the rest of the project stays the same.\n\nAll this is fine if you have well-annotated code and know which parts of the script to re-run. But what if the computations are intensive and time-consuming? Or what if it takes a while to render your table because you've included so many variables? 
Or even if you're just tweaking a few variables and then making a plot; if you miss re-naming a variable earlier on in your script, your plot might not run because it depends on that variable.\n\nIt's in situations like these where targets can help you. Using a targets workflow means you'll have it all laid out in front of you instead of needing to look for specific bits of code in your script and any dependencies. It'll take the headache out of figuring out which parts of the analysis depend on which variables and functions. Then when you change a variable name (or any part of the project, really), you'll know which parts of the analysis need to be re-run, minus the headaches.\n\nTo summarise, I find a targets workflow especially useful in the following scenarios:\n\n- If you're doing something complex that has a lot of moving parts\n- If said moving parts will change as your project progresses\n- If you'd like to re-run your analysis with different inputs (e.g. a new dataset), or to create different outputs (e.g. plots and tables); especially if you're re-running your analysis a lot!\n\n# How to use targets\n\nWhen you're first starting to program in R--especially if you're mainly doing statistical analysis--a targets workflow might seem an unnecessary way of going about things. When I first came across the targets workflow, I felt overwhelmed and didn't understand how it could be useful for my workflow. So the way I'd recommend going about it is to take an existing project and to convert that to a targets workflow.\n\nThe [quick walkthrough](https://books.ropensci.org/targets/walkthrough.html) in the user manual is an excellent example of how to use the targets workflow--I highly recommend reading it and going through the worked example. It does a much better job than I can in giving a reproducible example of how to use targets. 
Then I recommend converting your project to a targets workflow by using that walkthrough, or this [quick tutorial](https://carpentries-incubator.github.io/targets-workshop/index.html).\n\nI'll walk you through a conceptual overview of how I use targets. I highly recommend you work through the two tutorials above first, though.\n\nFirst I make sure that I understand my workflow from beginning to end. This overview is the main advantage of using a targets workflow: you'll have a bird's eye view of your analysis and figure out where changes are necessary. For example, I may break down my analysis like this:\n\n- Read in csv file of data\n- Summarise data\n- Fit a model to the data\n- Create a plot of the model\n\nOnce I've done that and would like to start creating a targets workflow, I'll run the following in my console:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(targets)\nlibrary(tarchetypes)\n\ntar_script()\n```\n:::\n\n\nThis creates a file in your project directory called `_targets.R`. 
This is the high-level overview of the project, so I open that and write out something like the following:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# _targets.R file\n\n# Set-up ------------------------------\n\nlibrary(targets)\nlibrary(tarchetypes)\n\n# Load packages needed for this project\ntar_option_set(\n packages = c(\"tidyverse\",\n \"purrr\")\n )\n\n# Set up a workspace when our code errors\ntar_option_set(workspace_on_error = TRUE)\n\n# Load functions to be used in our project\nsource(\"R/functions.R\")\n\n# Target objects ------------------------------\n\ntar_plan(\n \n # Load data\n tar_file(\n path_to_data,\n \"./raw/data.csv\"\n ),\n \n tar_target(\n data,\n read_csv(path_to_data)\n ),\n \n # Create a summary table\n tar_target(\n summary_table,\n summarise_data(data)\n ),\n \n # Fit a model to the data\n tar_target(\n model,\n model_data(data)\n ),\n \n # Create plot of the data\n tar_target(\n plot,\n plot_data(data)\n )\n \n)\n```\n:::\n\n\nHere `summarise_data`, `model_data`, and `plot_data` are functions I've written up myself and saved in a separate script called `functions.R`.\n\nThe advantage of the above is that if your data changes, for example, you just need to change the first target object `path_to_data` to point to the new data file. You can then re-run the whole analysis using `tar_make()` and your resulting summary table, fitted model, and plot will be re-created. This is so much easier than having a script and needing to change `data` to `new_data` for every instance in which it's referred to.\n\nYou can also get an overview of your workflow by running `tar_visnetwork()`.\n\n# Final thoughts\n\nUsing a targets workflow requires you to change the way you approach programming. If you're trained as a statistical programmer, I assume you're used to writing your code in scripts. 
A targets workflow requires you to write your code as functions instead (otherwise known as \"functional programming\").\n\nIt *is* a steep learning curve and it did take me a while to get my head around why and how I should use a targets workflow. It's a different way of thinking about how to program.\n\nDespite this, I do highly recommend using a targets workflow instead of having multiple R scripts, for all the reasons I've stated above. Utilising this workflow is a step towards reproducible research, which is always great.\n\n# Resources for learning targets\n\nI think the best way to understand why a targets workflow is useful is to actually do a project within a targets workflow. So here are some resources I found helpful:\n\n- The targets R package [user manual](https://books.ropensci.org/targets/) with a quick [walkthrough](https://books.ropensci.org/targets/walkthrough.html) to get you started. I highly recommend this one if you're trying to set up a targets workflow by yourself.\n- For a more hands-on tutorial, I recommend this [Carpentries workshop](https://carpentries-incubator.github.io/targets-workshop/index.html) by Joel Nitta. I really liked that this gave you a bare-bones overview of what a targets workflow looks like. I used this the first time I tried to set up a targets workflow, but I also had the benefit of having someone walk me through using targets.\n\nMore readings on targets:\n\n- The official targets [website](https://docs.ropensci.org/targets/)\n- Within the official website: a list of [examples](https://github.com/ropensci/targets#example-projects) for targets workflows\n", | ||
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
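The `_targets.R` plan above sources `R/functions.R` for `summarise_data()`, `model_data()`, and `plot_data()`. The post doesn't show that file, so here is a hypothetical sketch of what it might contain--the function names come from the plan, but the bodies (and the column names `x` and `y`) are placeholder assumptions:

```r
# R/functions.R -- hypothetical helpers for the _targets.R plan above.
library(dplyr)
library(ggplot2)

# Assumed implementation: mean of every numeric column.
summarise_data <- function(data) {
  data %>%
    summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
}

# Assumed implementation: a simple linear model; `y ~ x` is a placeholder formula.
model_data <- function(data) {
  lm(y ~ x, data = data)
}

# Assumed implementation: scatter plot of the placeholder columns.
plot_data <- function(data) {
  ggplot(data, aes(x = x, y = y)) +
    geom_point()
}
```

Because each target calls exactly one function of the previous target's output, `tar_visnetwork()` can draw the dependency graph straight from these definitions.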
@@ -0,0 +1,102 @@
/* Default */
.Rtable1 table, table.Rtable1 {
    font-family: "Arial", Arial, sans-serif;
    font-size: 10pt;
    border-collapse: collapse;
    padding: 0px;
    margin: 0px;
    /* margin-bottom: 10pt; */
}
.Rtable1 td {
    white-space: nowrap;
}
.Rtable1 th, .Rtable1 td {
    border: 0;
    text-align: center;
    padding: 0.5ex 1.5ex;
    margin: 0px;
}
.Rtable1 thead>tr:first-child>th {
    border-top: 2pt solid black;
}
.Rtable1 thead>tr:last-child>th {
    border-bottom: 1pt solid black;
}
.Rtable1 tbody>tr:last-child>td {
    border-bottom: 2pt solid black;
}
.Rtable1 th.grouplabel {
    padding-left: 0;
    padding-right: 0;
}
.Rtable1 th.grouplabel>div {
    margin-left: 1.5ex;
    margin-right: 1.5ex;
    border-bottom: 1pt solid black;
}
.Rtable1 th.grouplabel:last-child>div {
    margin-right: 0;
}
.Rtable1 .rowlabel {
    text-align: left;
    padding-left: 2.5ex;
}
.Rtable1 .firstrow.rowlabel {
    padding-left: 0.5ex;
    font-weight: bold;
}

/* Zebra stripes */
.Rtable1-zebra tbody tr:nth-child(odd) {
    background-color: #eee;
}

/* Times font */
table.Rtable1-times {
    font-family: "Times New Roman", Times, serif;
}

/* Shade style */
.Rtable1-shade th {
    background-color: #ccc;
}

/* Grid style */
.Rtable1-grid th, .Rtable1-grid td {
    border-left: 1pt solid black;
    border-right: 1pt solid black;
}
.Rtable1-grid thead>tr:first-child>th {
    border-top: 1pt solid black;
}
.Rtable1-grid thead>tr:last-child>th {
    border-bottom: 1pt solid black;
}
.Rtable1-grid tbody>tr:last-child>td {
    border-bottom: 1pt solid black;
}
.Rtable1-grid .firstrow, .Rtable1-grid .firstrow ~ td {
    border-top: 1pt solid black;
}
.Rtable1-grid th.grouplabel>div {
    margin-left: 0;
    margin-right: 0;
    border-bottom: 0;
}

/* Center style */
.Rtable1-center td.rowlabel, .Rtable1-center td.firstrow.rowlabel {
    font-weight: bold;
    text-align: center;
    padding: 0.5ex 1.5ex;
}

/* Footnote */
.Rtable1 .Rtable1-footnote {
    font-size: smaller;
    padding: 0px;
    margin: 0px;
    text-align: left;
    white-space: normal;
}
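The optional variants above (`Rtable1-zebra`, `Rtable1-times`, `Rtable1-shade`, `Rtable1-grid`, `Rtable1-center`) are activated by adding the class to the table's top-level `class` attribute. With the `table1` R package this should be possible via its `topclass` argument--a sketch, with the formula and the `topclass` usage as assumptions to verify against the package documentation:

```r
library(table1)

# Hypothetical usage: render a zebra-striped summary table by stacking
# the variant class on top of the default "Rtable1" class, so both the
# base rules and the .Rtable1-zebra rules in the stylesheet above apply.
dat <- transform(airquality, Month = factor(Month))
tbl <- table1(~ Ozone + Wind + Temp | Month, data = dat,
              topclass = "Rtable1 Rtable1-zebra")
tbl
```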