-
Notifications
You must be signed in to change notification settings - Fork 28
/
drake.qmd
262 lines (206 loc) · 28.4 KB
/
drake.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
---
execute:
freeze: auto
---
# What about `drake`? {#drake}
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```
<a href="https://www.tidyverse.org/lifecycle/#superseded"><img src="https://img.shields.io/badge/lifecycle-superseded-blue.svg" alt='superseded lifecycle'></a>
[`targets`](https://github.com/ropensci/targets) is the successor of [`drake`](https://github.com/ropensci/drake), an older pipeline tool. As of 2021-01-21, [`drake`](https://github.com/ropensci/drake) is [superseded](https://lifecycle.r-lib.org/articles/stages.html#superseded), which means there are no plans for new features or discretionary enhancements, but basic maintenance and support will continue indefinitely. Existing projects that use [`drake`](https://github.com/ropensci/drake) can safely continue to use [`drake`](https://github.com/ropensci/drake), and there is no need to retrofit [`targets`](https://github.com/ropensci/targets). New projects should use [`targets`](https://github.com/ropensci/targets) because it is friendlier and more robust.
## Why is `drake` superseded?
Nearly four years of community feedback have exposed major user-side limitations regarding data management, collaboration, dynamic branching, and parallel efficiency. Unfortunately, these limitations are permanent. Solutions in [`drake`](https://github.com/ropensci/drake) itself would make the package incompatible with existing projects that use it, and the internal architecture is too copious, elaborate, and mature for such extreme refactoring. That is why [`targets`](https://github.com/ropensci/targets) was created. The [`targets`](https://github.com/ropensci/targets) package borrows from past learnings, user suggestions, discussions, complaints, success stories, and feature requests, and it improves the user experience in ways that will never be possible in [`drake`](https://github.com/ropensci/drake).
## Transitioning to `targets`
If you know [`drake`](https://github.com/ropensci/drake), then you already almost know `targets`. The programming style is similar, and most functions in `targets` have counterparts in [`drake`](https://github.com/ropensci/drake).
Functions in `drake`| Counterparts in `targets`
---|---
[`use_drake()`](https://docs.ropensci.org/drake/reference/use_drake.html), [`drake_script()`](https://docs.ropensci.org/drake/reference/drake_script.html) | [`tar_script()`](https://docs.ropensci.org/targets/reference/tar_script.html)
[`drake_plan()`](https://docs.ropensci.org/drake/reference/drake_plan.html) | [`tar_manifest()`](https://docs.ropensci.org/targets/reference/tar_manifest.html), [`tarchetypes::tar_plan()`](https://docs.ropensci.org/tarchetypes/reference/tar_plan.html)
[`target()`](https://docs.ropensci.org/drake/reference/target.html) | [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html), [`tar_target_raw()`](https://docs.ropensci.org/targets/reference/tar_target_raw.html)
[`drake_config()`](https://docs.ropensci.org/drake/reference/drake_config.html) | [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
[`outdated()`](https://docs.ropensci.org/drake/reference/outdated.html), [`r_outdated()`](https://docs.ropensci.org/drake/reference/r_make.html) | [`tar_outdated()`](https://docs.ropensci.org/targets/reference/tar_outdated.html)
[`vis_drake_graph()`](https://docs.ropensci.org/drake/reference/vis_drake_graph.html), [`r_vis_drake_graph()`](https://docs.ropensci.org/drake/reference/r_make.html) | [`tar_visnetwork()`](https://docs.ropensci.org/targets/reference/tar_visnetwork.html), [`tar_glimpse()`](https://docs.ropensci.org/targets/reference/tar_glimpse.html)
[`drake_graph_info()`](https://docs.ropensci.org/drake/reference/drake_graph_info.html), [`r_drake_graph_info()`](https://docs.ropensci.org/drake/reference/r_make.html) | [`tar_network()`](https://docs.ropensci.org/targets/reference/tar_network.html)
[`make()`](https://docs.ropensci.org/drake/reference/make.html), [`r_make()`](https://docs.ropensci.org/drake/reference/r_make.html) | [`tar_make()`](https://docs.ropensci.org/targets/reference/tar_make.html), [`tar_make_clustermq()`](https://docs.ropensci.org/targets/reference/tar_make_clustermq.html), [`tar_make_future()`](https://docs.ropensci.org/targets/reference/tar_make_future.html)
[`loadd()`](https://docs.ropensci.org/drake/reference/readd.html) | [`tar_load()`](https://docs.ropensci.org/targets/reference/tar_load.html)
[`readd()`](https://docs.ropensci.org/drake/reference/readd.html) | [`tar_read()`](https://docs.ropensci.org/targets/reference/tar_read.html)
[`diagnose()`](https://docs.ropensci.org/drake/reference/diagnose.html), [`build_times()`](https://docs.ropensci.org/drake/reference/build_times.html), [`cached()`](https://docs.ropensci.org/drake/reference/cached.html), [`drake_cache_log()`](https://docs.ropensci.org/drake/reference/drake_cache_log.html) | [`tar_meta()`](https://docs.ropensci.org/targets/reference/tar_meta.html)
[`drake_progress()`](https://docs.ropensci.org/drake/reference/drake_progress.html), [`drake_running()`](https://docs.ropensci.org/drake/reference/drake_running.html), [`drake_done()`](https://docs.ropensci.org/drake/reference/drake_done.html), [`drake_failed()`](https://docs.ropensci.org/drake/reference/drake_failed.html), [`drake_cancelled()`](https://docs.ropensci.org/drake/reference/drake_cancelled.html) | [`tar_progress()`](https://docs.ropensci.org/targets/reference/tar_progress.html)
[`clean()`](https://docs.ropensci.org/drake/reference/clean.html) | [`tar_deduplicate()`](https://docs.ropensci.org/targets/reference/tar_deduplicate.html), [`tar_delete()`](https://docs.ropensci.org/targets/reference/tar_delete.html), [`tar_destroy()`](https://docs.ropensci.org/targets/reference/tar_destroy.html), [`tar_invalidate()`](https://docs.ropensci.org/targets/reference/tar_invalidate.html)
[`drake_gc()`](https://docs.ropensci.org/drake/reference/drake_gc.html) | [`tar_prune()`](https://docs.ropensci.org/targets/reference/tar_prune.html)
[`id_chr()`](https://docs.ropensci.org/drake/reference/id_chr.html) | [`tar_name()`](https://docs.ropensci.org/targets/reference/tar_name.html), [`tar_path()`](https://docs.ropensci.org/targets/reference/tar_path.html)
[`knitr_in()`](https://docs.ropensci.org/drake/reference/knitr_in.html) | [`tarchetypes::tar_render()`](https://docs.ropensci.org/tarchetypes/reference/tar_render.html)
[`cancel()`](https://docs.ropensci.org/drake/reference/cancel.html), [`cancel_if()`](https://docs.ropensci.org/drake/reference/cancel_if.html) | [`tar_cancel()`](https://docs.ropensci.org/targets/reference/tar_cancel.html)
[`trigger()`](https://docs.ropensci.org/drake/reference/trigger.html) | [`tar_cue()`](https://docs.ropensci.org/targets/reference/tar_cue.html)
[`drake_example()`](https://docs.ropensci.org/drake/reference/drake_example.html), [`drake_example()`](https://docs.ropensci.org/drake/reference/drake_examples.html), [`load_mtcars_example()`](https://docs.ropensci.org/drake/reference/load_mtcars_example.html), [`clean_mtcars_example()`](https://docs.ropensci.org/drake/reference/clean_mtcars_example.html) | Unsupported. Example `targets` pipelines are in individual repositories [linked from here](https://docs.ropensci.org/targets/index.html#examples).
[`drake_build()`](https://docs.ropensci.org/drake/reference/drake_build.html) | Unsupported in `targets` to ensure coherence with dynamic branching.
[`drake_debug()`](https://docs.ropensci.org/drake/reference/drake_debug.html) | See the [debugging chapter](#debug).
[`drake_history()`](https://docs.ropensci.org/drake/reference/drake_history.html), [`recoverable()`](https://docs.ropensci.org/drake/reference/recoverable.html) | Unsupported in `targets`. Instead of trying to manage history and data recovery directly, [`targets`](https://github.com/ropensci/targets) maintains a much lighter/friendlier data store to make it easier to use external data versioning tools instead.
[`missed()`](https://docs.ropensci.org/drake/reference/missed.html), [`tracked()`](https://docs.ropensci.org/drake/reference/tracked.html), [`deps_code()`](https://docs.ropensci.org/drake/reference/deps_code.html), [`deps_target()`](https://docs.ropensci.org/drake/reference/deps_target.html), [`deps_knitr()`](https://docs.ropensci.org/drake/reference/deps_knitr.html), [`deps_profile()`](https://docs.ropensci.org/drake/reference/deps_profile.html) | Unsupported in `targets` because dependency detection is [easier to understand](#targets) than in `drake`.
[`drake_hpc_template_file()`](https://docs.ropensci.org/drake/reference/drake_hpc_template_file.html), [`drake_hpc_template_files()`](https://docs.ropensci.org/drake/reference/drake_hpc_template_files.html) | Deemed out of scope for `targets`.
[`drake_cache()`](https://docs.ropensci.org/drake/reference/drake_cache.html), [`new_cache()`](https://docs.ropensci.org/drake/reference/new_cache.html), [`find_cache()`](https://docs.ropensci.org/drake/reference/find_cache.html). | Unsupported because [`targets`](https://github.com/ropensci/targets) is far more strict and paternalistic about data/file management.
[`rescue_cache()`](https://docs.ropensci.org/drake/reference/rescue_cache.html), [`which_clean()`](https://docs.ropensci.org/drake/reference/which_clean.html), [`cache_planned()`](https://docs.ropensci.org/drake/reference/cached_planned.html), [`cache_unplanned()`](https://docs.ropensci.org/drake/reference/cached_unplanned.html) | Unsupported due to the simplified data management system and storage cleaning functions.
[`drake_get_session_info()`](https://docs.ropensci.org/drake/reference/drake_get_session_info.html) | Deemed superfluous and a potential bottleneck. Discarded for `targets`.
[`read_drake_seed()`](https://docs.ropensci.org/drake/reference/read_drake_seed.html) | Superfluous because `targets` always uses the same global seed. [`tar_meta()`](https://docs.ropensci.org/targets/reference/tar_meta.html) shows all the target-level seeds.
[`show_source()`](https://docs.ropensci.org/drake/reference/show_source.html) | Deemed superfluous. Discarded in `targets` to conserve storage space in `_targets/meta/meta`.
[`drake_tempfile()`](https://docs.ropensci.org/drake/reference/drake_tempfile.html) | Superfluous in `targets` because there is no special `disk.frame` storage format. ([Dynamic file targets](https://books.ropensci.org/targets/files.html#external-output-files) are much better for managing `disk.frame`s.)
[`file_store()`](https://docs.ropensci.org/drake/reference/file_store.html) | Superfluous in `targets` because [all files are dynamic files](https://books.ropensci.org/targets/files.html) and there is no longer a need to Base32-encode any file names.
Likewise, many `make()` arguments have equivalent arguments elsewhere.
Argument of `drake::make()`| Counterparts in `targets`
---|---
`targets` | `names` in [`tar_make()`](https://docs.ropensci.org/targets/reference/tar_make.html) etc.
`envir` | `envir` in [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`verbose` | `reporter` in [`tar_make()`](https://docs.ropensci.org/targets/reference/tar_make.html) etc.
`parallelism` | Choice of function: [`tar_make()`](https://docs.ropensci.org/targets/reference/tar_make.html) vs [`tar_make_clustermq()`](https://docs.ropensci.org/targets/reference/tar_make_clustermq.html) vs [`tar_make_future()`](https://docs.ropensci.org/targets/reference/tar_make_future.html)
`jobs` | `workers` in [`tar_make_clustermq()`](https://docs.ropensci.org/targets/reference/tar_make_clustermq.html) and [`tar_make_future()`](https://docs.ropensci.org/targets/reference/tar_make_future.html)
`packages` | `packages` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`lib_loc` | `library` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`trigger` | `cue` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`caching` | `storage` and `retrieval` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`keep_going` | `error` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`memory_strategy` | `memory` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`garbage_collection` | `garbage_collection` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`template` | `resources` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html), along with helpers like [`tar_resources()`](https://docs.ropensci.org/targets/reference/tar_resources.html).
`curl_handles` | `handle` element of `resources` argument of [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`format` | `format` in [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`seed` | Superfluous because targets always uses the same global seed. [`tar_meta()`] shows all the target-level seeds.
In addition, many [optional columns of `drake` plans](https://books.ropensci.org/drake/plans.html#special-columns) are expressed differently in `targets`.
[Optional column of `drake` plans](https://books.ropensci.org/drake/plans.html#special-columns) | Feature in `targets`
---|---
`format` | `format` argument of [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`dynamic` | `pattern` argument of [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`transform` | [static branching functions in `tarchetypes`](https://docs.ropensci.org/tarchetypes/reference/index.html#section-branching) such as [`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) and [`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html)
`trigger` | `cue` argument of [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`hpc` | `deployment` argument of [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`resources` | `resources` argument of [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
`caching` | `storage` and `retrieval` arguments of [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html) and [`tar_option_set()`](https://docs.ropensci.org/targets/reference/tar_option_set.html)
## Advantages of `targets` over `drake`
### Better guardrails by design
[`drake`](https://github.com/ropensci/drake) leaves ample room for user-side mistakes, and some of these mistakes require extra awareness or advanced knowledge of R to consistently avoid. The example behaviors below are too systemic to solve and still preserve back-compatibility.
1. By default, `make()` looks for functions and global objects in the parent environment of the calling R session. Because the global environment is often old and stale in practical situations, which causes targets to become incorrectly invalidated. Users need to remember to restart the session before calling `make()`. The issue is [discussed here](https://github.com/ropensci/drake/issues/761), and the discussion led to functions like `r_make()` which always create a fresh session to do the work. However, `r_make()` is not a complete replacement for `make()`, and beginner users still run into the original problems.
1. Similar to the above, `make()` does not find the intended functions and global objects if it is called in a different environment. Edge cases like [this one](https://github.com/ropensci/drake/issues/874) and [this one](https://github.com/ropensci/drake/issues/527) continue to surprise users.
1. [`drake`](https://github.com/ropensci/drake) is extremely flexible about the location of the `.drake/` cache. When a user calls `readd()`, `loadd()`, `make()`, and similar functions, [`drake`](https://github.com/ropensci/drake) searches up through the parent directories until it finds a `.drake/` folder. This flexibility seldom helps, and it creates uncertainty and inconsistency when it comes to initializing and accessing projects, especially if there are multiple projects with nested file systems.
The [`targets`](https://github.com/ropensci/targets) package solves all these issues by design. Functions `tar_make()`, `tar_make_clustermq()`, and `tar_make_future()` all create fresh new R sessions by default. They all require a `_targets.R` configuration file in the project root (working directory of the `tar_make()` call) so that the functions, global objects, and settings are all populated in the exact same way each session, leading to less frustration, greater consistency, and greater reproducibility. In addition, the `_targets/` data store always lives in the project root.
### Enhanced debugging support
[`targets`](https://github.com/ropensci/targets) has enhanced debugging support. With the `workspaces` argument to `tar_option_set()`, users can locally recreate the conditions under which a target runs. This includes packages, global functions and objects, and the random number generator seed. Similarly, `tar_option_set(error = "workspace")` automatically saves debugging workspaces for targets that encounter errors. The `debug` option lets users enter an interactive debugger for a given target while the pipeline is running. And unlike `drake`, all debugging features are fully compatible with dynamic branching.
### Improved tracking of package functions
By default, [`targets`](https://github.com/ropensci/targets) ignores changes to functions inside external packages. However, if a workflow centers on a custom package with methodology under development, users can make [`targets`](https://github.com/ropensci/targets) automatically watch the package's functions for changes. Simply supply the names of the relevant packages to the `imports` argument of `tar_option_set()`. Unlike [`drake`](https://github.com/ropensci/drake), [`targets`](https://github.com/ropensci/targets) can track multiple packages this way, and the internal mechanism is much safer.
### Lighter, friendlier data management
[`drake`](https://github.com/ropensci/drake)'s cache is an intricate file system in a hidden `.drake` folder. It contains multiple files for each target, and those names are not informative. (See the files in the `data/` folder in the diagram below.) Users often have trouble understanding how [`drake`](https://github.com/ropensci/drake) manages data, resolving problems when files are corrupted, placing the data under version control, collaborating with others on the same pipeline, and clearing out superfluous data when the cache grows large in storage.
```
.drake/
├── config/
├── data/
├───── 17bfcef645301416.rds
├───── 21935c86f12692e2.rds
├───── 37caf5df2892cfc4.rds
├───── ...
├── drake/
├───── history/
├───── return/
├───── tmp/
├── keys/ # A surprisingly large number of tiny text files live here.
├───── memoize/
├───── meta/
├───── objects/
├───── progress/
├───── recover/
├───── session/
└── scratch/ # This folder should be temporary, but it gets egregiously large.
```
The [`targets`](https://github.com/ropensci/targets) takes a friendlier, more transparent, less mysterious approach to data management. Its data store is a visible `_targets` folder, and it contains far fewer files: a spreadsheet of metadata, a spreadsheet of target progress, and one informatively named data file for each target. It is much easier to understand the data management process, identify and diagnose problems, place projects under version control, and avoid consuming unnecessary storage resources. Sketch:
```
_targets/
├── meta/
├───── meta
├───── process
├───── progress
├── objects/
├───── target_name_1
├───── target_name_2
├───── target_name_3
├───── ...
├── scratch/ # tar_make() deletes this folder after it finishes.
└── user/ # gittargets users can put custom files here for data version control.
```
### Cloud storage
Thanks to the simplified data store and simplified internals, [`targets`](https://github.com/ropensci/targets) can automatically upload data to the Amazon S3 bucket of your choice. Simply configure [`aws.s3`](https://github.com/cloudyr/aws.s3), create a bucket, and select one of the AWS-powered storage formats. Then, [`targets`](https://github.com/ropensci/targets) will automatically upload the return values to the cloud.
```{r, eval = FALSE}
# _targets.R
tar_option_set(resources = list(bucket = "my-bucket-name"))
list(
tar_target(dataset, get_large_dataset(), format = "aws_fst_tbl"),
tar_target(analysis, analyze_dataset(dataset), format = "aws_qs")
)
```
Data retrieval is still super easy.
```{r, eval = FALSE}
tar_read(dataset)
```
### Show status of functions and global objects
[`drake`](https://github.com/ropensci/drake) has several utilities that inform users which targets are up to date and which need to rerun. However, those utilities are limited by how [`drake`](https://github.com/ropensci/drake) manages functions and other global objects. Whenever [`drake`](https://github.com/ropensci/drake) inspects globals, it stores their values in its cache and loses track of their previous state from the last run of the pipeline. As a result, it has trouble informing users exactly why a given target is out of date. And because the system for tracking global objects is tightly coupled with the cache, this limitation is permanent.
In [`targets`](https://github.com/ropensci/targets), the metadata management system only updates information on global objects when the pipeline actually runs. This makes it possible to understand which specific changes to your code could have invalided your targets. In large projects with long runtimes, this feature contributes significantly to reproducibility and peace of mind.
![](./man/figures/drake-graph.png)
### Dynamic branching with `dplyr::group_by()`
[Dynamic branching](https://books.ropensci.org/drake/dynamic.html) was an architecturally difficult fit in [`drake`](https://github.com/ropensci/drake), and it can only support one single ([`vctrs`](https://github.com/r-lib/vctrs)-based) method of slicing and aggregation for processing sub-targets. This limitation has frustrated members of the community, as discussed [here](https://github.com/ropensci/drake/issues/1087) and [here](https://github.com/ropensci/drake/issues/1170).
[`targets`](https://github.com/ropensci/targets), on the other hand, is more flexible regarding slicing and aggregation. When it branches over an object, it can iterate over vectors, lists, and even data frames grouped with `dplyr::group_by()`. To branch over chunks of a data frame, our data frame target needs to have a special `tar_group` column. We can create this column in our target's return value with the `tar_group()` function.
```{r, eval = TRUE}
library(dplyr)
library(targets)
library(tarchetypes)
library(tibble)
tibble(
x = seq_len(6),
id = rep(letters[seq_len(3)], each = 2)
) %>%
group_by(id) %>%
tar_group()
```
Our actual target has the command above and `iteration = "group"`.
```{r, eval = FALSE}
tar_target(
data,
tibble(
x = seq_len(6),
id = rep(letters[seq_len(3)], each = 2)
) %>%
group_by(id) %>%
tar_group(),
iteration = "group"
)
```
Now, any target that maps over `data` is going to define one branch for each group in the data frame. The following target creates three branches when run in a pipeline: one returning 3, one returning 7, and one returning 11.
```{r, eval = FALSE}
tar_target(
sums,
sum(data$x),
pattern = map(data)
)
```
### Composable dynamic branching
Because the design of [`targets`](https://github.com/ropensci/targets) is fundamentally dynamic, users can create complicated dynamic branching patterns that are never going to be possible in `drake`. Below, target `z` creates six branches, one for each combination of `w` and tuple (`x`, `y`). The pattern `cross(w, map(x, y))` is equivalent to `tidyr::crossing(w, tidyr::nesting(x, y))`.
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
tar_target(w, seq_len(2)),
tar_target(x, head(letters, 3)),
tar_target(y, head(LETTERS, 3)),
tar_target(
z,
data.frame(w = w, x = x, y = y),
pattern = cross(w, map(x, y))
)
)
```
Thanks to [`glep`](https://github.com/glep) and [`djbirke`](https://github.com/djbirke) on GitHub for the idea.
### Improved parallel efficiency
[Dynamic branching](https://books.ropensci.org/drake/dynamic.html) in [`drake`](https://github.com/ropensci/drake) is staged. In other words, all the sub-targets of a dynamic target must complete before the pipeline moves on to downstream targets. The diagram below illustrates this behavior in a pipeline with a dynamic target B that maps over another dynamic target A. For thousands of dynamic sub-targets with highly variable runtimes, this behavior consumes unnecessary runtime and computing resources. And because [`drake`](https://github.com/ropensci/drake)'s architecture was designed at a fundamental level for static branching only, this limitation is permanent.
![](./man/figures/drake-dynamic-drake.png)
By contrast, the internal data structures in [`targets`](https://github.com/ropensci/targets) are dynamic by design, which allows for a dynamic branching model with more flexibility and parallel efficiency. Branches can always start as soon as their upstream dependencies complete, even if some of those upstream dependencies are branches. This behavior reduces runtime and reduces consumption of computing resources.
![](./man/figures/drake-dynamic-targets.png)
### Metaprogramming
In [`drake`](https://github.com/ropensci/drake), pipelines are defined with the `drake_plan()` function. `drake_plan()` supports an elaborate domain specific language that diffuses user-supplied R expressions. This makes it convenient to assign commands to targets in the vast majority of cases, but it also obstructs custom metaprogramming by users ([example here](https://github.com/ropensci/drake/issues/1251)). Granted, it is possible to completely circumvent `drake_plan()` and create the whole data frame from scratch, but this is hardly ideal and seldom done in practice.
The [`targets`](https://github.com/ropensci/targets) package tries to make customization easier. Relative to [`drake`](https://github.com/ropensci/drake), [`targets`](https://github.com/ropensci/targets) takes a decentralized approach to setting up pipelines, moving as much custom configuration as possible to the target level rather than the whole pipeline level. In addition, the `tar_target_raw()` function avoids non-standard evaluation while mirroring `tar_target()` in all other respects. All this makes it much easier to create [custom metaprogrammed pipelines](https://books.ropensci.org/targets/static.html#metaprogram) and [target archetypes](https://github.com/ropensci/tarchetypes) while avoiding an elaborate domain specific language for [static branching](https://books.ropensci.org/drake/static.html), which was extremely difficult to understand and error prone in [`drake`](https://github.com/ropensci/drake). The [R Targetopia](https://wlandau.github.io/targetopia.html) is an emerging ecosystem of workflow frameworks that take full advantage of this customization and democratize reproducible pipelines.