Releases: ropensci/targets
Migrate to {crew} 1.0.0
Speed gains for large pipelines (with many up-to-date targets)
targets 1.10.0
Invalidating changes
These changes invalidate certain targets in a pipeline and cause them to rerun on the next `tar_make()`.
- Exclude function signatures from `tar_repository_cas()` output strings to reduce the size of pipeline metadata (#1390).
- Exclude function signatures from `tar_format()` output strings to reduce the size of pipeline metadata (#1390).
Summary of performance gains
`tar_make()` and `tar_outdated()` run much faster in this release. Extensive profiling was done on a real-world simulation pipeline with 66002 up-to-date targets. For `tar_make()` using all the default settings:
Machine | Before (seconds) | After (seconds) | Speedup |
---|---|---|---|
M2 Macbook | 413.16 | 35.538 | 11.62587 |
RHEL9 | 450.66 | 94.08 | 4.790 |
And for `tar_outdated()` using all the default settings:
Machine | Before (seconds) | After (seconds) | Speedup |
---|---|---|---|
M2 Macbook | 91.314 | 16.636 | 5.48894 |
RHEL9 | 167.809 | 37.395 | 4.487472 |
To take advantage of these speed gains for an existing pipeline, you may have to run `tar_make()` to convert the time stamps and file sizes to a new format. This initial `tar_make()` is slow, but subsequent `tar_make()` calls should be much faster than before the upgrade.
Other/specific changes
- Speed up `tar_make()` and `tar_outdated()` by avoiding excessive buffering and disk writes for metadata and reporters when the pipeline is just skipping targets.
- Use a more lookup-efficient data structure for `tar_runtime$file_info` (#1398).
- Fall back on vector aggregation without names (#1401, @guglicap).
- Speed up representation of file sizes in metadata (#1408).
- Add a new `"forecast_interactive"` reporter to `tar_outdated()` to choose `"forecast"` for interactive sessions and `"silent"` for non-interactive ones.
- Add a new `seconds_reporter_outdated` argument to `tar_config_set()` with a default of 1 to control the time interval of the reporter of `tar_outdated()` and other passive algorithm functions.
- Remove target descriptions from the default labels of graph visualizations.
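
A minimal sketch of how the two reporter-related additions above might be used from the R console. It assumes the working directory already contains a `_targets.R` pipeline, and that `tar_config_set()` writes to the project configuration file as usual; the variable name `outdated` is illustrative only.

```r
library(targets)

# Store the reporter refresh interval (in seconds) in the project
# configuration. The value 1 is the default stated in the notes above.
tar_config_set(seconds_reporter_outdated = 1)

# Check which targets are outdated. Per the notes above, this reporter
# behaves like "forecast" in interactive sessions and like "silent" in
# non-interactive ones (for example, under Rscript).
outdated <- tar_outdated(reporter = "forecast_interactive")
print(outdated)
```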
igraph compatibility
targets 1.9.1
Bug fixes
- Allow branch references to contain multi-element `path` vectors with cloud metadata (#1382, @n8layman).
- Avoid partial matches in internal code (#1384, @olivroy).
- Add error handling around calls to `ps::ps_disk_partitions()` and `ps::ps_fs_mount_point()`.
- Do not store `_targets/objects/` paths in metadata for CAS repositories (#1391).
Compatibility
- Ensure compatibility with `igraph` >= 2.1.2.
Memory efficiency
targets 1.9.0
Improvements
- Un-break workflows that use `format = "file_fast"` (#1339, @koefoeden).
- Fix deadlock in `error = "trim"` (#1340, @koefoeden).
- Remove tailored debugging message (#1341, @koefoeden).
- Store warnings while writing to storage (#1345, @Aariq).
- Allow `garbage_collection` to be a non-negative integer to control the frequency of garbage collection in a performant, convenient, unified way (#1351).
- Deprecate the `garbage_collection` argument of `tar_make()`, `tar_make_future()`, and `tar_make_clustermq()` (#1351).
- Instrument `target_run()`, `target_prepare()`, and `target_conclude()` using `autometric`.
- Avoid sending problematic error classes such as `"vctrs_error_subscript_oob"` to `rlang::abort()` (#1354, @Jiefei-Wang).
- Reduce memory consumption by ~23% in large pipelines by avoiding the accumulation of promise objects (#1352).
- Avoid `store_assert_format()` and `store_convert_object()` if `storage` is `"none"`.
- Add a `list()` method to `tar_repository_cas()` to make it easier and more efficient to specify custom CAS repositories (#1366).
- Improve speed and reduce memory consumption by avoiding deep copies of inner environments of target definition objects (#1368).
- Reduce memory consumption by storing buds and branches as lightweight references when `memory` is `"transient"` (#1364).
- Replace the `memory` class with the new `lookup` class.
- Implement `memory = "auto"` to select transient memory for dynamic branches and persistent memory for other targets (#1371).
- Omit whole pattern targets from branch subpipelines when possible. This should reduce memory consumption in some cases.
- Omit whole stem targets from branch subpipelines when `retrieval` is `"main"` and only a bud is actually used. The same cannot be done with branches because each branch may need to be (un)marshaled individually.
- Compress branches into references when `retrieval` is `"worker"` and the whole pattern is part of the subpipeline.
- Avoid duplicated branch aggregation: just send the branches over the network.
- Back-compatibly switch `format = "qs"` from `qs` to `qs2` (#1373).
- Add `tar_unblock_process()`.
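
As an illustration of the `memory = "auto"` and integer `garbage_collection` items above, here is a hedged sketch of a `_targets.R` file. The target names (`index`, `draws`, `summary_mean`) are hypothetical, and the reading of `garbage_collection = 10` as "collect garbage at every 10th target" is an assumption to check against the `tar_option_set()` help file.

```r
# _targets.R (illustrative sketch, not taken from the package documentation)
library(targets)

tar_option_set(
  # Per the notes above: transient memory for dynamic branches,
  # persistent memory for all other targets.
  memory = "auto",
  # Assumed meaning: run garbage collection at every 10th target.
  # A value of 0 would presumably turn it off.
  garbage_collection = 10
)

list(
  tar_target(index, seq_len(4)),
  # Dynamic branching: one branch of `draws` per element of `index`.
  tar_target(draws, rnorm(100, mean = index), pattern = map(index)),
  tar_target(summary_mean, mean(draws))
)
```

Setting these options once in `tar_option_set()` keeps the policy in the pipeline definition itself rather than in the deprecated `garbage_collection` argument of `tar_make()`.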
Potentially invalidating changes
- Add `"keepNA"` and `"keepInteger"` to `.deparseOpts()` (#1375). This may cause existing pipelines to rerun, but it makes add-ons like `tarchetypes::tar_map()` much easier to use.
Content addressable storage
targets 1.8.0
- Wrap `tar_watch()` UI module in `bslib::page()` (#1302, @kwbyron-lilly).
- Remove `callr_function` from the `tar_make_as_job()` argument list.
- Ensure `storage = "worker"` is respected when the process of storing an object generates an error (#1304, @multimeric).
- Default to the `_targets.R` pattern in `tar_branches()` (#1306, @multimeric, @mattwarkentin).
- Remove superfluous functions and globals from metadata with `tar_prune()` (#1312, @benzipperer).
- Change the default `workspace_on_error` option to `TRUE` (#1310, @hadley).
- Enhance and organize the `error = "stop"` error message.
- Avoid saving a file in `_targets/objects` for `error = "null"`. Instead, switch to a special `"null"` storage format class if `error` is `"null"` and the target throws an error. This should allow users to more freely create new formats with `tar_format()` without worrying about how to handle `NULL` objects created by `error = "null"`.
- Implement `format = "auto"` (#1311, @hadley).
- Replace `pingr` dependency with `base::socketConnection()` for local URL utilities (#1317, #1318, @Adafede).
- Implement `tar_repository_cas()`, `tar_repository_cas_local()`, and `tar_repository_cas_local_gc()` for content-addressable storage (#1232, #1314, @noamross).
- Add `tar_format_get()` to make implementing CAS systems easier.
- Implement `error = "trim"` in `tar_target()` and `tar_option_set()` (#1310, #1311, @hadley).
- Use the file system type to decide whether to trust time stamps (#1315, @hadley, @gaborcsardi).
- Deprecate `format = "file_fast"` in favor of the above (#1315).
- Deprecate `trust_object_timestamps` in favor of the more unified `trust_timestamps` in `tar_option_set()` (#1315).
- Print storage size of each target in verbose reporters (#1337, @psychelzh).
- Combine help files of `tar_target()` and `tar_target_raw()`. Same with `tar_load()` and `tar_load_raw()`.
- Add a `substitute` argument to `tar_format()` to make it easier to write custom storage formats without metaprogramming.
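
The following `_targets.R` sketch shows how `error = "trim"`, `format = "auto"`, and local content-addressable storage from the list above might be combined. The target names and the `"cas"` directory are hypothetical, and the comments on what each setting does reflect one reading of the package documentation, so verify them in the help files for `tar_target()` and `tar_repository_cas_local()`.

```r
# _targets.R (illustrative sketch under the assumptions stated above)
library(targets)

tar_option_set(
  # Assumed behavior: when a target errors, keep running targets that do
  # not depend on it and skip the work downstream of the error.
  error = "trim",
  # Assumed behavior: let targets choose a storage format from the
  # value each target returns.
  format = "auto"
)

list(
  tar_target(measurements, data.frame(x = 1:3)),
  tar_target(
    total,
    sum(measurements$x),
    # Store this target in a local content-addressable repository.
    # The "cas" folder name is an arbitrary choice for illustration.
    repository = tar_repository_cas_local(path = "cas")
  )
)
```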
bslib and speed
targets 1.7.1
- Use `bslib` in `tar_watch()`.
- Speed up `target_upstream_edges()` and `pipeline_upstream_edges()` by avoiding data frames until the last minute (17% speedup for certain kinds of large pipelines).
- Automatically set `as_job` to `FALSE` in `tar_make()` if `rstudioapi` and/or RStudio is not available.
secretbase
targets 1.7.0
Invalidating changes
- Use `secretbase::siphash13()` instead of `digest(algo = "xxhash64", serializationVersion = 3)` so hashes of in-memory objects no longer depend on serialization version 3 headers (#1244, @shikokuchuo). Unfortunately, pipelines built with earlier versions of `targets` will need to rerun.
Other improvements
- Ensure patterns marshal properly (#1266, #1264, njtierney/geotargets#52, @Aariq, @njtierney).
- Inform and prompt the user when the pipeline was built with an old version of `targets` and changes to the package will cause the current work to rerun (#1244). For the `tar_make*()` functions, `utils::menu()` prompts the user to give people a chance to downgrade if necessary.
- For type safety in the internal database class, read all columns as character vectors in `data.table::fread()`, then convert them to the correct types afterwards.
- Add a new `tar_resources_custom_format()` function which can pass environment variables to customize the behavior of custom `tar_format()` storage formats (#1263, #1232, @Aariq, @noamross).
- Only marshal dependencies if actually sending the target to a parallel worker.
Custom descriptions
targets 1.6.0
- Modernize `extras` in `tar_renv()`.
- `tar_target()` gains a `description` argument for free-form text describing what the target is about (#1230, #1235, #1236, @tjmahr).
- `tar_visnetwork()`, `tar_glimpse()`, `tar_network()`, `tar_mermaid()`, and `tar_manifest()` now optionally show target descriptions (#1230, #1235, #1236, @tjmahr).
- `tar_described_as()` is a new wrapper around `tidyselect::any_of()` to select specific subsets of targets based on the description rather than the name (#1136, #1196, @noamross, @mattmoo).
- Fix the documentation of the `names` argument (nudge users toward `tidyselect` expressions).
- Make assertions on the pipeline process more robust (to check if two processes are trying to access the same data store).
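
To illustrate the `description` argument and `tar_described_as()` from the list above, here is a small sketch. The targets (`raw_data`, `ozone_model`) and their description strings are made up for the example, and it assumes `tar_described_as()` accepts a `tidyselect` expression that is matched against descriptions.

```r
# _targets.R (illustrative sketch with hypothetical targets)
library(targets)

list(
  tar_target(
    raw_data,
    airquality,
    description = "raw air quality data"
  ),
  tar_target(
    ozone_model,
    lm(Ozone ~ Temp, data = raw_data),
    description = "linear model of ozone"
  )
)

# Then, from the R console in the same project (not inside _targets.R),
# select targets by description instead of by name:
# targets::tar_make(
#   names = targets::tar_described_as(tidyselect::starts_with("linear model"))
# )
```

Selecting by description can be handy when target names are terse or generated programmatically, since the free-form text carries the intent.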
CRAN patch
targets 1.5.1
- Avoid `arrow`-related CRAN check NOTE.
- `use_targets()` only writes the `_targets.R` script. The `run.sh` and `run.R` scripts are superseded by the `as_job` argument of `tar_make()`. Users not using the RStudio IDE can call `tar_make()` with `callr_function = callr::r_bg` to run the pipeline as a background process.
- `tar_make_clustermq()` and `tar_make_future()` are superseded in favor of `tar_make(use_crew = TRUE)`, so template files are no longer written for the former automatically.
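
A brief sketch of the two alternatives mentioned above, run from the R console in a project that already has a `_targets.R` file. The `process` variable is illustrative, and the `crew` controller shown in the comment is one possible setup assumed for the example rather than a requirement stated in these notes.

```r
library(targets)

# Option 1: run the pipeline as a background process outside RStudio.
# With callr::r_bg, tar_make() is expected to return a handle to the
# background process.
process <- tar_make(callr_function = callr::r_bg)
process$is_alive() # check whether the pipeline is still running
process$wait()     # block until the pipeline finishes

# Option 2: run with crew workers instead of clustermq or future.
# This assumes _targets.R registers a controller, for example:
#   tar_option_set(controller = crew::crew_controller_local(workers = 2))
tar_make(use_crew = TRUE)
```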
Small fixes
targets 1.4.1
- Print "errored pipeline" when at least one target errors.
- Bump minimum `clustermq` version to 0.9.2.
- Repair the `tar_debug_instructions()` tips for when commands are long.
- Do not look for dependencies of primitive functions (#1200, @smwindecker, @joelnitta).