On-demand branch creation #1364
Bud targets probably also need a similar treatment.
Actually, I think it's more efficient to just go with the original serialization idea proposed in #1352. Even after subtracting pedigree creation time, it is much faster to deserialize a branch on demand than to create one from scratch.
command <- command_init()
settings <- settings_init()
cue <- cue_init()
value <- value_init()
branch <- branch_init(command, settings, cue, value)
serialized_branch_high <- qs::qserialize(branch, preset = "high")
serialized_branch_balanced <- qs::qserialize(branch, preset = "balanced")
serialized_branch_fast <- qs::qserialize(branch, preset = "fast")
microbenchmark::microbenchmark(
create_branch = branch_init(command, settings, cue, value),
create_pedigree = pedigree_new(parent = branch$settings$name, index = 1L),
deserialize_high = qs::qdeserialize(serialized_branch_high),
deserialize_balanced = qs::qdeserialize(serialized_branch_balanced),
deserialize_fast = qs::qdeserialize(serialized_branch_fast),
times = 1e4,
control = list(warmup = 100)
)
#> Unit: microseconds
#>                  expr    min     lq      mean median      uq      max neval cld
#>         create_branch 56.621 58.876 68.514920 60.721 64.9235 8143.133 10000   a
#>       create_pedigree  1.558  2.009  2.270166  2.173  2.2960   34.440 10000   b
#>      deserialize_high 17.917 18.942 27.222868 21.689 34.3170 7516.366 10000   c
#>  deserialize_balanced 14.801 15.785 23.145619 18.204 30.8730  403.645 10000   d
#>      deserialize_fast 14.760 15.662 22.881981 17.917 30.7090  204.795 10000   d
And to summarize the storage sizes in the various options:
command <- command_init()
settings <- settings_init(name = "target_name")
cue <- cue_init()
value <- value_init()
branch <- branch_init(command, settings, cue, value, index = 1L)
serialized_branch_high <- qs::qserialize(branch, preset = "high")
serialized_branch_balanced <- qs::qserialize(branch, preset = "balanced")
serialized_branch_fast <- qs::qserialize(branch, preset = "fast")
library(lobstr)
obj_size(qs::qserialize(branch$pedigree))
#> 176 B
obj_size(branch$pedigree)
#> 456 B
obj_size(serialized_branch_high)
#> 680 B
obj_size(serialized_branch_balanced)
#> 840 B
obj_size(serialized_branch_fast)
#> 1.17 kB
obj_size(branch)
#> 9.54 kB
The "high" preset on the branch looks like a good tradeoff (#1365).
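The store-bytes-then-rebuild idea can be sketched without any package internals. This is only an illustration under stated assumptions: base R `serialize()` and `unserialize()` stand in for `qs::qserialize()` and `qs::qdeserialize()`, and `branch` is a made-up list rather than a real branch object.

```r
# Sketch only: base R serialize()/unserialize() stand in for qs here,
# and `branch` is a hypothetical list, not a real targets branch.
branch <- list(
  name = "y_1",
  index = 1L,
  command = quote(x + 1),
  settings = list(parent = "y", pattern = "map(x)")
)

# Keep only the raw bytes in memory.
stored <- serialize(branch, connection = NULL)

# Rebuild the full object only when it is actually needed.
restored <- unserialize(stored)

print(is.raw(stored))
print(restored$name)
```

Whether this pays off depends on the ratio between deserialization time and construction time, which is what the benchmarks in this thread measure.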
After optimizing with #1368, branch creation got much faster. Also, it will be much easier now to create branches on demand and store only lightweight references whenever possible. I will need to refactor the junction class and add fancy checking to
target <- tar_target(y, x, pattern = map(x))
name <- "x_branch"
command <- target$command
store <- target$store
cue <- target$cue
settings <- target$settings
index <- 1L
deps_parent <- character(0L)
deps_child <- character(0L)
branch <- branch_init(
name = name,
command = command,
deps_parent = deps_parent,
deps_child = deps_child,
settings = settings,
cue = cue,
store = store,
index = index
)
serialized_branch_high <- qs::qserialize(branch, preset = "high")
serialized_branch_balanced <- qs::qserialize(branch, preset = "balanced")
serialized_branch_fast <- qs::qserialize(branch, preset = "fast")
microbenchmark::microbenchmark(
create_branch = branch_init(
name = name,
command = command,
deps_parent = deps_parent,
deps_child = deps_child,
settings = settings,
cue = cue,
store = store,
index = index
),
deserialize_high = qs::qdeserialize(serialized_branch_high),
deserialize_balanced = qs::qdeserialize(serialized_branch_balanced),
deserialize_fast = qs::qdeserialize(serialized_branch_fast),
times = 1e4,
control = list(warmup = 100)
)
#> Unit: microseconds
#>                  expr    min     lq     mean median     uq      max neval cld
#>         create_branch 15.170 16.400 19.10258 16.974 18.204 5561.978 10000   a
#>      deserialize_high 17.835 19.024 26.08239 20.623 28.618 6385.299 10000   b
#>  deserialize_balanced 14.678 15.785 21.73968 16.851 25.092 6344.299 10000   a
#>      deserialize_fast 14.555 15.662 20.98209 16.687 24.928 5613.187 10000   a
Notes to self on the next steps for the implementation:
In 23652fd (branch
I also need to go into the pattern and stem classes and make sure they store references and not whole targets in the pipeline object when they create branches and buds.
83b706c converts branches and buds to and from lightweight references using the existing machinery of
branch
Implemented in #1370
Dynamic branches take up a lot of memory in the main session of a large pipeline. Instead of a full branch object, it may be possible to store a lightweight reference to the branch until it is actually needed. If this works, we may see a large reduction in memory consumption.
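A minimal base R sketch of the lightweight-reference idea follows. Every name in it (`reference_new()`, `branch_from_reference()`, `template`) is hypothetical and chosen for illustration; none of them is taken from the package, and the real branch constructor takes different arguments.

```r
# Hypothetical sketch: these helpers are not part of the targets package.
# A reference holds just enough information to rebuild a branch later.
reference_new <- function(parent, index) {
  list(parent = parent, index = index)
}

# Stand-in for the full constructor: copies the shared fields from a
# template and fills in the branch-specific name and index.
branch_from_reference <- function(reference, template) {
  branch <- template
  branch$name <- sprintf("%s_%d", reference$parent, reference$index)
  branch$index <- reference$index
  branch
}

# Fields shared by every branch of the same pattern.
template <- list(
  command = quote(x + 1),
  settings = list(pattern = "map(x)")
)

# The main session keeps one small reference per branch...
references <- lapply(seq_len(1000L), function(i) reference_new("y", i))

# ...and builds a full branch only when one is actually needed.
branch <- branch_from_reference(references[[3L]], template)
print(branch$name)
```

The memory saving in a real pipeline would depend on how much heavier a full branch is than its reference, so this sketch makes no claim about actual numbers.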