NGLess currently executes a script in two stages. The first stage verifies that the script and output files are consistent (equivalent to --validate-only). The second stage performs the computation and runs only if the first stage finishes successfully.
However, the current implementation performs downloads, indexing and computation during the same (second) stage.
When using the parallel module, this can lead to jobs waiting on each other for significant amounts of time. The waiting happens during indexing and during initialization of internal and external modules. It also happens during downloads, where connectivity problems or slow networks can cause failures or delays.
For example, when mapping to hg19, the reference files are downloaded and indexed only when the map() step is reached for the first time.
This limitation often leads to workflows that follow a "run one sample first and, if it finishes, run all the others" approach.
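The workaround described above can be sketched as a small shell helper. This is only an illustration: the function name `run_canary_then_rest` and the `NGLESS` variable are hypothetical, and the sketch assumes that each concurrent invocation of the same script picks up a different sample (as when the script uses the parallel module).

```shell
#!/bin/sh
# NGLESS may be overridden to point at a specific binary (assumption: "ngless" is on PATH).
NGLESS="${NGLESS:-ngless}"

# run_canary_then_rest SCRIPT N_JOBS  (hypothetical helper, not part of ngless)
# Runs SCRIPT once on its own; only if that succeeds, starts N_JOBS concurrent runs.
run_canary_then_rest() {
    script="$1"
    n_jobs="$2"

    # First run: pays the one-time cost of downloads and indexing.
    "$NGLESS" "$script" || return 1

    # Remaining runs: started together (assumes each one picks a different sample).
    pids=""
    i=0
    while [ "$i" -lt "$n_jobs" ]; do
        "$NGLESS" "$script" &
        pids="$pids $!"
        i=$((i + 1))
    done

    # Report failure if any background run failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

A call such as `run_canary_then_rest script.ngl 8` would run one job alone and then eight more concurrently. The first job is the only one that should trigger downloads and indexing, which is exactly the manual step that a dedicated dependency stage would make unnecessary.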
With staged execution, an ngless workflow could look like:
# (run once) Ensure ngless is correctly installed
ngless --check-install
# (run once) Check that the script is valid and inputs/outputs are as expected
ngless --validate-only script.ngl
# (run once/multiple) Download and index all dependencies (references, resources from internal modules, initialization of external modules, indexing, etc...)
ngless --ensure-dependencies script.ngl
# (run once/multiple) Interpret the script, possibly in parallel
ngless script.ngl
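For unattended use, the four steps above could be chained so that each stage runs only if the previous one succeeded. The following is a minimal sketch under stated assumptions: `run_staged` and the `NGLESS` variable are hypothetical names, and `--ensure-dependencies` is the flag proposed in this issue, not an existing option.

```shell
#!/bin/sh
# NGLESS may be overridden to point at a specific binary (assumption: "ngless" is on PATH).
NGLESS="${NGLESS:-ngless}"

# run_staged SCRIPT  (hypothetical helper, not part of ngless)
# Runs each stage in order and stops at the first failure.
# The return code identifies the stage that failed (1 to 4).
run_staged() {
    script="$1"

    # Stage 1: installation check.
    "$NGLESS" --check-install || return 1

    # Stage 2: script and input/output validation.
    "$NGLESS" --validate-only "$script" || return 2

    # Stage 3: PROPOSED flag from this issue; it does not exist yet.
    "$NGLESS" --ensure-dependencies "$script" || return 3

    # Stage 4: actual computation.
    "$NGLESS" "$script" || return 4
}
```

Returning a distinct code per stage lets a cluster submission script tell a dependency failure (for example, a network problem) apart from a failure in the computation itself.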
An advantage of --ensure-dependencies is that resources could be downloaded and indexed in parallel, whereas these steps currently happen sequentially.
Additionally, executing script.ngl would behave predictably for the user, regardless of whether the command is being run for the first time.
This issue is also in line with #71, which proposes a setup phase for external modules. Such a phase would also run during --ensure-dependencies.
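To illustrate what preparing dependencies in parallel might mean, here is a generic shell sketch of the pattern. It does not reflect how ngless is implemented: `prepare_one` is a hypothetical placeholder for whatever downloads and indexes a single dependency, and `prepare_all` is likewise an invented name.

```shell
#!/bin/sh
# prepare_one NAME  (hypothetical placeholder)
# Stands in for downloading and indexing a single dependency.
prepare_one() {
    echo "preparing $1"
}

# prepare_all NAME...  (hypothetical helper)
# Starts one background job per dependency, then waits for all of them.
# Returns non-zero if any job failed.
prepare_all() {
    pids=""
    for dep in "$@"; do
        prepare_one "$dep" &
        pids="$pids $!"
    done

    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

With sequential preparation the total time is roughly the sum of the individual steps. With this pattern it is closer to the slowest single step, provided that network and disk bandwidth are not the bottleneck.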