Refactor CWLJob.run() to return (outputs, metadata) instead of just outputs. metadata is a dictionary that will contain the information we need for generating CWLProv.
Propagate the metadata through the .run() calls to the root of the computation.
Fill metadata with a data structure containing runtime information about the tasks (tree or dict, with the keys being the jobstore IDs)
Generate a ProvenanceProfile per task and a ResearchObject when all the metadata has been gathered.
Refactor cwltool/provenance.py so that recorded time and time of recording are decoupled.
Refactor ProvenanceProfile:prospective_prov out of the class to be the function that creates all the ProvenanceProfiles and relates them in a tree-like structure.
Refactor cwltool/provenance.py so that we can defer file movements until the end of the run.
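To make two of the cwltool/provenance.py refactors listed above more concrete, here is a minimal sketch of what "recorded time decoupled from time of recording" and "defer file movements until the end of the run" could look like. `DeferredRecorder` and all of its methods are hypothetical names invented for illustration; they are not part of cwltool, and the real refactor may be shaped quite differently.

```python
import datetime
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


class DeferredRecorder:
    """Hypothetical helper for illustration; NOT part of cwltool/provenance.py."""

    def __init__(self) -> None:
        self.events: List[Tuple[datetime.datetime, str]] = []
        self.pending_moves: List[Tuple[Path, Path]] = []

    def record(self, what: str, when: datetime.datetime) -> None:
        # The caller passes the time the event happened, so the recorded
        # time no longer depends on when this method is called.
        self.events.append((when, what))

    def defer_move(self, src: Path, dst: Path) -> None:
        # Only remember the move; nothing touches the filesystem yet.
        self.pending_moves.append((src, dst))

    def flush(self) -> List[str]:
        # At the end of the run: perform the queued moves, then emit the
        # events ordered by the time they happened.
        for src, dst in self.pending_moves:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
        self.pending_moves.clear()
        return [f"{when.isoformat()} {what}" for when, what in sorted(self.events)]


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work" / "out.txt"
    work.parent.mkdir()
    work.write_text("result")

    recorder = DeferredRecorder()
    # An arbitrary example timestamp for when the step actually finished.
    happened = datetime.datetime(2019, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
    recorder.record("step finished", happened)

    target = Path(tmp) / "ro" / "data" / "out.txt"
    recorder.defer_move(work, target)
    moved_early = target.exists()   # the move has not happened yet
    lines = recorder.flush()
    moved_late = target.exists()    # the move happened during flush()
```

The point of the sketch is only that both the timestamp and the file move become data that can be carried around and applied later, which is what would let a distributed run collect everything first and write it out at the end.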
There's quite a bit of friction in making these changes because CWLProv is part of the cwltool package. I don't know to what extent it would be beneficial to separate it into a different module.
There is not much separation of concerns in some functions: they use provenance.py's functions directly. I think this is linked to some of the tight coupling we've already solved. The question is how far we want to go. (I've only spent about an hour going through @inutano's provenance work.)
Refactor CWLJob.run() to return (outputs, metadata) instead of just outputs. metadata is a dictionary that will contain the information we need for generating CWLProv.
Propagate the metadata through the .run() calls to the root of the computation.
Try to reuse Toil's Jobstore IDs (see "Accessing the Jobstoreid corresponding to a job", #2449): for each CWLJob, record this ID and the parent ID.
Fill metadata with a data structure containing runtime information about the tasks (tree or dict, with the keys being the jobstore IDs).
Generate a ProvenanceProfile per task and a ResearchObject when all the metadata has been gathered.
Refactor cwltool/provenance.py so that recorded time and time of recording are decoupled.
Refactor ProvenanceProfile:prospective_prov out of the class to be the function that creates all the ProvenanceProfiles and relates them in a tree-like structure.
Refactor cwltool/provenance.py so that we can defer file movements until the end of the run.
Update Toil to use cwltool with the fixes (see "Update cwltool version to the latest", #2469).
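As a rough illustration of the first few tasks above, the sketch below shows a `run()` that returns `(outputs, metadata)` and merges each child's metadata on the way back to the root, with the dictionary keyed by job ID and each entry recording the parent ID. `SketchJob`, `add_child`, and the metadata fields are hypothetical and invented for this example; the real CWLJob runs inside Toil, where child results arrive through Toil's own mechanisms rather than a direct synchronous call, and the job IDs here are plain strings standing in for jobstore IDs.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Metadata is keyed by job ID (assumed here to be the jobstore ID).
Metadata = Dict[str, Dict[str, Any]]


@dataclass
class SketchJob:
    """Hypothetical stand-in for CWLJob; NOT Toil's real class."""
    job_id: str
    parent_id: Optional[str] = None
    children: List["SketchJob"] = field(default_factory=list)

    def add_child(self, child: "SketchJob") -> "SketchJob":
        # Record the parent ID on the child so the tree can be rebuilt later.
        child.parent_id = self.job_id
        self.children.append(child)
        return child

    def run(self) -> Tuple[Dict[str, Any], Metadata]:
        started = time.time()
        child_outputs: Dict[str, Any] = {}
        metadata: Metadata = {}
        for child in self.children:
            # Each child returns its own outputs plus the metadata of its
            # whole subtree; merging propagates everything to the root.
            outs, meta = child.run()
            child_outputs[child.job_id] = outs
            metadata.update(meta)
        outputs = {"result": f"output of {self.job_id}", "children": child_outputs}
        metadata[self.job_id] = {
            "parent": self.parent_id,
            "started": started,
            "ended": time.time(),
        }
        return outputs, metadata


root = SketchJob("job-root")
step = root.add_child(SketchJob("job-step"))
step.add_child(SketchJob("job-leaf"))
outputs, metadata = root.run()
```

Once the root holds the flat dictionary, the parent pointers are enough to rebuild the job tree, which is the shape needed to generate one ProvenanceProfile per task and relate them to each other.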
Most of the progress is found on https://github.com/DataBiosphere/toil/tree/wip-prov
┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-280