Confusing provenance model (wasStartedBy/qualifiedStart/wasEndedBy/qualifiedEnd) #2007

Open
avillar opened this issue Jun 3, 2024 · 0 comments

avillar commented Jun 3, 2024

Actual Behavior

I'll use PROV-N notation here, but naturally the same applies to the RDF versions.

When generating a provenance trace for a CWL run, an Agent is defined to represent the tool running the workflow, like so:

agent(id:agent-id, [prov:type='prov:SoftwareAgent', prov:type='wfprov:WorkflowEngine', prov:label="cwltool 3.1.20240508115724"])

This Agent is also used as an Activity, in a way that IMHO is confusing. For example:

wasStartedBy(id:main-activity-id, -, id:agent-id, 2024-06-03T12:05:08.637959)

The Agent here is used as the starter Activity for the main activity. While PROV-O doesn't prevent an Agent from also being an Activity, the semantics here are a bit confusing: the Agent is one thing (the software that ran the workflow) and the starter Activity is another (a given execution of that software), and this provenance trace conflates the two.

A similar pattern appears when declaring a wasStartedBy for the software agent:

wasStartedBy(id:agent-id, -, id:empty-agent-id, 2024-06-03T13:25:29.034671)
agent(id:empty-agent-id)
agent(id:avillar, [prov:type='schema:Person', prov:type='prov:Person', prov:label="Alejandro Villar", foaf:name="Alejandro Villar", schema:name="Alejandro Villar"])
actedOnBehalfOf(id:empty-agent-id, id:avillar, -)

An empty (i.e., without descriptive metadata) agent (the host?) is generated and said to act on behalf of the user (myself in this case), and this empty agent is bound to the software agent via wasStartedBy, effectively declaring that both are Activities.
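To make the conflation concrete, here is a minimal Python sketch (standard library only) that scans PROV-N statements and reports identifiers declared as agents that also appear in Activity positions of wasStartedBy/wasEndedBy (the started/ended activity and the starter/ender activity). It is a hypothetical helper, not part of cwltool, and it assumes one statement per line without the optional "id;" relation identifier that PROV-N allows. Applied to the statements quoted above, it flags both the software agent and the empty agent.

```python
import re

# PROV-N relations whose 1st and 3rd positional arguments denote Activities:
# wasStartedBy(activity, trigger, starter, time) and
# wasEndedBy(activity, trigger, ender, time).
ACTIVITY_ARGS = {"wasStartedBy": (0, 2), "wasEndedBy": (0, 2)}

# Matches "name(arguments)" on a single line (assumption: one statement per line).
STATEMENT = re.compile(r"^\s*(\w+)\((.*)\)\s*$")

# Statements copied from the trace discussed in this issue.
TRACE = """\
agent(id:agent-id, [prov:type='prov:SoftwareAgent', prov:type='wfprov:WorkflowEngine', prov:label="cwltool 3.1.20240508115724"])
wasStartedBy(id:main-activity-id, -, id:agent-id, 2024-06-03T12:05:08.637959)
wasStartedBy(id:agent-id, -, id:empty-agent-id, 2024-06-03T13:25:29.034671)
agent(id:empty-agent-id)
agent(id:avillar, [prov:type='schema:Person', prov:type='prov:Person', prov:label="Alejandro Villar", foaf:name="Alejandro Villar", schema:name="Alejandro Villar"])
actedOnBehalfOf(id:empty-agent-id, id:avillar, -)
"""


def split_args(body: str) -> list[str]:
    """Split on top-level commas, ignoring commas inside [...] or "..."."""
    args, current, depth, in_str = [], [], 0, False
    for ch in body:
        if ch == '"':
            in_str = not in_str
        elif not in_str and ch == "[":
            depth += 1
        elif not in_str and ch == "]":
            depth -= 1
        elif ch == "," and depth == 0 and not in_str:
            args.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    args.append("".join(current).strip())
    return args


def conflated_ids(trace: str) -> set[str]:
    """Return identifiers used both as an Agent and as an Activity."""
    agents, activities = set(), set()
    for line in trace.splitlines():
        match = STATEMENT.match(line)
        if not match:
            continue
        name, args = match.group(1), split_args(match.group(2))
        if name == "agent":
            agents.add(args[0])
        elif name == "actedOnBehalfOf":
            # actedOnBehalfOf(delegate, responsible, activity): first two are agents.
            agents.update(args[:2])
        elif name in ACTIVITY_ARGS:
            for i in ACTIVITY_ARGS[name]:
                if i < len(args) and args[i] != "-":
                    activities.add(args[i])
    return agents & activities


if __name__ == "__main__":
    print(sorted(conflated_ids(TRACE)))  # ['id:agent-id', 'id:empty-agent-id']
```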

Suggested Behavior

The Activity for starting/stopping the software agent should be separated from the Agent itself. I think something like the following (simplified) diagram would make more sense:

+-----------------+  wasStartedBy        +--------------------+
| starterActivity | <------------------- |    mainActivity    |
+-----------------+                      +--------------------+
  |                                        |
  |                                        | wasEndedBy
  |                                        v
  |                                      +--------------------+
  |                                      |   endingActivity   |
  |                                      +--------------------+
  |                                        |
  |                                        | wasAssociatedWith
  |                                        v
  |                 wasAssociatedWith    +--------------------+
  +------------------------------------> |   softwareAgent    |
                                         +--------------------+
                                           |
                                           | actedOnBehalfOf
                                           v
                                         +--------------------+
                                         |        user        |
                                         +--------------------+
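As a rough sketch of what the diagram could look like in PROV-N, the Python function below emits one statement per node and edge. All identifiers (id:main-activity, id:starter-activity, id:ending-activity, id:software-agent, id:user) are hypothetical placeholders, not names cwltool produces, and "-" is the PROV-N marker for an unspecified trigger, time, plan or activity. The example reuses the main activity's start time quoted earlier and leaves the end time unspecified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceIds:
    """Hypothetical identifiers for the nodes in the proposed diagram."""
    main: str = "id:main-activity"
    starter: str = "id:starter-activity"
    ender: str = "id:ending-activity"
    software_agent: str = "id:software-agent"
    user: str = "id:user"


def proposed_statements(ids: TraceIds, start_time: str = "-", end_time: str = "-") -> list[str]:
    """Emit PROV-N statements for the proposed model ("-" marks an unknown value)."""
    activities = {ids.main, ids.starter, ids.ender}
    agents = {ids.software_agent, ids.user}
    # The point of the proposal: no identifier is both an Agent and an Activity.
    if activities & agents:
        raise ValueError(f"identifiers used in both roles: {activities & agents}")
    return [
        f"activity({ids.main})",
        f"activity({ids.starter})",
        f"activity({ids.ender})",
        f"agent({ids.software_agent}, [prov:type='prov:SoftwareAgent'])",
        f"agent({ids.user}, [prov:type='prov:Person'])",
        # wasStartedBy(activity, trigger, starter, time); no trigger entity here.
        f"wasStartedBy({ids.main}, -, {ids.starter}, {start_time})",
        f"wasEndedBy({ids.main}, -, {ids.ender}, {end_time})",
        # wasAssociatedWith(activity, agent, plan); no plan here.
        f"wasAssociatedWith({ids.starter}, {ids.software_agent}, -)",
        f"wasAssociatedWith({ids.ender}, {ids.software_agent}, -)",
        # actedOnBehalfOf(delegate, responsible, activity); activity left unspecified.
        f"actedOnBehalfOf({ids.software_agent}, {ids.user}, -)",
    ]


if __name__ == "__main__":
    for statement in proposed_statements(TraceIds(), start_time="2024-06-03T12:05:08.637959"):
        print(statement)
```

Keeping the starter and ending activities as separate nodes means the software agent only ever appears in Agent positions, which is exactly what the checks in the function enforce.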

Qualified relationships could be used for the associations if additional metadata needs to be included (such as timestamps).
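For the RDF serialization, a possible PROV-O rendering is sketched below as a Python function that returns Turtle text. In PROV-O the unqualified prov:wasStartedBy points to the triggering Entity, so the starter Activity is stated through prov:qualifiedStart with prov:hadActivity; timestamps attach to the prov:Start and prov:End nodes via prov:atTime. prov:Association is not an instantaneous event, so qualified associations would carry other metadata (e.g. prov:hadRole or prov:hadPlan) rather than times. The ex: namespace and all local names are hypothetical placeholders.

```python
from typing import Optional

# ex: is a hypothetical namespace used only for illustration.
PREFIXES = """\
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <https://example.org/run#> .
"""


def _event(kind: str, activity: str, time: Optional[str]) -> str:
    """Blank-node prov:Start/prov:End naming the starter/ender activity."""
    props = [f"a prov:{kind}", f"prov:hadActivity {activity}"]
    if time is not None:
        props.append(f'prov:atTime "{time}"^^xsd:dateTime')
    return "[ " + " ; ".join(props) + " ]"


def qualified_ttl(start_time: Optional[str] = None, end_time: Optional[str] = None) -> str:
    """Return Turtle for the proposed model using qualified start/end/association."""
    return PREFIXES + f"""
ex:main-activity a prov:Activity ;
    prov:qualifiedStart {_event('Start', 'ex:starter-activity', start_time)} ;
    prov:qualifiedEnd {_event('End', 'ex:ending-activity', end_time)} .

ex:starter-activity a prov:Activity ;
    prov:wasAssociatedWith ex:software-agent ;
    prov:qualifiedAssociation [ a prov:Association ; prov:agent ex:software-agent ] .

ex:ending-activity a prov:Activity ;
    prov:wasAssociatedWith ex:software-agent ;
    prov:qualifiedAssociation [ a prov:Association ; prov:agent ex:software-agent ] .

ex:software-agent a prov:SoftwareAgent ;
    prov:actedOnBehalfOf ex:user .

ex:user a prov:Person .
"""


if __name__ == "__main__":
    print(qualified_ttl(start_time="2024-06-03T12:05:08.637959"))
```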

Workflow Code

hello_world.cwl, run with:

cwltool --provenance prov --enable-user-provenance --full-name 'Alejandro Villar' --enable-host-provenance hello_world.cwl

Your Environment

  • cwltool version: cwltool 3.1.20240508115724