How to represent which compute provider was used? #82

Open
4 tasks
mr-c opened this issue Jun 20, 2024 · 6 comments

Comments

@mr-c

mr-c commented Jun 20, 2024

This could be for the entire workflow and/or for each individual step (in the case of distributed execution).

  • Where (and with what term) in the ro-crate-metadata.json metadata file would we represent this for an entire workflow?
  • Where (and with what term) in the ro-crate-metadata.json metadata file would we represent this for a specific step?
  • How should the provider be credited? Current thinking is to use a ROR identifier plus a local identifier (such as a cluster name).
  • This should probably be a list of identifiers, to cover layered systems. Example: OpenStack VMs provisioned on the de.NBI Cloud, at the Freiburg University instance, which would need to credit OpenStack, the de.NBI Cloud, and the Freiburg University computers.

(@mr-c has heard about instrument identifiers, but knows nothing about that beyond the existence of the concept)
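To make the layered case concrete, one option is an ordered list of provider entities, outermost layer first, each with a ROR URI as `@id` where one exists and a local identifier (such as a cluster or instance name) where useful. This is a hypothetical sketch only: the helper `provider_entities`, the layer order, and every `@id` value (including the `ror.org` ones, which are placeholders and not real ROR records) are assumptions, and no RO-Crate profile currently defines this structure.

```python
import json

def provider_entities(layers):
    """Turn ordered (name, type, ror, local_id) tuples into JSON-LD entities.

    Hypothetical helper: RO-Crate does not define this structure.
    """
    entities = []
    for name, type_, ror, local_id in layers:
        entity = {
            # Prefer the ROR URI as @id when available, else a local fragment id.
            "@id": ror if ror else "#" + name.lower().replace(" ", "-"),
            "@type": type_,
            "name": name,
        }
        if local_id:
            # Local identifier, e.g. a cluster or instance name.
            entity["identifier"] = local_id
        entities.append(entity)
    return entities

# Placeholder values: the ror.org IDs below are NOT real ROR records.
layers = [
    ("OpenStack", "SoftwareApplication", None, None),
    ("de.NBI Cloud", "Organization", "https://ror.org/PLACEHOLDER-denbi", None),
    ("Freiburg University", "Organization",
     "https://ror.org/PLACEHOLDER-freiburg", "freiburg-instance"),
]

print(json.dumps(provider_entities(layers), indent=2))
```

Modelling OpenStack as a `SoftwareApplication` rather than an `Organization` is itself a judgment call; it reflects that OpenStack is the software layer being credited, not the host.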

@mr-c
Author

mr-c commented Jun 20, 2024

I see the instrument field of the Process Run Crate profile, but that seems to capture the tool that was run, not where it was run.

https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/#requirements

@jmfernandez
Contributor

I guess agent is the most suitable property for that. When it is used, it should link to a Person or an Organization.

The question here is whether an Organization entity is suitable for this purpose.
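A minimal sketch of the `agent` approach, assuming the compute provider is modelled as an `Organization` and the run as a `CreateAction` in an RO-Crate 1.1 `@graph`. The `@id` values, names, and the lookup helper `entity_by_id` are hypothetical illustrations, not part of any profile.

```python
import json

def entity_by_id(crate, entity_id):
    """Return the entity in the crate's @graph with the given @id, or None if absent."""
    return next((e for e in crate["@graph"] if e["@id"] == entity_id), None)

# Placeholder entities: the Organization and run IDs are hypothetical.
organization = {
    "@id": "#compute-provider",
    "@type": "Organization",
    "name": "Example compute provider",
}
run = {
    "@id": "#run-1",
    "@type": "CreateAction",
    "name": "Run of an example workflow",
    # agent links the execution to the Organization instead of a Person.
    "agent": {"@id": organization["@id"]},
}
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [run, organization],
}

agent = entity_by_id(crate, run["agent"]["@id"])
print(agent["@type"], agent["name"])
```

Resolving the reference shows the open question directly: a consumer expecting `agent` to be a Person has to handle an Organization here too.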

@stain
Contributor

stain commented Jun 25, 2024

From the figure at https://www.researchobject.org/workflow-run-crate/profiles/provenance_run_crate/:
I think it would be better to attach this to the ControlAction, as there may be some kind of job submission, containers, etc. involved before execution comes down to the file-based command line represented by the bottom CreateAction for the tool. Perhaps the new https://schema.org/provider property can be put to better use here; it currently points to a Person or an Organization, so it has the same issue as agent. (In that case I would say agent should rather be the workflow engine, but that information is less needed, as the reverse link via object is already there.)

[Figure: provenance diagram]
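A sketch of this suggestion, assuming the ControlAction pattern of the Provenance Run Crate profile (`instrument` pointing to the workflow step, `object` to the tool's CreateAction) with a `provider` added on the ControlAction and the workflow engine as `agent`. Putting `provider` there is the proposal under discussion, not a profile requirement; all `@id` values and the helper `provider_for_tool_run` are hypothetical. The helper shows how the existing `object` link lets a consumer get from a tool run back to its provider.

```python
import json

# Hypothetical IDs throughout; "provider" on a ControlAction is only a proposal here.
provider = {"@id": "#provider", "@type": "Organization", "name": "Example compute provider"}
engine = {"@id": "#engine", "@type": "SoftwareApplication", "name": "Example workflow engine"}
step = {"@id": "#step-1", "@type": "HowToStep"}
tool_run = {"@id": "#tool-run", "@type": "CreateAction", "name": "Run of tool X"}
step_control = {
    "@id": "#step-control",
    "@type": "ControlAction",
    "instrument": {"@id": step["@id"]},     # the workflow step
    "object": {"@id": tool_run["@id"]},     # the tool execution it controlled
    "provider": {"@id": provider["@id"]},   # where the step's job was executed
    "agent": {"@id": engine["@id"]},        # the engine that controlled the step
}
graph = [step_control, step, tool_run, provider, engine]

def provider_for_tool_run(graph, tool_run_id):
    """Return the provider @id on the ControlAction whose object is tool_run_id.

    Assumes a single-valued object; real crates may use lists.
    """
    for entity in graph:
        if (entity.get("@type") == "ControlAction"
                and entity.get("object", {}).get("@id") == tool_run_id):
            return entity.get("provider", {}).get("@id")
    return None

print(provider_for_tool_run(graph, "#tool-run"))
```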

@stain
Contributor

stain commented Jun 25, 2024

If you want to identify the infrastructure separately from the organisation behind it (e.g. ARCHER2), then perhaps we need to find a way to link to a https://schema.org/Service or even a https://schema.org/WebAPI (e.g. WES).
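One way to separate the infrastructure from its organisation is to describe the infrastructure as a schema.org `Service` (or its subtype `WebAPI`, e.g. for a WES endpoint) whose `provider` is the Organization. This is a hypothetical sketch: the helper `service_entity`, the operator entity, and the WES endpoint are placeholders, and which property should link a run to such a Service is still the open question here.

```python
import json

def service_entity(entity_id, name, provider_id, kind="Service"):
    """Build a schema.org Service (or its subtype WebAPI) credited to an Organization."""
    if kind not in ("Service", "WebAPI"):
        raise ValueError("kind must be 'Service' or 'WebAPI'")
    return {
        "@id": entity_id,
        "@type": kind,
        "name": name,
        # schema.org provider on a Service points to the Person/Organization offering it.
        "provider": {"@id": provider_id},
    }

# Placeholder identifiers; the operator and WES endpoint are illustrative only.
operator = {"@id": "#infrastructure-operator", "@type": "Organization", "name": "Example operator"}
archer2 = service_entity("#archer2", "ARCHER2", operator["@id"])
wes = service_entity("#wes-endpoint", "Example WES endpoint", operator["@id"], kind="WebAPI")

print(json.dumps([archer2, wes, operator], indent=2))
```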

@stain
Copy link
Contributor

stain commented Sep 4, 2024

The location property on the tool execution could also be used, perhaps pointing to a https://schema.org/VirtualLocation?
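A small sketch of the `location` idea, assuming the tool execution is a `CreateAction` whose `location` references a `VirtualLocation` entity. The host name, `@id` values, and the helper `location_of` are hypothetical.

```python
import json

def location_of(entities, action_id):
    """Return the entity referenced by an action's location property, or None."""
    by_id = {e["@id"]: e for e in entities}
    ref = by_id[action_id].get("location")
    return by_id.get(ref["@id"]) if ref else None

# Hypothetical IDs and host name, for illustration only.
tool_run = {
    "@id": "#tool-run",
    "@type": "CreateAction",
    "name": "Run of tool X",
    "location": {"@id": "#compute-node"},  # where this tool execution took place
}
node = {
    "@id": "#compute-node",
    "@type": "VirtualLocation",
    "name": "node-042 of an example cluster",
}
entities = [tool_run, node]

print(json.dumps(location_of(entities, "#tool-run"), indent=2))
```

Unlike `agent` or `provider`, this records *where* the run happened rather than *who* to credit, so it could complement either of those.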

@stain
Copy link
Contributor

stain commented Sep 4, 2024

Discussion in ELIXIR Compute WP5; the use case is the MGnify workflow in Galaxy, executed on Pulsar nodes. From that we would ideally need:

  • Where was it run? (identifier/name of the platform)
    • Example: Galaxy (version NN) at usegalaxy.eu, using Pulsar nodes XX & YY, with final storage at de.NBI Cloud host ZZ
    • Workflow system/platform & version
    • Host name & organization identifier (ROR, or something similar for consortia/virtual organizations)
    • Compute providers, if different from the above (e.g. the Pulsar nodes)
    • Storage providers, if different from the above
  • Which workflow? (identifier/name, e.g. from workflowhub.org)
  • Which user-supplied data was used?
    • ENA sample accession identifiers; all data has to be in ENA first. Example: https://www.ebi.ac.uk/ena/browser/view/SAMN11835499
  • Which user-chosen parameters?
    • Galaxy/CWL input object (key-value pairs)

Raised by @mberacochea
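The list above can be read as a checklist for a run record. The hypothetical sketch below encodes it as a Python dataclass with a `missing()` method that reports unfilled items. The class and field names are illustrative assumptions, not an RO-Crate vocabulary, and compute/storage providers are treated as optional because the list only requires them when they differ from the host.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

@dataclass
class RunRecord:
    """Hypothetical checklist of the information listed above; names are illustrative."""
    platform: Optional[str] = None
    platform_version: Optional[str] = None
    host: Optional[str] = None
    host_org_id: Optional[str] = None            # e.g. a ROR or consortium identifier
    compute_providers: list = field(default_factory=list)
    storage_providers: list = field(default_factory=list)
    workflow_id: Optional[str] = None            # e.g. a WorkflowHub identifier
    sample_accessions: list = field(default_factory=list)  # ENA sample accessions
    parameters: dict = field(default_factory=dict)         # user-chosen inputs

    def missing(self):
        """Names of required fields that are still empty, in declaration order."""
        # Providers are only needed when they differ from the host, so treat them as optional.
        optional = {"compute_providers", "storage_providers"}
        return [f.name for f in fields(self)
                if f.name not in optional and not getattr(self, f.name)]

record = RunRecord(
    platform="Galaxy",
    host="usegalaxy.eu",
    compute_providers=["Pulsar node XX", "Pulsar node YY"],
    sample_accessions=["SAMN11835499"],
    parameters={"example_param": 1},  # hypothetical parameter
)
print(record.missing())  # ['platform_version', 'host_org_id', 'workflow_id']
```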
