cwl-tes execution to remote server #26

Open
fgypas opened this issue Aug 21, 2018 · 5 comments

Comments


fgypas commented Aug 21, 2018

Hi

Can you add an example of how to execute cwl-tes when the Task Execution server (funnel, TESK) is running remotely and not locally?

I tried to execute the CWL test workflow via cwl-tes on a remote TESK (https://github.com/EMBL-EBI-TSI/TESK) instance as follows:

cwl-tes \
--tes https://tes-dev.tsi.ebi.ac.uk/ \
hashsplitter-workflow.cwl.yml \
--input <(echo resources/test.txt)

or

cwl-tes \
--tes https://tes-dev.tsi.ebi.ac.uk/ \
hashsplitter-workflow.cwl.yml \
--input resources/test.txt

but neither of them works. Based on a Slack discussion, it seems that the input and output data must be HTTP or FTP URLs. Do you have a working example I can try?

Thank you in advance
Foivos


susheel commented Aug 23, 2018

@mr-c I've just gone through the pull request and the logs. It seems that the FTPFsAccess code does not upload the input files to the FTP staging directory (ftp://ftp-private.ebi.ac.uk/upload/), and as a result the job fails.

Furthermore, TESK receives only local path references to the files, not the remote FTP URLs that it needs in order to download and stage them for execution. The task will keep failing as long as TESK receives only local path references.

TESK is set up to stage remote file/directory URLs to the cluster, execute and upload the outputs to a remote URL. The PoC we have uses FTP as a {de}staging area, but we hope to extend this to other remote storage services.


susheel commented Aug 23, 2018

  1. If you plan to execute cwl-tes with job parameters that reference local files, then cwl-tes and any FsAccess class need to know the remote storage location in which to stage the local files. This must be supplied to cwl-tes as a parameter (e.g. --remote-storage-url), and cwl-tes must additionally handle upload/download of the files via FTPFsAccess. TESK should receive only the URLs of the staged files and the output URLs to send the outputs to.

  2. A simpler workaround (at least for this PoC) is to ask users to rewrite their CWL documents so that inputs and outputs are referenced solely by remote FTP URLs. This is already supported natively in cwl-tes and TESK. The missing piece is {de}staging of intermediate files between workflow steps/tasks, where I assume cwl-tes, via cwltool, makes FS checks on local file paths that do not exist (because the files are already at remote URLs). This is specifically what we want to fix!

I assume option 2 is simpler to fix, as it would involve disabling the unnecessary local FS checks, but option 1 is better in the long term and extensible to other remote storage.

For option 1 (i.e. the PR #25 being developed) the missing pieces are:

  • Ability to parameterise cwl-tes with a --remote-storage-url
  • cwl-tes to use the remote storage URL to upload/download files
  • cwl-tes to rewrite local paths to remote URL paths before submitting to TES
  • Ability to disable cwltool's FS checks for intermediate outputs
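
As a rough illustration of the path-rewriting item above, here is a minimal Python sketch. It is not code from cwl-tes or PR #25: the function name to_remote_url is hypothetical, the set of remote schemes (HTTP and FTP) is taken from this thread, and keeping only the file name under the staging directory is a simplifying assumption (two inputs with the same name would collide).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Schemes this thread says TESK can fetch; adjust for your deployment.
REMOTE_SCHEMES = {"http", "ftp"}


def to_remote_url(location, remote_storage_url):
    """Map a local path or file:// URL to a URL under remote_storage_url.

    Hypothetical helper: locations that already use a remote scheme are
    returned unchanged; everything else is mapped to
    <remote_storage_url>/<file name>.
    """
    parsed = urlparse(location)
    if parsed.scheme in REMOTE_SCHEMES:
        return location
    # file:// URLs carry the local path in .path; bare paths have no scheme.
    local_path = parsed.path if parsed.scheme == "file" else location
    name = PurePosixPath(local_path).name
    return remote_storage_url.rstrip("/") + "/" + name


if __name__ == "__main__":
    staging = "ftp://ftp-private.ebi.ac.uk/upload/"
    print(to_remote_url("resources/test.txt", staging))
    print(to_remote_url("file:///tmp/tmpn0qd_ggf/output.sam", staging))
```

The actual upload to the staging area would still have to happen before the task is submitted; this sketch covers only the rewriting of the reference.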


susheel commented Aug 23, 2018

Looping in @psafont, who may be able to advise further after he gets back from annual leave.


susheel commented Aug 23, 2018

@mr-c Find below the task submitted to TESK. Note the inputs[].url and outputs[].url paths. These have to be remote HTTP (input) or FTP URLs for TESK to process them.

{
  "id": "task-9bc998c3",
  "state": "SYSTEM_ERROR",
  "name": "bwa-mem-tool.cwl",
  "description": "",
  "inputs": [
    {
      "name": "reference",
      "description": "cwl_input:reference",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/chr20.fa",
      "path": "/var/lib/cwl/stg329168e2-8b26-452d-b08a-87b0e4374efe/chr20.fa",
      "type": "FILE"
    },
    {
      "name": "reads[0]",
      "description": "cwl_input:reads[0]",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/example_human_Illumina.pe_1.fastq",
      "path": "/var/lib/cwl/stg36b5d550-0a36-40b2-8347-147b962288eb/example_human_Illumina.pe_1.fastq",
      "type": "FILE"
    },
    {
      "name": "reads[1]",
      "description": "cwl_input:reads[1]",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/example_human_Illumina.pe_2.fastq",
      "path": "/var/lib/cwl/stg47227e17-6bdb-46d7-8682-db4f03e07728/example_human_Illumina.pe_2.fastq",
      "type": "FILE"
    },
    {
      "name": "args.py",
      "description": "cwl_input:args.py",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/args.py",
      "path": "/var/lib/cwl/stg48c30ccf-3907-4e38-ab13-5c017030672b/args.py",
      "type": "FILE"
    }
  ],
  "outputs": [
    {
      "name": "stdout",
      "url": "file:///tmp/tmpn0qd_ggf/output.sam",
      "path": "/var/spool/cwl/output.sam",
      "type": "FILE"
    },
    {
      "name": "workdir",
      "url": "file:///tmp/tmpn0qd_ggf/",
      "path": "/var/spool/cwl",
      "type": "DIRECTORY"
    }
  ],
  "resources": {},
  "executors": [
    {
      "image": "python:2-slim",
      "command": [
        "python",
        "/var/lib/cwl/stg48c30ccf-3907-4e38-ab13-5c017030672b/args.py",
        "bwa",
        "mem",
        "-t",
        "2",
        "-I",
        "1,2,3,4",
        "-m",
        "3",
        "/var/lib/cwl/stg329168e2-8b26-452d-b08a-87b0e4374efe/chr20.fa",
        "/var/lib/cwl/stg36b5d550-0a36-40b2-8347-147b962288eb/example_human_Illumina.pe_1.fastq",
        "/var/lib/cwl/stg47227e17-6bdb-46d7-8682-db4f03e07728/example_human_Illumina.pe_2.fastq"
      ],
      "workdir": "/var/spool/cwl",
      "stdout": "/var/spool/cwl/output.sam",
      "env": {
        "HOME": "/tmp/tmpn0qd_ggf",
        "TMPDIR": "/tmp/tmp6gk992dx"
      }
    }
  ],
  "tags": {
    "CWLDocumentId": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/bwa-mem-tool.cwl"
  }
}
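
To spot this problem before a task is submitted, a small check along the following lines could flag every input or output whose url is not remote. This is only a sketch: the helper name non_remote_urls is hypothetical, the accepted schemes (HTTP and FTP) are taken from this thread rather than from the TESK documentation, and the embedded task is a trimmed-down copy of the one above.

```python
import json
from urllib.parse import urlparse

# Schemes this thread says TESK can process; an assumption, not a TESK constant.
REMOTE_SCHEMES = {"http", "ftp"}


def non_remote_urls(task):
    """Return (section, name, url) for each input/output with a non-remote url."""
    flagged = []
    for section in ("inputs", "outputs"):
        for item in task.get(section, []):
            url = item.get("url", "")
            if urlparse(url).scheme not in REMOTE_SCHEMES:
                flagged.append((section, item.get("name"), url))
    return flagged


# Trimmed-down version of the task shown above.
task = json.loads("""
{
  "name": "bwa-mem-tool.cwl",
  "inputs": [
    {"name": "args.py",
     "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/args.py",
     "type": "FILE"}
  ],
  "outputs": [
    {"name": "stdout",
     "url": "file:///tmp/tmpn0qd_ggf/output.sam",
     "type": "FILE"}
  ]
}
""")

if __name__ == "__main__":
    for section, name, url in non_remote_urls(task):
        print(f"{section}: {name} uses a non-remote url: {url}")
```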

Contributor

mr-c commented Aug 23, 2018

Thank you @susheel; task-87dbc5ba was submitted using ftp:// URLs but failed. Do you have server-side logs for that task?
