Curl called on s3 URLs from rna seq cgl manifest #361

Closed
nsjake opened this issue Jul 16, 2016 · 4 comments · Fixed by #376

nsjake commented Jul 16, 2016

The generated manifest says that s3:// URLs pointing to samples are allowed, but the pipeline passes the s3:// URL straight to curl, which exits with status 1 (see the log below). Converting the S3 URL to its https counterpart resolved the issue.

Command:

toil-rnaseq run ./jobstore --retryCount=1 --workDir=/data --manifest=/mnt/ephemeral/manifest-toil-rnaseq.tsv --workDir=/mnt/ephemeral/data &> ~/log

log:

ip-172-31-18-211: 2016-07-16 19:59:19,045 INFO: toil.lib.bioio: Logging set at level: INFO
ip-172-31-18-211: 2016-07-16 19:59:19,045 INFO: toil.lib.bioio: Logging set at level: INFO
ip-172-31-18-211: 2016-07-16 19:59:19,047 INFO: toil.jobStores.fileJobStore: Path to job store directory is '/mnt/ephemeral/jobstore'.
ip-172-31-18-211: 2016-07-16 19:59:19,047 INFO: toil.jobStores.abstractJobStore: The workflow ID is: '55011317-aa9f-4c09-8283-44cd4605c1de'
ip-172-31-18-211: 2016-07-16 19:59:19,049 INFO: toil.common: Using the single machine batch system
ip-172-31-18-211: 2016-07-16 19:59:19,049 WARNING: toil.batchSystems.singleMachine: Limiting maxCores to CPU count of system (32).
ip-172-31-18-211: 2016-07-16 19:59:19,049 WARNING: toil.batchSystems.singleMachine: Limiting maxMemory to physically available memory (63321128960).
ip-172-31-18-211: 2016-07-16 19:59:19,049 INFO: toil.batchSystems.singleMachine: Setting up the thread pool with 320 workers, given a minimum CPU fraction of 0.100000 and a maximum CPU value of 32.
ip-172-31-18-211: 2016-07-16 19:59:19,108 INFO: toil.common: Written the environment for the jobs to the environment file
ip-172-31-18-211: 2016-07-16 19:59:19,108 INFO: toil.common: Caching all jobs in job store
ip-172-31-18-211: 2016-07-16 19:59:19,108 INFO: toil.common: 0 jobs downloaded.
ip-172-31-18-211: 2016-07-16 19:59:19,110 INFO: toil.realtimeLogger: Real-time logging disabled
ip-172-31-18-211: 2016-07-16 19:59:19,112 INFO: toil.leader: (Re)building internal scheduler state
ip-172-31-18-211: 2016-07-16 19:59:19,112 INFO: toil.leader: Checked batch system has no running jobs and no updated jobs
ip-172-31-18-211: 2016-07-16 19:59:19,112 INFO: toil.leader: Found 1 jobs to start and 0 jobs with successors to run
ip-172-31-18-211: 2016-07-16 19:59:19,116 INFO: toil.leader: Starting the main loop
ip-172-31-18-211: 2016-07-16 19:59:19,117 INFO: toil.batchSystems.singleMachine: Executing command: '_toil_worker /mnt/ephemeral/jobstore 3/N/joba0PJ7U'.
INFO:toil.common:Created the workflow directory at /mnt/ephemeral/data/toil-55011317-aa9f-4c09-8283-44cd4605c1de
ip-172-31-18-211: 2016-07-16 19:59:19,384 INFO: toil.batchSystems.singleMachine: Executing command: '_toil_worker /mnt/ephemeral/jobstore u/x/jobBfQvYU'.
ip-172-31-18-211: 2016-07-16 19:59:19,643 INFO: toil.leader: Got message from job at time 07-16-2016 19:59:19: UUID: 6f4a6944-9437-4549-b574-342261291192
URL: s3://cgl-driver-projects-encrypted/wcdt/oicr_issue42_input/RNA150410JA_10_1-1-15_M12_EV_05uM_JQ1.tar
Paired: False
File Type: tar
Cores: 32
CIMode: None
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/toil/fileStore.py", line 1215, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/toil/fileStore.py", line 1215, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting


Exception RuntimeError: RuntimeError('cannot join current thread',) in <bound method FileStore.__del__ of <toil.fileStore.FileStore object at 0x7f911d1409d0>> ignored
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: The jobWrapper seems to have left a log file, indicating failure: u/x/jobBfQvYU
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: Reporting file: u/x/jobBfQvYU
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  ---TOIL WORKER OUTPUT LOG---
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:The localize() method should only be invoked on a worker.
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't find resource for leader path '/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts'
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages', name='toil_scripts.lib.urls')
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:The localize() method should only be invoked on a worker.
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't find resource for leader path '/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts'
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages', name='toil_scripts.lib.urls')
ip-172-31-18-211: 2016-07-16 19:59:21,599 WARNING: toil.leader: u/x/jobBfQvYU:  Traceback (most recent call last):
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/worker.py", line 330, in main
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      job._runner(jobWrapper=jobWrapper, jobStore=jobStore, fileStore=fileStore)
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/job.py", line 1043, in _runner
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      returnValues = self._run(jobWrapper, fileStore)
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/job.py", line 991, in _run
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      return self.run(fileStore)
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/job.py", line 1136, in run
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts/lib/urls.py", line 42, in download_url_job
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      s3_key_path=s3_key_path, cghub_key_path=cghub_key_path)
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts/lib/urls.py", line 25, in download_url
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      _download_encrypted_file(url, file_path, s3_key_path)
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts/lib/urls.py", line 115, in _download_encrypted_file
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      subprocess.check_call(['curl', '-fs', '--retry', '5', '-H', h1, '-H', h2, '-H', h3, url, '-o', file_path])
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:      raise CalledProcessError(retcode, cmd)
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:  CalledProcessError: Command '['curl', '-fs', '--retry', '5', '-H', 'x-amz-server-side-encryption-customer-algorithm:AES256', '-H', 'x-amz-server-side-encryption-customer-key:C/66a4rAtcgJIRhYGeVhdJSFXOAF6KHJhLoCwhiOK+8=', '-H', 'x-amz-server-side-encryption-customer-key-md5:o8RRVB9IStIj5OsTwhbnvg==', 's3://cgl-driver-projects-encrypted/wcdt/oicr_issue42_input/RNA150410JA_10_1-1-15_M12_EV_05uM_JQ1.tar', '-o', '/mnt/ephemeral/data/toil-55011317-aa9f-4c09-8283-44cd4605c1de/tmpjftIJZ/0b025702-f41a-4f8f-b295-69173e6ca73e/tSavOJg/RNA150410JA_10_1-1-15_M12_EV_05uM_JQ1.tar']' returned non-zero exit status 1
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:  Exiting the worker because of a failed jobWrapper on host ip-172-31-18-211
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:  ERROR:toil.worker:Exiting the worker because of a failed jobWrapper on host ip-172-31-18-211
ip-172-31-18-211: 2016-07-16 19:59:21,600 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.jobWrapper:Due to failure we are reducing the remaining retry count of job u/x/jobBfQvYU to 1
ip-172-31-18-211: 2016-07-16 19:59:21,601 INFO: toil.batchSystems.singleMachine: Executing command: '_toil_worker /mnt/ephemeral/jobstore u/x/jobBfQvYU'.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/toil/fileStore.py", line 1215, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/toil/fileStore.py", line 1215, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting


Exception RuntimeError: RuntimeError('cannot join current thread',) in <bound method FileStore.__del__ of <toil.fileStore.FileStore object at 0x7f26d0c6e9d0>> ignored
ip-172-31-18-211: 2016-07-16 19:59:23,815 WARNING: toil.leader: The jobWrapper seems to have left a log file, indicating failure: u/x/jobBfQvYU
ip-172-31-18-211: 2016-07-16 19:59:23,815 WARNING: toil.leader: Reporting file: u/x/jobBfQvYU
ip-172-31-18-211: 2016-07-16 19:59:23,815 WARNING: toil.leader: u/x/jobBfQvYU:  ---TOIL WORKER OUTPUT LOG---
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:The localize() method should only be invoked on a worker.
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't find resource for leader path '/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts'
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages', name='toil_scripts.lib.urls')
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:The localize() method should only be invoked on a worker.
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't find resource for leader path '/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts'
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages', name='toil_scripts.lib.urls')
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:  Traceback (most recent call last):
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/worker.py", line 330, in main
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:      job._runner(jobWrapper=jobWrapper, jobStore=jobStore, fileStore=fileStore)
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/job.py", line 1043, in _runner
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:      returnValues = self._run(jobWrapper, fileStore)
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/job.py", line 991, in _run
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:      return self.run(fileStore)
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/local/lib/python2.7/dist-packages/toil/job.py", line 1136, in run
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:      rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:    File "/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts/lib/urls.py", line 42, in download_url_job
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:      s3_key_path=s3_key_path, cghub_key_path=cghub_key_path)
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:    File "/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts/lib/urls.py", line 25, in download_url
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:      _download_encrypted_file(url, file_path, s3_key_path)
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:    File "/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts/lib/urls.py", line 115, in _download_encrypted_file
ip-172-31-18-211: 2016-07-16 19:59:23,816 WARNING: toil.leader: u/x/jobBfQvYU:      subprocess.check_call(['curl', '-fs', '--retry', '5', '-H', h1, '-H', h2, '-H', h3, url, '-o', file_path])
ip-172-31-18-211: 2016-07-16 19:59:23,817 WARNING: toil.leader: u/x/jobBfQvYU:    File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
ip-172-31-18-211: 2016-07-16 19:59:23,817 WARNING: toil.leader: u/x/jobBfQvYU:      raise CalledProcessError(retcode, cmd)
ip-172-31-18-211: 2016-07-16 19:59:23,817 WARNING: toil.leader: u/x/jobBfQvYU:  CalledProcessError: Command '['curl', '-fs', '--retry', '5', '-H', 'x-amz-server-side-encryption-customer-algorithm:AES256', '-H', 'x-amz-server-side-encryption-customer-key:C/66a4rAtcgJIRhYGeVhdJSFXOAF6KHJhLoCwhiOK+8=', '-H', 'x-amz-server-side-encryption-customer-key-md5:o8RRVB9IStIj5OsTwhbnvg==', 's3://cgl-driver-projects-encrypted/wcdt/oicr_issue42_input/RNA150410JA_10_1-1-15_M12_EV_05uM_JQ1.tar', '-o', '/mnt/ephemeral/data/toil-55011317-aa9f-4c09-8283-44cd4605c1de/tmpsDoUyY/48347a2d-11c8-4cb2-9e09-96e89d2f3584/tLL0kqR/RNA150410JA_10_1-1-15_M12_EV_05uM_JQ1.tar']' returned non-zero exit status 1
ip-172-31-18-211: 2016-07-16 19:59:23,817 WARNING: toil.leader: u/x/jobBfQvYU:  Exiting the worker because of a failed jobWrapper on host ip-172-31-18-211
ip-172-31-18-211: 2016-07-16 19:59:23,817 WARNING: toil.leader: u/x/jobBfQvYU:  ERROR:toil.worker:Exiting the worker because of a failed jobWrapper on host ip-172-31-18-211
ip-172-31-18-211: 2016-07-16 19:59:23,817 WARNING: toil.leader: u/x/jobBfQvYU:  WARNING:toil.jobWrapper:Due to failure we are reducing the remaining retry count of job u/x/jobBfQvYU to 0
ip-172-31-18-211: 2016-07-16 19:59:23,817 WARNING: toil.leader: Job: u/x/jobBfQvYU is completely failed
ip-172-31-18-211: 2016-07-16 19:59:25,817 INFO: toil.leader: No jobs left to run so exiting.
ip-172-31-18-211: 2016-07-16 19:59:25,817 INFO: toil.leader: Finished the main loop
ip-172-31-18-211: 2016-07-16 19:59:25,817 INFO: toil.leader: Waiting for stats and logging collator process to finish ...
ip-172-31-18-211: 2016-07-16 19:59:26,156 INFO: toil.leader: ... finished collating stats and logs. Took 0.338464021683 seconds
ip-172-31-18-211: 2016-07-16 19:59:26,156 INFO: toil.leader: Waiting for service manager thread to finish ...
ip-172-31-18-211: 2016-07-16 19:59:27,113 INFO: toil.leader: ... finished shutting down the service manager. Took 0.957464933395 seconds
ip-172-31-18-211: 2016-07-16 19:59:27,114 INFO: toil.leader: Finished toil run with 2 failed jobs
ip-172-31-18-211: 2016-07-16 19:59:27,114 INFO: toil.leader: Failed jobs at end of the run: set(['3/N/joba0PJ7U', 'u/x/jobBfQvYU'])
Traceback (most recent call last):
  File "/home/ubuntu/toil-scripts/bin/toil-rnaseq", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/toil-scripts/local/lib/python2.7/site-packages/toil_scripts/rnaseq_cgl/rnaseq_cgl_pipeline.py", line 505, in main
    Job.Runner.startToil(Job.wrapJobFn(map_job, download_sample, samples, config), args)
  File "/usr/local/lib/python2.7/dist-packages/toil/job.py", line 500, in startToil
    return toil.start(job)
  File "/usr/local/lib/python2.7/dist-packages/toil/common.py", line 561, in start
    return self._runMainLoop(job)
  File "/usr/local/lib/python2.7/dist-packages/toil/common.py", line 775, in _runMainLoop
    jobCache=self._jobCache)
  File "/usr/local/lib/python2.7/dist-packages/toil/leader.py", line 694, in mainLoop
    raise FailedJobsException( config.jobStore, len(toilState.totalFailedJobs) )
toil.leader.FailedJobsException: The job store '/mnt/ephemeral/jobstore' contains 2 failed jobs

Manifest:

#   Edit this manifest to include information pertaining to each sample to be run.
#   There are 4 tab-separated columns: filetype, paired/unpaired, UUID, URL(s) to sample
#
#   filetype    Filetype of the sample. Options: "tar" or "fq", for tarball/tarfile or fastq/fastq.gz
#   paired      Indicates whether the data is paired or single-ended. Options:  "paired" or "single"
#   UUID        This should be a unique identifier for the sample to be processed
#   URL         A URL ['http://', 'file://', 's3://', 'ftp://', 'gnos://'] pointing to the sample
#
#   If sample is being submitted as a fastq pair, provide two URLs separated by a comma.
#
#   Examples of several combinations are provided below. Lines beginning with # are ignored.
#
#   tar paired  UUID_1  file:///path/to/sample.tar
#   fq  paired  UUID_2  file:///path/to/R1.fq.gz,file:///path/to/R2.fq.gz
#   tar single  UUID_3  http://sample-depot.com/single-end-sample.tar
#   tar paired  UUID_4  s3://my-bucket-name/directory/paired-sample.tar.gz
#   fq  single  UUID_5  s3://my-bucket-name/directory/single-end-file.fq
#
#   Place your samples below, one per line.
tar single  6f4a6944-9437-4549-b574-342261291192    s3://cgl-driver-projects-encrypted/wcdt/oicr_issue42_input/RNA150410JA_10_1-1-15_M12_EV_05uM_JQ1.tar
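
For readers who want to check a manifest before launching a run, here is a minimal sketch of a parser for the four-column format described in the manifest header. It is not the pipeline's own parser: the function name `parse_manifest` and the returned tuple layout are assumptions made for illustration. It expects real tab characters between columns, as the header states.

```python
def parse_manifest(lines):
    """Parse manifest lines into (filetype, is_paired, uuid, urls) tuples.

    Hypothetical helper, not part of toil-scripts. Blank lines and lines
    beginning with '#' are skipped, as the manifest header describes.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        fields = line.split('\t')
        if len(fields) != 4:
            raise ValueError('Expected 4 tab-separated columns: {!r}'.format(line))
        filetype, paired, uuid, urls = fields
        if filetype not in ('tar', 'fq'):
            raise ValueError('Unknown filetype: {!r}'.format(filetype))
        if paired not in ('paired', 'single'):
            raise ValueError('Unknown paired/single value: {!r}'.format(paired))
        # A fastq pair is given as two comma-separated URLs.
        samples.append((filetype, paired == 'paired', uuid, urls.split(',')))
    return samples
```

Checking the scheme of each parsed URL at this stage would surface an unsupported s3:// entry before any job is scheduled.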

nsjake added the bug label Jul 16, 2016
nsjake changed the title from "Curl being called on s3 URLs from manifest in rna seq cgl pipeline" to "Curl called on s3 URLs from manifest in rna seq cgl pipeline" Jul 16, 2016
nsjake changed the title from "Curl called on s3 URLs from manifest in rna seq cgl pipeline" to "Curl called on s3 URLs from rna seq cgl manifest" Jul 16, 2016

jvivian (Collaborator) commented Jul 18, 2016

This is a known issue caused by our (somewhat clunky) encryption system. Encrypted S3 objects are addressed by the whole URL, which is region-specific, e.g. https://s3-us-west-2.amazonaws.com/bucket/file.txt. If the user supplies an s3:// URL, I'd have to make an assumption about which region they're trying to run this from. @hannes-ucsc and I discussed changing everything to use just the s3:// URL, but figured it wasn't worth the effort when we would only end up replacing that with the ICGC storage system.

I'm not sure there's a solution that isn't specific to just our group and our encryption system, which means the wiki for production should include this information. I'll bring this up at scrum today.
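
As an illustration of the workaround from the original report (rewriting the s3:// URL as its https counterpart), here is a minimal sketch. The helper name `s3_to_https` is hypothetical and not part of toil-scripts. The hostname pattern is copied from the example URL above and may not hold for every region, and the caller has to supply the region, which is exactly the assumption discussed in this comment. Keys containing special characters would also need URL quoting, which this sketch omits.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def s3_to_https(s3_url, region):
    """Rewrite s3://bucket/key as a region-specific, path-style HTTPS URL.

    Hypothetical helper: the host format follows the example in this thread
    (https://s3-us-west-2.amazonaws.com/bucket/file.txt) and is an assumption
    for any other region.
    """
    parsed = urlparse(s3_url)
    if parsed.scheme != 's3':
        raise ValueError('Not an s3:// URL: {!r}'.format(s3_url))
    bucket = parsed.netloc
    key = parsed.path.lstrip('/')
    if not bucket or not key:
        raise ValueError('URL must name a bucket and a key: {!r}'.format(s3_url))
    return 'https://s3-{}.amazonaws.com/{}/{}'.format(region, bucket, key)
```

For example, `s3_to_https('s3://bucket/file.txt', 'us-west-2')` returns `'https://s3-us-west-2.amazonaws.com/bucket/file.txt'`.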

hannes-ucsc (Contributor) commented

If this is systemic, the title should be adjusted to reflect that.

hannes-ucsc (Contributor) commented

Now that s3am supports downloads, all URLs referring to objects in S3 should use the s3:// scheme and be downloaded by calling s3am. That should be fairly easy to do as all the machinery is already there. You can rip out the encryption header stuff. As a beneficial side effect, the downloads are going to be much faster and more reliable. The general rules for running s3am programmatically, as documented in its README, apply.

If the user insists on passing a region-specific, public or signed HTTP(S) URL pointing at an object in S3, that should also be supported, minus encryption. It would be the responsibility of the user to fabricate HTTP(S) URLs that can be curled without adding query parameters to the URL or setting headers when requesting them.

The ICGC storage system will NOT be addressed using s3:// URLs, but some other scheme like icgc://.
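
The routing proposed here could be sketched roughly as follows. This is not the change that was eventually merged. The function name `build_download_command` is hypothetical, the curl flags are taken from the traceback in the report with the encryption headers removed, and the s3am argument order shown is an assumption that should be checked against the s3am README before use.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def build_download_command(url, file_path):
    """Return the argument list for downloading `url` to `file_path`.

    Hypothetical helper illustrating scheme-based dispatch; it only builds
    the command and does not run it.
    """
    scheme = urlparse(url).scheme
    if scheme == 's3':
        # ASSUMPTION: 's3am download <src> <dst>' argument order; verify
        # against the s3am README.
        return ['s3am', 'download', url, file_path]
    if scheme in ('http', 'https'):
        # Same curl flags as in the traceback, without the SSE-C headers.
        return ['curl', '-fs', '--retry', '5', url, '-o', file_path]
    # Other schemes (e.g. icgc://) are not covered by this sketch.
    raise ValueError('Unsupported URL scheme: {!r}'.format(scheme))
```

Failing early on an unknown scheme gives a clearer error than handing the URL to curl and getting a non-zero exit status, which is what happened in the report above.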

hannes-ucsc (Contributor) commented

You can rip out the encryption header stuff

… from the curl invocation for http:// URLs
