data_curation

The scripts here are used to import datasets into a PostgreSQL database for the SPT application or for analysis, and to organize or curate those datasets into a form acceptable for import. The work falls into two stages:

  1. Curation or pre-processing
  2. Import / upload into the database

Curation or pre-processing

Datasets are stored in subdirectories of datasets/. To prepare a new dataset, follow the full example here. The example includes files pre-generated in a format ready for import into the database, but you can also re-generate them yourself.
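
For reference, the expected layout looks roughly like this (a minimal sketch; the moldoveanu dataset name and the generated_artifacts/ subdirectory are taken from the example configuration further below, and the files inside vary by dataset):

datasets/
  moldoveanu/
    generated_artifacts/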

Extraction scripts tend to be dataset-specific, but some tasks are common across datasets, such as quantification over segments in images and formulation of standardized representations of channel or clinical metadata.
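
For example, re-generating a dataset's import-ready files might look roughly like the following; the script name extract.py is purely hypothetical, and the actual extraction scripts differ per dataset:

cd datasets/moldoveanu
python extract.py   # hypothetical dataset-specific extraction script; writes import-ready files into generated_artifacts/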

Import / upload into the database

The recommended import method is to use spt db interactive-uploader.
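
For example, invoked from the repository root (an assumption; run it wherever the datasets/ directory and your database configuration file are accessible):

spt db interactive-uploader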

It will take care of creating a run directory for Nextflow and configuring it with a workflow.config file like:

[general]
db_config_file = /Users/username/.spt_db.config.local

[database visitor]
study_name = Melanoma CyTOF ICI

[tabular import]
input_path = datasets/moldoveanu/generated_artifacts

or, in the case of S3 source files, with:

...
[tabular import]
input_path = s3://bucketname/moldoveanu

In the S3 case, you will have to make sure that AWS credentials are available. For session-specific credentials, Nextflow currently requires a "profile" in ~/.aws/credentials, usually the profile named default.
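
For session-specific credentials, that profile would look roughly like the following (placeholder values; the aws_session_token entry is only needed for temporary session credentials):

[default]
aws_access_key_id = <access-key-id>
aws_secret_access_key = <secret-access-key>
aws_session_token = <session-token>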

Note

To put the generated_artifacts/ files into an S3 bucket, you can use:

aws s3 cp generated_artifacts s3://bucket-name/dataset-name --recursive
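
To confirm the upload afterwards, you can list the objects (same placeholder bucket and dataset names as above):

aws s3 ls s3://bucket-name/dataset-name --recursive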