This directory provides Dockerfiles and corresponding 'recipes' for scripting/processing tasks that are common across workflows.
Contents:

- Available utilities: `gtf_to_bed12`, `csv2tsv`, `metrics_json2tsv`, `filter_jsons`
- Adding a new utility

# Available utilities

## gtf_to_bed12

Description: Convert a GTF annotation file to BED12 (also known as a 'gene model' file) format.
Subdirectory name: gtf_to_bed12
Dockerhub link: https://hub.docker.com/r/apaeval/gtftobed
Dockerhub url (to use with Nextflow/Snakemake): docker.io/apaeval/gtftobed:1.0
Usage (shell command to run the tool):

```bash
/gtfTobed.py --gtf <path to GTF file> --out_bed <name of the output BED file>
```
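For orientation, the coordinate arithmetic at the heart of such a conversion looks roughly like the sketch below (GTF is 1-based inclusive, BED is 0-based half-open). The exon intervals and transcript name are made-up example data, and the real `gtfTobed.py` may handle strand, CDS and attribute parsing differently:

```python
# Minimal sketch: build the BED12 block fields from the sorted exon intervals
# of a single transcript. All values below are hypothetical example data.
exons = [(1001, 1200), (1501, 1700), (2001, 2300)]  # (start, end) per GTF exon line

chrom_start = exons[0][0] - 1                        # BED chromStart (0-based)
chrom_end = exons[-1][1]                             # BED chromEnd (half-open)
block_sizes = [end - start + 1 for start, end in exons]
block_starts = [start - 1 - chrom_start for start, _ in exons]

bed12 = [
    "chr1", chrom_start, chrom_end, "transcript_1", 0, "+",
    chrom_start, chrom_end, "0",                     # thickStart, thickEnd, itemRgb
    len(exons),
    ",".join(map(str, block_sizes)),
    ",".join(map(str, block_starts)),
]
print("\t".join(map(str, bed12)))
```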
## csv2tsv

Description: Convert a `.csv` file to a `.tsv` file using `pandas`. Useful for conversion BEFORE running a method workflow (see "Compatibility" below).
Subdirectory name: csv2tsv
Compatibility: Currently no Docker image available, but the script can be run inside the `apaeval` conda environment.
Usage:

```bash
python csv2tsv.py --csv samples.csv --tsv samples.tsv
```
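For reference, the core of such a conversion in `pandas` is essentially a one-liner; this is a sketch of the idea, not necessarily the exact logic of `csv2tsv.py`:

```python
import pandas as pd

# Read the comma-separated input and write it back tab-separated,
# using the file names from the usage example above.
pd.read_csv("samples.csv").to_csv("samples.tsv", sep="\t", index=False)
```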
## metrics_json2tsv

Description: Convert a benchmarking workflow output `.json` file to a `.tsv` file using `pandas`. JSON objects will be flattened, and each object is converted to a table row. Furthermore, the `metric_id` is split into 3 columns: "metric", "window_size" and "site_set". If more than one `.json` file is provided as input, the resulting tables will be concatenated.

NOTE: In the current implementation, only OEB "assessment" type objects will be kept, and the APAeval absolute quantification specific parameters `window_size`, `site_set` and `metrics.metric_id` are expected to be present. This util will most likely have to be adapted for other events.
Subdirectory name: metrics_json2tsv
Compatibility: Currently no Docker image available, but the script can be run inside the `apaeval` conda environment.
Usage:

```bash
python metrics_json2tsv.py --file-list assessment1.json assessment2.json --output metrics.tsv
```
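Conceptually, the flattening amounts to something like the following sketch. The `type` field check and the `':'` separator inside `metric_id` are assumptions made for illustration; consult the script for the actual OEB schema handling:

```python
import json
import pandas as pd

frames = []
for path in ["assessment1.json", "assessment2.json"]:
    with open(path) as fh:
        objects = json.load(fh)          # expected: a list of OEB objects
    df = pd.json_normalize(objects)      # flatten nested JSON into dotted columns
    df = df[df["type"] == "assessment"]  # keep only "assessment" type objects
    frames.append(df)

table = pd.concat(frames, ignore_index=True)  # concatenate tables from all inputs
# Split metric_id into its three components (separator assumed to be ':').
table[["metric", "window_size", "site_set"]] = (
    table["metrics.metric_id"].str.split(":", expand=True)
)
table.to_csv("metrics.tsv", sep="\t", index=False)
```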
## filter_jsons

Description: Filter a benchmarking workflow output `.json` file, i.e. remove objects that belong to specified metrics or challenges. Parts of metric or challenge names can be specified, and all objects (participant, assessment, aggregation, manifest-like) containing those metrics or challenges are removed.
Subdirectory name: filter_jsons
Compatibility: Currently no Docker image available, but the script can be run inside the `apaeval` conda environment.
Usage:

```
usage: filter_jsons.py [-h] -f FILE_LIST [FILE_LIST ...] -o OUT_PREFIX [-m [B_METRICS]] [-c [B_CHALLENGES]]

options:
  -h, --help            show this help message and exit
  -f FILE_LIST [FILE_LIST ...], --file-list FILE_LIST [FILE_LIST ...]
                        list of OEB .json (assessments, validation or consolidation) filepaths.
  -o OUT_PREFIX, --out_prefix OUT_PREFIX
                        prefix for output files
  -m [B_METRICS], --b_metrics [B_METRICS]
                        list of metrics to be removed
  -c [B_CHALLENGES], --b_challenges [B_CHALLENGES]
                        list of challenges to be removed
```
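The filtering idea can be sketched as below: serialize each object and drop it if any blacklisted metric or challenge substring occurs anywhere in it. This is an illustration only; the blacklist entries and file names are hypothetical, and the real script uses `--out_prefix` to name its output files:

```python
import json

def keep(obj, bad_metrics=(), bad_challenges=()):
    """Return False if the object mentions any blacklisted metric or challenge."""
    text = json.dumps(obj)  # crude: substring match over the serialized object
    return not (any(m in text for m in bad_metrics)
                or any(c in text for c in bad_challenges))

with open("assessment1.json") as fh:
    objects = json.load(fh)  # expected: a list of OEB objects

# "Jaccard" and "challenge_X" are hypothetical blacklist entries.
filtered = [o for o in objects
            if keep(o, bad_metrics=["Jaccard"], bad_challenges=["challenge_X"])]

with open("filtered_assessment1.json", "w") as fh:
    json.dump(filtered, fh, indent=2)
```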
# Adding a new utility

Have code for a pre-processing / file wrangling task that you think may be general to other tools? To add a tool to the directory, we need you to create a pull request with the following tasks completed:

- Create a suitably named subdirectory under `utils` to house the subsequent files (e.g. `gtf_to_bed12`).
- Add a Dockerfile to the subdirectory to provide the necessary dependencies and scripts. This should then be added to the `apaeval` team Dockerhub (if you don't have access, you can request it as part of the pull request).
- Add a description of the task and how to run it to this README. This should follow the format below:
\## Header description of tool (placed under Available utilities)

(remove the escape slash when you copy and paste to interpret the hashes as titles)

Description: <short 1/2 sentence description of the task the utility completes>

Dockerhub link: https://hub.docker.com/r/apaeval/<tool_name>

Subdirectory name: <subdirectory name under utils/ storing Dockerfile and necessary scripts>

Docker.io url (to use with Nextflow/Snakemake): docker.io/apaeval/<tool_name>:<tag>

Usage (shell command to run the tool):

```bash
python utility.py -i <input> -o <output>
```