sr2silo

Project Status: WIP. This project is currently under active development.

Wrangling Short-Read Genomic Alignments for SILO Database

This project wrangles short-read genomic alignments, for example from wastewater sampling, into a format that is easy to import into the SILO sequencing database.

Usage of the V-Pipe Docker

The V-Pipe Docker is designed to process a single .bam file and upload the results to SILO.

Project Organization

  • silo-input-transformer: A Rust-based utility that handles the FASTA to NDJSON transformation. It is included here as a git submodule.
  • .github/workflows: Contains GitHub Actions used for building, testing, and publishing.
  • .vscode/settings.json: Contains VSCode settings specific to the project, such as the Python interpreter to use and the maximum line length for auto-formatting.
  • src: Place new source code here.
  • scripts: Place temporary and intermediate work here.
  • tests: Contains Python-based test cases to validate source code.
  • pyproject.toml: Contains metadata about the project and configurations for additional tools used to format, lint, type-check, and analyze Python code.
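
To illustrate what a FASTA to NDJSON transformation involves, here is a minimal Python sketch. It is not the silo-input-transformer implementation (that utility is written in Rust), and the field names `id` and `sequence` are assumptions chosen for illustration; the schema SILO actually expects is not described in this README.

```python
import io
import json
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect them until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_ndjson(handle: TextIO) -> str:
    """Render each FASTA record as one JSON object per line (NDJSON)."""
    # "id" and "sequence" are hypothetical keys, not the real SILO schema.
    return "\n".join(
        json.dumps({"id": header, "sequence": seq})
        for header, seq in read_fasta(handle)
    )


example = io.StringIO(">read_1\nACGT\nTTGA\n>read_2\nGGCC\n")
ndjson = fasta_to_ndjson(example)
print(ndjson)
```

Each output line is an independent JSON document, which is what makes NDJSON convenient for streaming large numbers of reads.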

Setting up the repository

To build the package and manage its dependencies, we use Poetry. Install it and read its documentation to become familiar with its basic functionality.

Setting up the Development Environment

  1. Create and activate the conda environment from the environment.yml file:
conda env create -f environment.yml
conda activate sr2silo
  2. Set up the environment with development tools:
poetry install --with dev
poetry run pre-commit install

Then, you will be able to run tests:

$ poetry run pytest

... or check the types:

$ poetry run pyright

Alternatively, you may prefer to work with the right Python environment using:

$ poetry shell
$ pytest

[WIP]: Run V-Pipe to SILO Transformation

This is currently implemented as a script and is under heavy development. We recommend running it with Docker Compose, because it relies on other Rust components.

Configuration

Edit the docker-compose.env file in the docker-compose directory and set the following variables:

SAMPLE_DIR=../../../data/sr2silo/daemon_test/samples/A1_05_2024_10_08/20241024_2411515907/alignments/
SAMPLE_ID=A1_05_2024_10_08
BATCH_ID=20241024_2411515907
TIMELINE_FILE=../../../data/sr2silo/daemon_test/timeline.tsv
NEXTCLADE_REFERENCE=sars-cov2
RESULTS_DIR=./results
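
Before starting the containers, it can help to confirm that every variable is set. The sketch below is a hypothetical helper, not part of sr2silo. It assumes plain `KEY=VALUE` lines as in the example above, and it assumes that all six variables are required.

```python
from typing import Dict, List

# Variable names are taken from the example above; treating all of them
# as required is an assumption.
REQUIRED_KEYS = [
    "SAMPLE_DIR",
    "SAMPLE_ID",
    "BATCH_ID",
    "TIMELINE_FILE",
    "NEXTCLADE_REFERENCE",
    "RESULTS_DIR",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = (
    "SAMPLE_ID=A1_05_2024_10_08\n"
    "BATCH_ID=20241024_2411515907\n"
    "RESULTS_DIR=./results\n"
)
env = parse_env(sample)
print(missing_keys(env))
```

To check a real file, read its contents into a string and pass that string to `parse_env`.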

Docker Secrets

Uploading the processed outputs requires S3 storage.

For sensitive information like AWS credentials, use Docker secrets. Create the following files in the secrets directory:

  • secrets/aws_access_key_id.txt:

YourAWSAccessKeyId

  • secrets/aws_secret_access_key.txt:

YourAWSSecretAccessKey

  • secrets/aws_default_region.txt:

YourAWSRegion
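
The three secret files listed above can also be created programmatically. The following is a minimal sketch, not part of sr2silo: the helper name `write_secrets` is hypothetical, the values are the placeholders from this README, and restricting the files to owner-only access is a suggested precaution rather than a documented requirement. The demo writes into a temporary directory so that it does not touch a real secrets directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List

# File names come from the list above; the values are placeholders.
SECRET_FILES = {
    "aws_access_key_id.txt": "YourAWSAccessKeyId",
    "aws_secret_access_key.txt": "YourAWSSecretAccessKey",
    "aws_default_region.txt": "YourAWSRegion",
}


def write_secrets(secrets_dir: Path, secrets: Dict[str, str]) -> List[Path]:
    """Write one file per secret and restrict it to owner read/write."""
    secrets_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, value in secrets.items():
        path = secrets_dir / name
        path.write_text(value)
        # 0o600: owner read/write only (has limited effect on Windows).
        os.chmod(path, 0o600)
        written.append(path)
    return written


demo_dir = Path(tempfile.mkdtemp()) / "secrets"
paths = write_secrets(demo_dir, SECRET_FILES)
print(sorted(p.name for p in paths))
```

Keep the real secrets directory out of version control so that credentials are never committed.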

Run Transformation

To process a single sample, run the following command:

docker-compose --env-file .env up --build

Tool Sections

The code quality checks that run on GitHub are defined in:

  • .github/workflows/test.yml for the Python package CI/CD.

We are using:

  • Ruff to lint the code.
  • Black to format the code.
  • Pyright to check the types.
  • Pytest to run the unit tests for the code and workflows.
  • Interrogate to check the documentation.

Contributing

This project welcomes contributions and suggestions. For details, visit the repository's Contributor License Agreement (CLA) and Code of Conduct pages.
