Able to use either local or S3 remote vcf file #378

Merged
merged 3 commits · Nov 9, 2021
6 changes: 6 additions & 0 deletions .github/workflows/python-app.yml
@@ -46,3 +46,9 @@ jobs:
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
run: docker-compose run -e APP_ENV=prod -e AWS_SECRET_ACCESS_KEY -e AWS_ACCESS_KEY_ID app pytest --color=yes

- name: Test with PyTest for S3
env:
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
run: docker-compose run -e APP_ENV=prod -e VCF_FILE="s3://phenopolis-vcf/August2019/merged2.vcf.gz" -e AWS_SECRET_ACCESS_KEY -e AWS_ACCESS_KEY_ID app pytest --color=yes -k test_variants
1 change: 0 additions & 1 deletion .gitignore
@@ -78,4 +78,3 @@ private.env

dc_dev.yml
/.mypy_cache/
*tbi
14 changes: 13 additions & 1 deletion README.md
@@ -13,7 +13,19 @@ A description of the code setup is available [here](code_setup.md).

## Setup using docker compose

Set the following environment variables in `private.env`:
Set the following environment variables:

* `public.env`:

```bash
VCF_FILE=...
```

Where `VCF_FILE` can be either a local file (e.g. `path/file.vcf.gz`) or a remote `S3` file (e.g. `s3://any_remote/file.vcf.gz`).

It's critical that the `VCF_FILE` is accompanied by its `.tbi` index file as well.

* Create `private.env` and add:

```bash
AWS_SECRET_ACCESS_KEY=....
2 changes: 1 addition & 1 deletion public.env
@@ -6,7 +6,7 @@ PH_DB_USER=phenopolis_api
PH_DB_PASSWORD=phenopolis_api
PH_DB_PORT=5432

S3_VCF_FILE_URL="s3://phenopolis-vcf/August2019/merged2.vcf.gz"
VCF_FILE=schema/small_demo.vcf.gz

MAIL_USERNAME=no-reply@phenopolis.com

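Taken together, the README and `public.env` changes mean cyvcf2 opens whatever `VCF_FILE` points to, whether a local path or an `s3://` URL, with the tabix `.tbi` index sitting next to the file for region queries. A minimal usage sketch (the default path is the demo file configured in `public.env` above; the region mirrors the variant exercised in `tests/test_variants.py` later in this diff):

```python
import os

from cyvcf2 import VCF  # htslib-backed reader; accepts local paths and s3:// URLs alike

# Falls back to the repo's demo file from public.env; a matching .tbi index must
# live next to the .vcf.gz, since region queries rely on the tabix index.
vcf = VCF(os.getenv("VCF_FILE", "schema/small_demo.vcf.gz"))

# Region lookup, e.g. the position behind the 14-76156575-A-G test variant
for record in vcf("14:76156575-76156575"):
    print(record.CHROM, record.POS, record.REF, record.ALT)
```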
2 changes: 1 addition & 1 deletion requirements.txt
@@ -13,7 +13,7 @@ ujson>=3.0,<3.1
python-dotenv>=0.14,<0.15
itsdangerous>=1.1,<1.2
bidict>=0.21,<0.22
cyvcf2>=0.20.9
cyvcf2>=0.30.12
boto3>=1.16.43

# for checks
Binary file added schema/small_demo.vcf.gz
Binary file not shown.
Binary file added schema/small_demo.vcf.gz.tbi
Binary file not shown.
3 changes: 2 additions & 1 deletion tests/conftest.py
@@ -1,4 +1,5 @@
import pytest
import os
from dotenv import load_dotenv
from views import application, APP_ENV, VERSION
from views.auth import ADMIN_USER, USER, DEMO_USER
@@ -8,7 +9,7 @@


def pytest_report_header(config):
return f">>> Version: {VERSION}, APP_ENV: {APP_ENV}"
return f">>>\tVersion: {VERSION}\n\tAPP_ENV: {APP_ENV}\n\tVCF_FILE: {os.getenv('VCF_FILE')}"


@pytest.fixture
17 changes: 15 additions & 2 deletions tests/test_variants.py
@@ -5,7 +5,7 @@


def test_get_genotypes_exception():
# if this happens, something is out of sync between S3 VCF file and variant table in DB
# if this happens, something is out of sync between VCF file and variant table in DB
redirected_error = sys.stderr = StringIO()
exec('_get_genotypes("443", "10000")')
err = redirected_error.getvalue()
@@ -14,7 +14,7 @@ def test_get_genotypes_exception():

def test_variant(_demo):
"""
This tests S3 and VCF access via cvycf2
This tests VCF access via cyvcf2
tests both for subset and entry not in DB, the real one is 14-76127655-C-T
res -> str
"""
@@ -35,6 +35,19 @@ def test_variant_web(_admin_client):
assert "[{'display': 'my:PH00008258'," in str(resp.json), "Check for 'my:..."


def test_variant_genotype_vcf(_admin_client):
resp = _admin_client.get("/variant/14-76156575-A-G")
assert resp.status_code == 200
assert len(resp.json[0]["genotypes"]["data"]) == 4, "Critical, VCF access not working"


def test_cyvcf2_S3(_admin_client):
from cyvcf2 import VCF

vcf_S3 = VCF("s3://3kricegenome/test/test.vcf.gz") # public VCF file
assert len(vcf_S3.raw_header) == 559362, "Critical, S3 access not working"


def test_missing_variant(_demo):
response = variant("chr45-1234567890112233-C-G")
assert response.status_code == 404
7 changes: 1 addition & 6 deletions views/__init__.py
@@ -16,7 +16,6 @@
from subprocess import Popen, STDOUT, PIPE
import psycopg2


# Options are: prod, dev, debug (default)
APP_ENV = os.getenv("APP_ENV", "debug")

@@ -34,11 +33,7 @@
if APP_ENV in ["prod"]:
ENV_LOG_FLAG = False

# in GH Workflow tests, private.env is not available so skip variant tests
try:
variant_file = VCF(os.getenv("S3_VCF_FILE_URL", "s3://phenopolis-vcf/August2019/merged2.vcf.gz"))
except OSError:
variant_file = None
variant_file = VCF(os.getenv("VCF_FILE"))


def _configure_logs():
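The module-level `variant_file` handle created above is what genotype lookups such as `_get_genotypes` (imported by `tests/test_variants.py`) query; that function's body is not part of this diff, so the sketch below is only a hypothetical illustration of a region lookup against a cyvcf2 handle, not the project's actual implementation:

```python
# Hypothetical helper for illustration only; not the project's _get_genotypes.
def genotypes_in_region(vcf, chrom: str, start: int, end: int):
    """Collect per-sample genotypes for every record in a tabix-indexed region."""
    results = []
    for record in vcf(f"{chrom}:{start}-{end}"):  # requires the .tbi index next to the VCF
        # cyvcf2 exposes genotypes as [allele_a, allele_b, phased] triples per sample
        results.append({"pos": record.POS, "genotypes": record.genotypes})
    return results
```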