Release 2.2.2rc2 #704

Merged: 28 commits, Jul 2, 2024

Commits (28)
65faa3c
replace status page domain name
raylrui Jun 6, 2024
09db8a4
add stg domain
raylrui Jun 7, 2024
33264f7
Merge pull request #689 from umccr/replace-status-page-domain
raylrui Jun 7, 2024
84304d8
Added Portal LabMetadata LibraryID suffix annotation treatment
victorskl Jun 11, 2024
1d2cccb
Merge pull request #690 from umccr/update-docs-20240611
victorskl Jun 12, 2024
4e9dc36
Added documentation on how to trigger DRAGEN Alignment QC
victorskl Jun 14, 2024
9c26108
Merge pull request #692 from umccr/update-docs-20240614
victorskl Jun 14, 2024
fb70af7
Added Tumor Normal engineParameter override for tesUseInputManifest n…
victorskl Jun 14, 2024
5e21d45
Added WES engineParameter override tesUseInputManifest never for all …
victorskl Jun 17, 2024
26352e5
Merge pull request #693 from umccr/fix-tumor-normal-engine-parameter
victorskl Jun 17, 2024
0b87ed9
Updated LabMetadata model enum for assay, type, source
victorskl Jun 17, 2024
a1c0b7e
Merge pull request #695 from umccr/update-labmeta-assay-enum
victorskl Jun 18, 2024
b39f8eb
Added support for EventBridge S3 event processing
victorskl Jun 21, 2024
fae986e
Merge pull request #696 from umccr/support-s3-event-through-eventbrid…
victorskl Jun 21, 2024
60baa92
Updated doc for WTS sample with multiple lane FASTQs
victorskl Jun 25, 2024
1172c36
Merge pull request #697 from umccr/update-doc-wts-star-vs-dragen
victorskl Jun 25, 2024
8c3e5de
Updated LabMetadata model enum for assay, type, source - migration sc…
victorskl Jun 29, 2024
6411ebb
Merge pull request #698 from umccr/update-labmeta-assay-enum-migration
victorskl Jun 29, 2024
7bbb702
Bumped dependencies
victorskl Jun 29, 2024
612d301
Merge pull request #699 from umccr/bump-deps-20240629
victorskl Jun 29, 2024
5df7a6d
Fixed pandas dataframe applymap future warning
victorskl Jun 29, 2024
d854b6a
Merge pull request #700 from umccr/fix-pandas-numpy
victorskl Jun 29, 2024
c6d4ea1
Updated doc library_suffix.md
victorskl Jul 1, 2024
afe31e8
Merge pull request #701 from umccr/update-docs-20240701
victorskl Jul 1, 2024
5d10bd7
Implemented Subject endpoint to include ctTSOv2 results from ICAv2 BYOB
victorskl Jul 1, 2024
22e2ed5
Merge pull request #702 from umccr/implement-subject-endpoint-cttsov2…
victorskl Jul 2, 2024
0a310ab
Fixed Subject having no ctTSOv2 assay library
victorskl Jul 2, 2024
edf7b47
Merge pull request #703 from umccr/fix-subject-endpoint-cttsov2-results
victorskl Jul 2, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
- id: detect-secrets
name: detect secrets
args: [ '--baseline', '.secrets.baseline' ]
exclude: ^(yarn.lock|.yarn/|.gitguardian.yaml)
exclude: ^(yarn.lock|.yarn/|.yarnrc.yml|.gitguardian.yaml)

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.1.0
12 changes: 6 additions & 6 deletions .yarn/plugins/@yarnpkg/plugin-outdated.cjs

Large diffs are not rendered by default.

626 changes: 313 additions & 313 deletions .yarn/releases/yarn-4.2.2.cjs → .yarn/releases/yarn-4.3.1.cjs

Large diffs are not rendered by default.

7 changes: 4 additions & 3 deletions .yarnrc.yml
@@ -5,7 +5,8 @@ enableGlobalCache: false
nodeLinker: node-modules

plugins:
- path: .yarn/plugins/@yarnpkg/plugin-outdated.cjs
spec: "https://mskelton.dev/yarn-outdated/v3"
- checksum: 5e73a1acbb9741fce1e8335e243c9480ea2107b9b4b65ed7643785ddea9e3019aee254a92a853b1cd71023b16fff5b7d3afd5256fe57cd35a54f8785b8c30281
path: .yarn/plugins/@yarnpkg/plugin-outdated.cjs
spec: "https://go.mskelton.dev/yarn-outdated/v4"

yarnPath: .yarn/releases/yarn-4.2.2.cjs
yarnPath: .yarn/releases/yarn-4.3.1.cjs
6 changes: 3 additions & 3 deletions README.md
@@ -22,14 +22,14 @@ docker --version
Docker version 24.0.7, build afdd53b4e3

python -V
Python 3.11.5
Python 3.11.9

node -v
v18.18.0
v20.15.0

npm i -g yarn
yarn -v
4.2.2
4.3.1
```

then:
132 changes: 132 additions & 0 deletions, new Django migration under data_portal/migrations (exact file name not shown in this view)
@@ -0,0 +1,132 @@
# Generated by Django 4.2.11 on 2024-06-29 01:20

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("data_portal", "0009_alter_workflow_version_alter_workflow_wfl_id_and_more"),
]

operations = [
migrations.AlterField(
model_name="labmetadata",
name="assay",
field=models.CharField(
choices=[
("TsqNano", "Tsq Nano"),
("TsqSTR", "Tsq Str"),
("NebDNA", "Neb Dna"),
("NebRNA", "Neb Rna"),
("10X-3prime-expression", "Ten X 3Prime"),
("10X-5prime-expression", "Ten X 5Prime"),
("10X-ADT", "Ten X Adt"),
("10X-ATAC", "Ten X Atac"),
("10X-CITE-feature", "Ten X Cite Feat"),
("10X-CITE-hashing", "Ten X Cite Hash"),
("10X-CNV", "Ten X Cnv"),
("10X-CSP", "Ten X Csp"),
("10X-GEX", "Ten X Gex"),
("10X-VDJ", "Ten X Vdj"),
("10X-VDJ-BCR", "Ten X Vdj Bcr"),
("10X-VDJ-TCR", "Ten X Vdj Tcr"),
("AgSsCRE", "Ag Ss Cre"),
("bATAC", "B Atac"),
("CRISPR", "Crispr"),
("ctTSO", "Ct Tso"),
("IDTxGen", "Idt X Gen"),
("IlmnDNAprep", "Ilmn Dna Prep"),
("NebDNAu", "Neb Dna U"),
("NebMS", "Neb Ms"),
("PCR-Free-Tagmentation", "Pcr Free"),
("Takara", "Takara"),
("TPlxHV", "Tpl X Hv"),
("TSODNA", "Tso Dna"),
("BM-5L", "Bm 5L"),
("BM-6L", "Bm 6L"),
("MeDIP", "Me Dip"),
("ctTSOv2", "Ct Tso V2"),
],
max_length=255,
),
),
migrations.AlterField(
model_name="labmetadata",
name="quality",
field=models.CharField(
choices=[
("ascites", "Acites"),
("blood", "Blood"),
("bone-marrow", "Bone"),
("buccal", "Buccal"),
("cell-line", "Cell Line"),
("cfDNA", "Cf Dna"),
("cyst-fluid", "Cyst"),
("DNA", "Dna"),
("eyebrow-hair", "Eyebrow"),
("FFPE", "Ffpe"),
("FNA", "Fna"),
("OCT", "Oct"),
("organoid", "Organoid"),
("PDX-tissue", "Pdx"),
("plasma-serum", "Plasma"),
("RNA", "Rna"),
("tissue", "Tissue"),
("water", "Water"),
("skin", "Skin"),
],
max_length=255,
),
),
migrations.AlterField(
model_name="labmetadata",
name="source",
field=models.CharField(
choices=[
("ascites", "Acites"),
("blood", "Blood"),
("bone-marrow", "Bone"),
("buccal", "Buccal"),
("cell-line", "Cell Line"),
("cfDNA", "Cf Dna"),
("cyst-fluid", "Cyst"),
("DNA", "Dna"),
("eyebrow-hair", "Eyebrow"),
("FFPE", "Ffpe"),
("FNA", "Fna"),
("OCT", "Oct"),
("organoid", "Organoid"),
("PDX-tissue", "Pdx"),
("plasma-serum", "Plasma"),
("RNA", "Rna"),
("tissue", "Tissue"),
("water", "Water"),
("skin", "Skin"),
],
max_length=255,
),
),
migrations.AlterField(
model_name="labmetadata",
name="type",
field=models.CharField(
choices=[
("ctDNA", "Ct Dna"),
("ctTSO", "Ct Tso"),
("exome", "Exome"),
("other", "Other"),
("10X", "Ten X"),
("TSO-DNA", "Tso Dna"),
("TSO-RNA", "Tso Rna"),
("WGS", "Wgs"),
("WTS", "Wts"),
("BiModal", "Bi Modal"),
("MeDIP", "Me Dip"),
("Metagenm", "Metagenm"),
("MethylSeq", "Methyl Seq"),
],
max_length=255,
),
),
]
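
For completeness, a hedged sketch of applying a generated migration like this one through Django's management API; in day-to-day use it is simply `python manage.py migrate data_portal`. The settings module is assumed to be configured already.

```python
# Hedged sketch (not part of this PR): applying the generated migration
# programmatically. Assumes DJANGO_SETTINGS_MODULE is already set for the
# portal project; normally one would just run `python manage.py migrate data_portal`.
import django
from django.core.management import call_command

django.setup()
call_command("migrate", "data_portal")
```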
33 changes: 26 additions & 7 deletions data_portal/models/labmetadata.py
@@ -21,6 +21,10 @@ class LabMetadataType(models.TextChoices):
TSO_RNA = "TSO-RNA"
WGS = "WGS"
WTS = "WTS"
BI_MODAL = "BiModal"
ME_DIP = "MeDIP"
METAGENM = "Metagenm"
METHYL_SEQ = "MethylSeq"


class LabMetadataPhenotype(models.TextChoices):
@@ -30,24 +34,38 @@


class LabMetadataAssay(models.TextChoices):
AG_SS_CRE = "AgSsCRE"
CT_TSO = "ctTSO"
TSQ_NANO = "TsqNano"
TSQ_STR = "TsqSTR"
NEB_DNA = "NebDNA"
NEB_DNA_U = "NebDNAu"
NEB_RNA = "NebRNA"
PCR_FREE = "PCR-Free-Tagmentation"
TEN_X_3PRIME = "10X-3prime-expression"
TEN_X_5PRIME = "10X-5prime-expression"
TEN_X_ADT = "10X-ADT"
TEN_X_ATAC = "10X-ATAC"
TEN_X_CITE_FEAT = "10X-CITE-feature"
TEN_X_CITE_HASH = "10X-CITE-hashing"
TEN_X_CNV = "10X-CNV"
TEN_X_CSP = "10X-CSP"
TEN_X_GEX = "10X-GEX"
TEN_X_VDJ = "10X-VDJ"
TEN_X_VDJ_BCR = "10X-VDJ-BCR"
TEN_X_VDJ_TCR = "10X-VDJ-TCR"
AG_SS_CRE = "AgSsCRE"
B_ATAC = "bATAC"
CRISPR = "CRISPR"
CT_TSO = "ctTSO"
IDT_X_GEN = "IDTxGen"
ILMN_DNA_PREP = "IlmnDNAprep"
NEB_DNA_U = "NebDNAu"
NEB_MS = "NebMS"
PCR_FREE = "PCR-Free-Tagmentation"
TAKARA = "Takara"
TPL_X_HV = "TPlxHV"
TSO_DNA = "TSODNA"
TSO_RNA = "TSORNA"
TSQ_NANO = "TsqNano"
TSQ_STR = "TsqSTR"
BM_5L = "BM-5L"
BM_6L = "BM-6L"
ME_DIP = "MeDIP"
CT_TSO_V2 = "ctTSOv2"


class LabMetadataQuality(models.TextChoices):
@@ -76,6 +94,7 @@ class LabMetadataSource(models.TextChoices):
RNA = "RNA"
TISSUE = "tissue"
WATER = "water"
SKIN = "skin"


class LabMetadataWorkflow(models.TextChoices):
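
Since these are Django TextChoices, the new members are available to ORM filters straight away. A minimal illustrative sketch, mirroring the imports used elsewhere in this PR:

```python
# Illustrative sketch (not part of this PR): filtering LabMetadata by the
# newly added enum values, using the model and choices classes shown above.
from data_portal.models import LabMetadata
from data_portal.models.labmetadata import LabMetadataAssay, LabMetadataSource

# Libraries sequenced with the new ctTSOv2 assay (case-insensitive match,
# as done in the S3 object manager further below)
cttsov2_meta = LabMetadata.objects.filter(
    assay__iexact=LabMetadataAssay.CT_TSO_V2.value
)

# Records using the newly added "skin" source value
skin_meta = LabMetadata.objects.filter(source=LabMetadataSource.SKIN)
```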
66 changes: 66 additions & 0 deletions data_portal/models/s3object.py
@@ -1,15 +1,36 @@
import logging
import random
from typing import List

from django.db import models
from django.db.models import Max, QuerySet, Q
from libumccr import libregex

from data_portal.exceptions import RandSamplesTooLarge
from data_portal.fields import HashField
from data_portal.models import LabMetadata
from data_portal.models.labmetadata import LabMetadataAssay

logger = logging.getLogger(__name__)


def _strip_topup_rerun_from_library_id_list(library_id_list: List[str]) -> List[str]:
"""
TODO copy from liborca, perhaps refactor to libumccr
"""
rglb_id_set = set()
for library_id in library_id_list:
# Strip _topup
rglb = libregex.SAMPLE_REGEX_OBJS['topup'].split(library_id, 1)[0]

# Strip _rerun
rglb = libregex.SAMPLE_REGEX_OBJS['rerun'].split(rglb, 1)[0]

rglb_id_set.add(rglb)

return list(rglb_id_set)


class S3ObjectManager(models.Manager):
"""
Manager class for S3 objects, providing additional helper methods.
@@ -113,6 +134,51 @@ def get_subject_sash_results(self, subject_id: str, **kwargs) -> QuerySet:
qs = qs.filter(bucket=bucket)
return qs

def get_subject_cttsov2_results(self, subject_id: str, **kwargs) -> QuerySet:
# get cttsov2 libraries
subject_meta_list: List[LabMetadata] = LabMetadata.objects.filter(
subject_id=subject_id,
assay__iexact=str(LabMetadataAssay.CT_TSO_V2.value).lower()
).all()

cttsov2_libraries: List[str] = list()
for meta in subject_meta_list:
cttsov2_libraries.append(meta.library_id)

# strip library suffixes
minted_cttsov2_libraries = _strip_topup_rerun_from_library_id_list(cttsov2_libraries)

# if the subject_id has no cttsov2 assay library then skip all together
if not minted_cttsov2_libraries:
return self.none()

# baseline queryset
qs: QuerySet = self.filter(key__icontains="/cttsov2/")

# TODO
# for baseline queryset, we can also consider bucket filter for a tad more performance boost
# but this also makes dependency upon bucket name look up
# anyway, unlike Athena; Django to vanilla SQL query on a RDBMS table is already fast enough with index lookup
# we can observe current approach and explore this down the track

# create library filter Q
lib_q = Q()
for lib in minted_cttsov2_libraries:
lib_q.add(data=Q(key__icontains=lib), conn_type=Q.OR)

# create file of interest Q
tmb_metrics_csv_q = Q(key__iregex='tmb.metrics.csv$')
all_bam_q = Q(key__iregex='.bam$')
all_results_q = Q(key__icontains='/Results/')

q_results: Q = (
tmb_metrics_csv_q | all_results_q | all_bam_q
) & lib_q

qs = qs.filter(q_results)

return qs


class S3Object(models.Model):
"""
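
The helper above mirrors liborca's topup/rerun handling. A hedged sketch of its expected behaviour follows; the library IDs are made up, and exact suffix variants depend on libumccr's SAMPLE_REGEX_OBJS patterns.

```python
# Illustrative sketch (not part of this PR): the suffix-stripping helper added
# above collapses _topup/_rerun variants of a library ID to the base ID, as its
# inline comments indicate. Library IDs here are made up; result order is not
# guaranteed because the helper de-duplicates through a set.
from data_portal.models.s3object import _strip_topup_rerun_from_library_id_list

libs = ["L2100001", "L2100001_topup", "L2100002_rerun"]
print(sorted(_strip_topup_rerun_from_library_id_list(libs)))
# expected: ['L2100001', 'L2100002']
```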
2 changes: 2 additions & 0 deletions data_portal/viewsets/subject.py
@@ -39,6 +39,7 @@ def retrieve(self, request, pk=None, **kwargs):
results = S3Object.objects.get_subject_results(pk).all()
results_gds = GDSFile.objects.get_subject_results(pk).all()
results_sash = S3Object.objects.get_subject_sash_results(pk).all()
results_cttsov2 = S3Object.objects.get_subject_cttsov2_results(pk).all()

features = []

@@ -63,5 +64,6 @@
data.update(results=S3ObjectModelSerializer(results, many=True).data)
data.update(results_sash=S3ObjectModelSerializer(results_sash, many=True).data)
data.update(results_gds=GDSFileModelSerializer(results_gds, many=True).data)
data.update(results_cttsov2=S3ObjectModelSerializer(results_cttsov2, many=True).data)

return Response(data)
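
With this change the Subject retrieve payload carries a results_cttsov2 list alongside the existing result lists. A rough, illustrative summary of the keys involved; the nested item shape comes from S3ObjectModelSerializer and is not shown here.

```python
# Illustrative only: top-level result keys a client can now expect in the
# Subject retrieve response, per the viewset code above. Other payload fields
# (such as features) are unchanged and omitted here.
subject_result_keys = [
    "results",          # existing S3 results
    "results_sash",     # existing sash results
    "results_gds",      # existing GDS file results
    "results_cttsov2",  # new: ctTSOv2 results from the ICAv2 BYOB bucket
]
```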
2 changes: 1 addition & 1 deletion data_processors/lims/services/google_lims_srv.py
@@ -24,7 +24,7 @@ def persist_lims_data(df: pd.DataFrame, rewrite: bool = False) -> Dict[str, int]
"""
logger.info(f"Start processing LIMS data")

df = df.applymap(_clean_data_cell)
df = df.map(_clean_data_cell)
# df = df.drop_duplicates() # Defer handling row duplicate a bit further down for invalid rows stat
df = df.reset_index(drop=True)

2 changes: 1 addition & 1 deletion data_processors/lims/services/labmetadata_srv.py
@@ -55,7 +55,7 @@ def persist_labmetadata(df: pd.DataFrame):
}

df = clean_columns(df)
df = df.applymap(_clean_data_cell)
df = df.map(_clean_data_cell)
df = df.drop_duplicates()
df = df.reset_index(drop=True)

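
Both changes above swap the element-wise DataFrame.applymap for DataFrame.map; pandas 2.1 introduced map as the new name for applymap, and the old name now emits a FutureWarning. A minimal sketch of the equivalent call, with a stand-in cleaning function:

```python
# Minimal sketch (not part of this PR): DataFrame.map applies a function
# element-wise, exactly like the now-deprecated DataFrame.applymap.
# Requires pandas >= 2.1; _clean is a stand-in for the portal's
# _clean_data_cell helper.
import pandas as pd


def _clean(cell):
    return cell.strip() if isinstance(cell, str) else cell


df = pd.DataFrame({"library_id": [" L2100001 "], "assay": ["ctTSOv2 "]})
df = df.map(_clean)        # preferred spelling on pandas >= 2.1
# df = df.applymap(_clean) # old spelling; emits FutureWarning on pandas >= 2.1
print(df)
```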
8 changes: 8 additions & 0 deletions data_processors/pipeline/domain/tests/test_workflow.py
@@ -42,12 +42,20 @@ def test_secondary_analysis_helper(self):
self.assertIn("outputDirectory", eng_params)
self.assertIn("SBJ0001", eng_params['workDirectory'])
self.assertIn("SBJ0001", eng_params['outputDirectory'])
self.assertEqual(eng_params['tesUseInputManifest'], "never")

tso_helper = SecondaryAnalysisHelper(WorkflowType.DRAGEN_TSO_CTDNA)
tso_eng_params = tso_helper.get_engine_parameters(target_id="SBJ0002")
logger.info(tso_eng_params)
self.assertIn("maxScatter", tso_eng_params)
self.assertEqual(tso_eng_params['maxScatter'], 8)
self.assertEqual(tso_eng_params['tesUseInputManifest'], "never")

tn_helper = SecondaryAnalysisHelper(WorkflowType.TUMOR_NORMAL)
tn_eng_params = tn_helper.get_engine_parameters(target_id="SBJ0002")
logger.info(tn_eng_params)
self.assertIn("tesUseInputManifest", tn_eng_params)
self.assertEqual(tn_eng_params['tesUseInputManifest'], "never")

def test_secondary_analysis_helper_block_wgts_qc_type(self):
"""
4 changes: 4 additions & 0 deletions data_processors/pipeline/domain/workflow.py
@@ -244,6 +244,10 @@ def get_engine_parameters(self, target_id: str, secondary_target_id=None) -> dict:
# See https://github.com/umccr-illumina/cwl-iap/issues/200
engine_params.update(maxScatter=8)

# https://github.com/umccr/data-portal-apis/issues/671
# We can enable this TES override flag for all workflow types
engine_params.update(tesUseInputManifest="never")

return engine_params

def construct_workflow_name(self, subject_id: str, sample_name: str):
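
The override applies to every workflow type that goes through get_engine_parameters, as the updated test above asserts. A minimal sketch of what a caller sees; the workflow type and subject ID here are illustrative only.

```python
# Illustrative sketch (not part of this PR), mirroring the updated test above:
# engine parameters for any workflow type now carry the TES override.
from data_processors.pipeline.domain.workflow import (
    SecondaryAnalysisHelper,
    WorkflowType,
)

helper = SecondaryAnalysisHelper(WorkflowType.TUMOR_NORMAL)
eng_params = helper.get_engine_parameters(target_id="SBJ0002")  # made-up subject ID

assert eng_params["tesUseInputManifest"] == "never"
```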