Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run QC for WTS alongside WGS #597

Merged
merged 10 commits into from
Aug 7, 2023
Merged

Conversation

alexiswl
Copy link
Member

@alexiswl alexiswl commented Jun 6, 2023

cba7478

Updates by file

data_processors.pipeline.domain.workflow.py

  • Added abstract workflow type DRAGEN_WGTS_QC.
  • Added labmetadata rule method must_be_wgts

data_processors.pipeline.lambdas.dragen_wgs_qc.py

  • Handle WTS data and add enable_rna option. True for WTS data, false for WGS data

data_processors.pipeline.orchestartion.dragen_wgs_qc_step.py

  • Set batcher runstep parameter to DRAGEN_WGTS_QC abstract workflow type
  • Replace must_be_wgs labmetadata rule with must_be_wgts
  • Update warning to use DRAGEN_WGTS_QC abstract workflow type

5eb62f3

Update orchestration logic

  • Now dragen wts happens after QC is complete.

  • Instead of performing dragen_wts after bclconvert, we first run dragen_wgts_qc (which now includes dragen_wts samples).

Updates by file:

* Added abstract workflow type DRAGEN_WGTS_QC.
* Added labmetadata rule method must_be_wgts

* Handle WTS data and add enable_rna option. True for WTS data, false for WGS data

* Set batcher runstep parameter to DRAGEN_WGTS_QC abstract workflow type
* Replace must_be_wgs labmetadata rule with must_be_wgts
* Update warning to use DRAGEN_WGTS_QC  abstract workflow type
Now dragen wts happens after QC is complete.

Instead of performing dragen_wts after bclconvert, we first run dragen_wgts_qc (which now includes dragen_wts samples).
@alexiswl alexiswl requested a review from victorskl June 6, 2023 06:51
@alexiswl
Copy link
Member Author

alexiswl commented Jun 6, 2023

@victorskl victorskl added the feature New feature label Jun 6, 2023
@victorskl victorskl added this to the Release 2.2 milestone Jun 6, 2023
Copy link
Member

@victorskl victorskl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good most of it, Alexis. Just that need to attend comment on Orchestrator on triggering cue.

data_processors/pipeline/domain/workflow.py Show resolved Hide resolved
data_processors/pipeline/lambdas/dragen_wgs_qc.py Outdated Show resolved Hide resolved
data_processors/pipeline/lambdas/orchestrator.py Outdated Show resolved Hide resolved
\## workflow.py

Added in dragen_wts_qc and dragen_wgts_qc into from_value class method

\## dragen_wgs_qc.py

Moved labmetadata improt to below try / except statement.

\## orchestrator.py

Split out WGS_QC and WTS_QC notification handling.
Dragen WTS now runs after dragen wts qc. Use similar logic to t/n runs (that work off a metadata list) rather than a batcher (as used in the qc runs).

Updates by file

\## orchestration/dragen_wts_step

\### Perform function

* Updated perform function to check if all dragen_wts_qc workflows have finished for a given sequencing run

* Invoke prepare_dragen_wts_jobs with a metadata list instead of a batcher object.

\### prepare dragen wts jobs function

Use similar (but simpler) logic to preparing of tn workflows

Collect fastq list rows by the library id input (grouped by library-id / subject-id).

And create the job dict with just the subject id, library id and fastq list rows object

Rather than use a labmetadatarule we use a query set (see metadata_srv changes for more info).

\## orchestration/tumor_normal_step

* Move handle rerun function to liborca
* Also fixed return in docstring

\## tools/liborca

* Collected handle rerun function from tumor normal step

\## services/metadata_srv

* Created queryset for collecting wts metdata by the wts qc runs

\## pipeline/lambdas/dragen_wts

* Handle new job input
* Remove all references to sequence_run and batch_run
* Assume subject Id is in the event dict
Copy link
Member

@victorskl victorskl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, few more minor comments to attend.

Copy link
Member

@victorskl victorskl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Alexis;

Could you please change PR to ready for review from draft, if you are ready..? This looks good so far from my side, minor comments to attend.

If that's all good I'll fix up the tests (assuming I've broken a few with these changes)

Do you still have bandwidth to follow up tests..?
If so, few pointers as follows...

  • Each python module file that has made changes; should have their test file counterpart in tests package.
  • First, you run each test file for unit test cases and fix where it broke, accordingly.
  • Optionally, run integration test cases there, if applicable and/or see fit.
  • Finally, run smoke test make test on the whole suite and, it must pass.

Here are starter, based on file changes in PR.

  • Test domain logic code changes
python manage.py test data_processors.pipeline.domain.tests.test_workflow
  • Test front controller (Lambda) entrypoints changes
python manage.py test data_processors.pipeline.lambdas.tests.test_dragen_wgs_qc
python manage.py test data_processors.pipeline.lambdas.tests.test_dragen_wts
python manage.py test data_processors.pipeline.lambdas.tests.test_orchestrator
  • Test orchestration step module middleware changes
python manage.py test data_processors.pipeline.orchestration.tests.test_dragen_wgs_qc_step
python manage.py test data_processors.pipeline.orchestration.tests.test_dragen_wts_step
python manage.py test data_processors.pipeline.orchestration.tests.test_tumor_normal_step
  • Test service layer module changes
python manage.py test data_processors.pipeline.services.tests.test_metadata_srv

@alexiswl
Copy link
Member Author

Thanks Victor, I'll work on this tomorrow if you don't mind?

@victorskl
Copy link
Member

Of course, go for it, please..!

data_portal/tests/factories:

* Made new WtsQC and WgtsQC factories

*  lambdas/test/test_dragen_wgs_qc.py
  * Use TestConstant for library id since we need matches on labmetadata types

* lambdas/test/test_dragen_wts.oy
    * Include subject id in event.
    * Remove seq run id and seq name, not immediately downstream of bclconvert

*  lambdas/tests/test_orchestrator.py
    * Exclude DRAGEN_WTS_STEP from tests post bclconvert run

* lambdas/dragen_wgs_qc.py
    * Check labmetadata getter returns not none, otherwise raise value error

* lambdas/dragen_wts.py
    * Use link_library_runs_with_x_seq_workflow over library_runs_with_workflow

* lambdas/orchestartor.py
  * Set DRAGEN_WGTS_QC_STEP, DRAGEN_WGS_QC_STEP and DRAGEN_WTS_QC_STEP in the same 'skip-list' context

* orchestration/tests/test_dragen_wts_step.py
    * Overhaul of test_dragen_wts_step since it is now invoked post qc
@alexiswl
Copy link
Member Author

alexiswl commented Aug 1, 2023

Getting close!

======================================================================
ERROR: test_perform (data_processors.pipeline.orchestration.tests.test_dragen_wts_step.DragenWtsStepUnitTests.test_perform)
python manage.py test data_processors.pipeline.orchestration.tests.test_dragen_wts_step.DragenWtsStepUnitTests.test_perform
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/alexiswl/UMCCR/GitHub/data-portal-apis/data_processors/pipeline/orchestration/tests/test_dragen_wts_step.py", line 95, in test_perform
    self.assertEqual(results['submitting_subjects'][0], wts_mock_subject_id)
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

======================================================================
ERROR: test_perform (data_processors.pipeline.orchestration.tests.test_somalier_extract_step.SomalierExtractStepUnitTests.test_perform)
python manage.py test data_processors.pipeline.orchestration.tests.test_somalier_extract_step.SomalierExtractStepUnitTests.test_perform
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/alexiswl/UMCCR/GitHub/data-portal-apis/data_processors/pipeline/orchestration/tests/test_somalier_extract_step.py", line 88, in test_perform
    self.assertIn('index', results['somalier_extract_step'][0])
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

----------------------------------------------------------------------
Ran 315 tests in 46.693s

FAILED (errors=2, skipped=40)
Destroying test database for alias 'default'...
make: *** [Makefile:65: test] Error 1

@alexiswl
Copy link
Member Author

alexiswl commented Aug 2, 2023

@victorskl would you mind having a look at these? I couldn't figure out the changes needed to get the test perform working and I think the orchestration logic might not be handing the DRAGEN_WGTS_QC step?

@victorskl
Copy link
Member

Ok, I will pull and check in tick.

@victorskl
Copy link
Member

Hi Alexis; pls see #607 for

python manage.py test data_processors.pipeline.orchestration.tests.test_dragen_wts_step.DragenWtsStepUnitTests.test_perform

TL;DR - it just need mock WTS tumor Lab meta info <> LibraryRun <> Workflow has to link up / match them up well for test condition.

@victorskl
Copy link
Member

I will check for somalier step, next..

@victorskl
Copy link
Member

Ah this one is easy

python manage.py test data_processors.pipeline.orchestration.tests.test_somalier_extract_step.SomalierExtractStepUnitTests.test_perform

Just roll back changes here

mock_wgc_qc_workflow: Workflow = DragenWgtsQcWorkflowFactory()

... to

mock_wgc_qc_workflow: Workflow = DragenWgsQcWorkflowFactory()

Explain

  • We don't really save to db with a workflow type WorkflowType.DRAGEN_WGTS_QC yet
  • It is still a "placeholder" or "common" workflow type to process together both WGS/WTS alignment QC
  • So, we should just still keep this unittest case as is. Which is, originally unit testing WorkflowType.DRAGEN_WGS_QC scenario and, still valid test case.

@alexiswl alexiswl marked this pull request as ready for review August 7, 2023 06:08
@alexiswl
Copy link
Member Author

alexiswl commented Aug 7, 2023

Woo!

...
----------------------------------------------------------------------
Ran 315 tests in 48.655s

OK (skipped=40)
Destroying test database for alias 'default'...

Copy link
Member

@victorskl victorskl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it, Alexis..!

@alexiswl alexiswl merged commit bffcb8d into dev Aug 7, 2023
1 check passed
@victorskl victorskl deleted the enhancement/wts-auto-qc-pipeline branch August 8, 2023 05:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants