EVA-3696 - Processing via a scanner and new brokering method #232

tcezard · 2024-12-16T14:13:38Z

No description provided.

apriltuesday

Great work, also very useful for me because it just makes the orchestration more concrete and easier for me to understand.

For testing, I wouldn't mind including some integration tests against our submission-ws and Biosamples in dev. As long as we clean up the dev dbs and the tests aren't too long (or are run selectively, e.g. manually triggered or on tags only), I think it's fine.

We can also just mock the submission-ws and write unit tests. Incidentally, I think this is also easier if we use a client object as per my suggestion: it means just one thing to mock rather than patching all over the place.
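
To illustrate the "one thing to mock" point, here is a minimal sketch. `SubmissionWSClient`, its two methods, `process_uploaded` and the `PROCESSING` status are all hypothetical names made up for this example; they are not the actual code in this PR or the real submission-ws API.

```python
from unittest.mock import Mock


class SubmissionWSClient:
    """Hypothetical wrapper around the submission-ws REST API (illustration only)."""

    def __init__(self, base_url):
        self.base_url = base_url

    def get_submissions_by_status(self, status):
        # A real client would issue an authenticated HTTP request here.
        raise NotImplementedError

    def mark_submission(self, submission_id, status):
        # A real client would update the submission through the web service.
        raise NotImplementedError


def process_uploaded(client):
    """Mark every UPLOADED submission as PROCESSING (hypothetical status) and return the ids."""
    submission_ids = client.get_submissions_by_status('UPLOADED')
    for submission_id in submission_ids:
        client.mark_submission(submission_id, 'PROCESSING')
    return submission_ids


# In a unit test, a single mock object stands in for the whole web service.
mock_client = Mock(spec=SubmissionWSClient)
mock_client.get_submissions_by_status.return_value = ['sub1', 'sub2']

processed = process_uploaded(mock_client)

# The last call made through the client was for the second submission.
mock_client.mark_submission.assert_called_with('sub2', 'PROCESSING')
```

Because the code under test receives the client as an argument, the test needs no `patch` calls on individual HTTP helpers.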

eva_sub_cli_processing/sub_cli_validation.py

eva_sub_cli_processing/sub_cli_brokering.py

apriltuesday · 2024-12-18T11:37:49Z

eva_sub_cli_processing/sub_cli_utils.py

+PROCESSING_STATUS = [READY_FOR_PROCESSING, FAILURE, SUCCESS, RUNNING, ON_HOLD]
+
+
+def sub_ws_auth():


Do you think it's worth extracting the submission WS client into common-pyutils, so it can be used in both eva-sub-cli and eva-submission? It's some extra refactoring, but I think it could be beneficial in the long run to keep python interactions with the submission WS in one place.

Yes that would be useful. I'll create a new ticket for it.

eva_submission/biosample_submission/biosamples_submitters.py

eva_sub_cli_processing/process_jobs.py

eva_sub_cli_processing/sub_cli_brokering.py

apriltuesday · 2024-12-18T16:24:39Z

eva_sub_cli_processing/process_jobs.py


-    def scan(self):
+    def _scan_per_status(self):


I was initially very confused about why we needed to scan both tables, before I realised that this scan is (I think) only used to add the first processing step. If that's the case, then maybe it could be a bit less generic and used only in that specific situation. Even if we used it for other operations (e.g. scanning for cancelled submissions to clean up the db), I don't think creating a SubmissionStep for validation would make sense in those cases.

I agree that it is a bit confusing.
As currently implemented, we have a generic scanner that can find submissions by processing step and processing status.
Subclasses then define what these steps and statuses are.
Only the new submission scanner was implemented, but I've added the other ones now.

I get that; I just don't see which other scanners would use _scan_for_new_per_submission_status besides the new submission scanner, so maybe that logic could be exclusive to that scanner. It's not a big deal though, thanks for adding the other scanners too.
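
For readers following this thread, the generic-scanner-plus-subclasses layout described above might look roughly like the sketch below. It is not the code from this PR: the two tables are replaced by in-memory lists of dicts, the `scan` body is an assumption, and `FailedStepScanner` and the `PROCESSING` status are hypothetical.

```python
class SubmissionScanner:
    """Generic scanner: subclasses only declare which statuses they look for."""

    statuses = []       # submission-level statuses of interest
    step_statuses = []  # processing-step statuses of interest

    def __init__(self, submissions, steps):
        # In-memory stand-ins for the submission and step tables (assumption).
        self.submissions = submissions
        self.steps = steps

    def scan(self):
        # Collect ids matching either the submission statuses or the step statuses.
        found = [s['submission_id'] for s in self.submissions
                 if s['status'] in self.statuses]
        found += [s['submission_id'] for s in self.steps
                  if s['status'] in self.step_statuses]
        return found


class NewSubmissionScanner(SubmissionScanner):
    statuses = ['UPLOADED']
    step_statuses = []


class FailedStepScanner(SubmissionScanner):  # hypothetical subclass
    statuses = []
    step_statuses = ['FAILURE']


submissions = [{'submission_id': 'sub1', 'status': 'UPLOADED'},
               {'submission_id': 'sub2', 'status': 'PROCESSING'}]
steps = [{'submission_id': 'sub2', 'status': 'FAILURE'}]

new_ids = NewSubmissionScanner(submissions, steps).scan()
failed_ids = FailedStepScanner(submissions, steps).scan()
```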

apriltuesday · 2024-12-18T16:31:22Z

eva_sub_cli_processing/process_jobs.py

        pretty_print(header, lines)


 class NewSubmissionScanner(SubmissionScanner):

    statuses = ['UPLOADED']
+    step_statuses = []


This is mostly a matter of taste, but I think I would prefer the different scanning tasks as methods rather than classes. So we would have just one SubmissionScanner with the generic _scan_per_step_status helper method, and find_new_submissions, find_completed_submission_steps, etc. that call that method with the relevant statuses.

On the other hand, maybe there's more functionality that would go into these subclasses that I'm not thinking of, in which case having the extra classes makes sense.

That definitely could be refactored that way. I guess I was worried that this would make the SubmissionScanner class too big, but that might not be a valid concern.
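
The method-based layout discussed in this thread could be sketched as follows. The method names `_scan_per_step_status`, `find_new_submissions` and `find_completed_submission_steps` come from the comment above; the method bodies, the in-memory data and the choice of `SUCCESS` to mean "completed" are assumptions made for illustration.

```python
class SubmissionScanner:
    """Single scanner class: one generic helper plus one method per scanning task."""

    def __init__(self, submissions, steps):
        # In-memory stand-ins for the submission and step tables (assumption).
        self.submissions = submissions
        self.steps = steps

    def _scan_per_step_status(self, statuses=(), step_statuses=()):
        # Generic helper: match on submission statuses and/or step statuses.
        found = [s['submission_id'] for s in self.submissions
                 if s['status'] in statuses]
        found += [s['submission_id'] for s in self.steps
                  if s['status'] in step_statuses]
        return found

    def find_new_submissions(self):
        return self._scan_per_step_status(statuses=['UPLOADED'])

    def find_completed_submission_steps(self):
        # Assumes a completed step is one whose status is SUCCESS.
        return self._scan_per_step_status(step_statuses=['SUCCESS'])


scanner = SubmissionScanner(
    submissions=[{'submission_id': 'sub1', 'status': 'UPLOADED'}],
    steps=[{'submission_id': 'sub2', 'status': 'SUCCESS'},
           {'submission_id': 'sub3', 'status': 'RUNNING'}],
)
new_ids = scanner.find_new_submissions()
completed_ids = scanner.find_completed_submission_steps()
```

The trade-off is the one raised in the thread: fewer classes, at the cost of a `SubmissionScanner` that grows by one method per scanning task.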

tcezard added 2 commits December 16, 2024 14:12

processing via a scanner and new brokering method

e265b27

processing via a scanner and new brokering method

b61560d

tcezard requested review from apriltuesday and nitin-ebi December 17, 2024 14:55

apriltuesday reviewed Dec 18, 2024


tcezard mentioned this pull request Dec 19, 2024

Validate sample check when VCF contains aggregated genotypes #233

Merged

tcezard added 2 commits January 6, 2025 11:07

Address review comments

bb6f090

fix tests

28374b0

apriltuesday approved these changes Jan 6, 2025


tcezard merged commit 27d8be1 into EBIvariation:master Jan 6, 2025
4 checks passed
