Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory and time needed for sample_concordance_plink. #362

Merged
merged 11 commits into from
Dec 11, 2024

Conversation

rajwanir
Copy link

@rajwanir rajwanir commented Nov 29, 2024

Description:

This PR reduces time and memory requirement for rule sample_concordance_plink. It does so by:

  1. Implementing a faster _sort_ids sort IDs in read_genome. (789d401).
  2. Processes the .genome file in chunks to avoid high memory usage. (a6e5c73).

Testing:

Per current default branch, the sample_concordance_plink has following performance metrics for a sample of 123,244:

attempt cpus requested_max_gb requested_time_hr execution_time_hr used_max_gb status
1 1 57.3 10 10.0 203.3 failed
2 1 86.0 48 48.0 517.6 failed
3 1 114.7 96 54.9 611.2 completed

With this PR the sample_concordance_plink uses 0.9Gb memory and completes in 6.2 minutes.

Related issues:
#358, #241, #312

@rajwanir rajwanir requested a review from jaamarks November 29, 2024 15:45
@rajwanir rajwanir self-assigned this Nov 29, 2024
Copy link
Collaborator

@jaamarks jaamarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! This looks like it's going to be a massive improvement in computational efficiency. I've tested this on an N=500 sample with success.

sample_concordance_plink time comparison

previously

start: [Mon Dec  2 13:46:22 2024]
end:   [Mon Dec  2 13:52:52 2024]

Current

start: [Tue Dec  3 15:52:06 2024]
end:   [Tue Dec  3 15:52:07 2024]

src/cgr_gwas_qc/parsers/plink.py Outdated Show resolved Hide resolved
src/cgr_gwas_qc/workflow/scripts/concordance_table.py Outdated Show resolved Hide resolved
src/cgr_gwas_qc/workflow/scripts/concordance_table.py Outdated Show resolved Hide resolved
src/cgr_gwas_qc/workflow/scripts/concordance_table.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jaamarks jaamarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaamarks jaamarks merged commit f545d00 into default Dec 11, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants