Optimize transcriptome simulation #158

Merged: 7 commits, Apr 8, 2022

Conversation

Collaborator

@kmnip kmnip commented Mar 31, 2022

  • consolidated the progress-update code for the number of reads simulated in metagenome/transcriptome/genome modes into a new method, check_print_progress
  • reduced the frequency of progress updates written to stdout, e.g. once every 10,000 reads simulated instead of every 100
  • optimized the code for transcriptome simulation
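
A minimal sketch of what a consolidated, throttled progress check could look like. Only the name check_print_progress comes from this pull request; the signature, the message text, and the `interval` and `out` parameters are assumptions made for illustration, not the actual NanoSim code.

```python
import sys


def check_print_progress(simulated, total, interval=10_000, out=sys.stdout):
    """Write a progress line only every `interval` reads, and at the end.

    Hypothetical signature: returns True when a line was written, which
    makes the throttling easy to observe.
    """
    if simulated % interval == 0 or simulated == total:
        # "\r" rewrites the same terminal line instead of scrolling.
        out.write(f"Number of reads simulated >> {simulated}/{total}\r")
        out.flush()
        return True
    return False


if __name__ == "__main__":
    # One shared helper can be called from every simulation mode.
    for i in range(1, 25_001):
        check_print_progress(i, 25_000)
    print()
```

With an interval of 10,000, each call does little more than a modulo test, so writing to stdout stops being a per-read cost.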

Simulated 500,000 reads using 4 threads on the same machine for 3 trials:

trial   before       optimized   optimized (afe3cbe)
1       1h:00m:13s   41m:42s     16m:38s
2       1h:12m:52s   50m:47s     16m:40s
3       1h:03m:18s   44m:53s     16m:42s

A ~75% reduction in wall-clock runtime.
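
The reduction can be recomputed from the timings in the table. The sketch below assumes that the second timing column is the first round of optimization and the third is the later, further optimized run; the helper names `to_seconds` and `reduction` are hypothetical.

```python
import re


def to_seconds(stamp):
    """Convert a duration such as '1h:00m:13s' or '41m:42s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", stamp))


def reduction(before, after):
    """Fractional reduction in wall-clock runtime relative to `before`."""
    return 1 - to_seconds(after) / to_seconds(before)


# Timings copied from the three trials above.
trials = [
    ("1h:00m:13s", "41m:42s", "16m:38s"),
    ("1h:12m:52s", "50m:47s", "16m:40s"),
    ("1h:03m:18s", "44m:53s", "16m:42s"),
]

if __name__ == "__main__":
    for before, first, second in trials:
        print(f"{before}: {reduction(before, first):.0%}, {reduction(before, second):.0%}")
```

Averaged over the three trials, the last column works out to a reduction of roughly 74%, in line with the ~75% quoted.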

@kmnip kmnip requested a review from SaberHQ March 31, 2022 21:46
Collaborator Author

kmnip commented Apr 2, 2022

I have further optimized the code in afe3cbe.

A new sample is only drawn from the KDE if the existing sample was previously used to simulate a read for the candidate transcript.
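
The idea can be sketched as a one-slot cache around the expensive draw. This is an illustration under stated assumptions, not the code in afe3cbe: `LazySampler`, `simulate_lengths`, and the uniform stand-in for the KDE draw are all hypothetical, and the fit test (sampled length no longer than the candidate transcript) is assumed.

```python
import random


class LazySampler:
    """Holds one draw from an expensive sampler until it is consumed."""

    def __init__(self, draw):
        self._draw = draw      # callable returning one sample (stand-in for a KDE draw)
        self._pending = None   # sample drawn but not yet used for a read
        self.draws = 0         # number of real draws, kept for illustration

    def peek(self):
        # Draw only when no unused sample is waiting.
        if self._pending is None:
            self._pending = self._draw()
            self.draws += 1
        return self._pending

    def consume(self):
        # Mark the current sample as used so the next peek() draws again.
        value = self.peek()
        self._pending = None
        return value


def simulate_lengths(transcript_lengths, n_reads, sampler, rng):
    """Pair candidate transcripts with sampled read lengths that fit them.

    Assumes every possible sample fits at least one transcript;
    otherwise this loop would never terminate.
    """
    reads = []
    while len(reads) < n_reads:
        candidate = rng.choice(transcript_lengths)
        length = sampler.peek()          # reuse the unused sample if there is one
        if length <= candidate:
            reads.append((candidate, sampler.consume()))
        # Otherwise keep the sample and try another candidate transcript.
    return reads


if __name__ == "__main__":
    rng = random.Random(0)
    sampler = LazySampler(lambda: rng.randint(100, 2000))
    reads = simulate_lengths([500, 1000, 2000, 3000], 1000, sampler, rng)
    print(len(reads), "reads from", sampler.draws, "draws")
```

In this sketch the number of draws equals the number of reads simulated, no matter how many candidate transcripts are rejected along the way.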

I have repeated my benchmarking trials on the same machine and updated the table above. The reduction in runtime is now ~75% (instead of ~30%)!

Collaborator Author

kmnip commented Apr 4, 2022

Added a potential fix for #156

@kmnip kmnip linked an issue Apr 4, 2022 that may be closed by this pull request
@kmnip kmnip changed the title from "Optimize transcriptome similation" to "Optimize transcriptome simulation" Apr 5, 2022
Collaborator

SaberHQ commented Apr 6, 2022

> I have further optimized the code in afe3cbe.
>
> A new sample is only drawn from the KDE if the existing sample was previously used to simulate a read for the candidate transcript.
>
> I have repeated my benchmarking trials on the same machine and updated the table above. The reduction in runtime is now ~75% (instead of ~30%)!

This is a really neat and clever optimization, @kmnip. Thanks for this. I also benchmarked it by simulating 1 million and 25 million reads using 38 threads on the same machine, and the results demonstrate a reduction in wall-clock runtime.

Number of reads   before        optimized
1 million         15m:56s       10m:29s
25 million        29h:46m:26s   5h:34m:16s

Collaborator

SaberHQ commented Apr 8, 2022

This pull request potentially fixes #159 and #131

Development

Successfully merging this pull request may close these issues.

Reads simulation stuck in an infinite loop or something...