Optimize transcriptome simulation #158

kmnip · 2022-03-31T21:46:01Z

- Consolidated the progress-update code for the number of reads simulated in metagenome/transcriptome/genome modes into a new method, `check_print_progress`
- Reduced the frequency of progress updates written to stdout, e.g. once every 10,000 reads simulated instead of every 100
- Optimized the code for transcriptome simulation
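
For readers who have not opened the diff, here is a minimal sketch of what a consolidated, throttled progress check could look like. Only the name `check_print_progress` and the 10,000-read interval come from this PR; the signature, the message text, and the `out` parameter are assumptions made for illustration and may not match the merged code.

```python
import sys


def check_print_progress(sequence_index, total, interval=10_000, out=sys.stdout):
    """Write a progress line every `interval` reads and on the final read.

    Hypothetical signature: `sequence_index` is the zero-based index of the
    read that was just simulated. Returns True if a line was written.
    """
    done = sequence_index + 1
    if done % interval == 0 or done == total:
        out.write(f"Number of reads simulated >> {done}\n")
        out.flush()
        return True
    return False


if __name__ == "__main__":
    # 25,000 reads produce three lines (10,000, 20,000, 25,000) rather than 250.
    for i in range(25_000):
        check_print_progress(i, 25_000)
```

Keeping the modulo test in one helper means all three simulation modes share the same reporting interval, and changing it later touches a single default value.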

Simulated 500,000 reads using 4 threads on the same machine for 3 trials:

| trial | before | optimized |
|---|---|---|
| 1 | 1h:00m:13s | ~~41m:42s~~ 16m:38s |
| 2 | 1h:12m:52s | ~~50m:47s~~ 16m:40s |
| 3 | 1h:03m:18s | ~~44m:53s~~ 16m:42s |

Roughly a 75% reduction in wall-clock runtime.
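
The figure can be checked directly from the table. The sketch below recomputes the per-trial reduction from the "before" and final "optimized" timings; the helper name `to_seconds` is made up for this example. With these numbers the per-trial reductions land between roughly 72% and 77%.

```python
import re


def to_seconds(stamp):
    """Convert a duration such as '1h:00m:13s' or '16m:38s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", stamp))


# (before, optimized) pairs copied from the benchmark table.
trials = [
    ("1h:00m:13s", "16m:38s"),
    ("1h:12m:52s", "16m:40s"),
    ("1h:03m:18s", "16m:42s"),
]

reductions = [1 - to_seconds(after) / to_seconds(before) for before, after in trials]
mean_reduction = sum(reductions) / len(reductions)

for number, reduction in enumerate(reductions, start=1):
    print(f"trial {number}: {reduction:.1%} reduction")
print(f"mean: {mean_reduction:.1%}")
```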

kmnip · 2022-04-02T19:10:07Z

I have further optimized the code in afe3cbe.

A new sample is only drawn from the KDE if the existing sample was previously used to simulate a read for the candidate transcript.
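
To illustrate the idea outside the NanoSim code base, here is a simplified sketch, not the code in afe3cbe. It assumes the sample is a read length that is rejected when it does not fit the candidate transcript; `CachedSampler`, `simulate_read_lengths`, and the uniform `draw` function standing in for the KDE are all hypothetical names.

```python
import random


class CachedSampler:
    """Holds one drawn value and redraws only after it has been consumed."""

    def __init__(self, draw):
        self._draw = draw    # zero-argument callable; stands in for a KDE draw
        self._value = None
        self.draws = 0       # number of underlying draws, kept for illustration

    def peek(self):
        # Draw lazily: reuse the pending value if it has not been used yet.
        if self._value is None:
            self._value = self._draw()
            self.draws += 1
        return self._value

    def consume(self):
        # Hand out the pending value and mark it as used.
        value = self.peek()
        self._value = None
        return value


def simulate_read_lengths(transcript_lengths, n_reads, draw, rng):
    """Pair candidate transcripts with read lengths that fit them.

    Assumes `draw` never returns a length longer than the longest
    transcript; otherwise the loop would not terminate.
    """
    names = list(transcript_lengths)
    sampler = CachedSampler(draw)
    reads = []
    while len(reads) < n_reads:
        name = rng.choice(names)
        if sampler.peek() <= transcript_lengths[name]:
            reads.append((name, sampler.consume()))
        # On rejection the pending sample is kept for the next candidate.
    return reads, sampler.draws


rng = random.Random(0)
transcripts = {"tx_short": 500, "tx_mid": 1200, "tx_long": 2500}
reads, draws = simulate_read_lengths(
    transcripts, 1000, lambda: rng.randint(100, 2000), rng
)
print(len(reads), draws)
```

In this sketch the number of underlying draws always equals the number of reads produced, however many candidates are rejected, which is where the saving comes from when each draw is expensive.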

I have repeated my benchmarking trials on the same machine and updated the table above. The reduction in runtime is now ~75% (instead of ~30%)!

kmnip · 2022-04-04T19:40:21Z

Added a potential fix for #156

SaberHQ · 2022-04-06T01:11:30Z

> I have further optimized the code in afe3cbe.
>
> A new sample is only drawn from the KDE if the existing sample was previously used to simulate a read for the candidate transcript.
>
> I have repeated my benchmarking trials on the same machine and updated the table above. The reduction in runtime is now ~75% (instead of ~30%)!

This is a really neat and clever optimization, @kmnip. Thanks for this. I also benchmarked it by simulating 1 million and 25 million reads using 38 threads on the same machine, and the results demonstrate a reduction in wall-clock runtime.

| Number of reads | before | optimized |
|---|---|---|
| 1 million | 15m:56s | 10m:29s |
| 25 million | 29h:46m:26s | 5h:34m:16s |

SaberHQ · 2022-04-08T07:46:13Z

This pull request potentially fixes #159 and #131

kmnip added 2 commits March 30, 2022 19:35

optimize transcriptome simulation

f459140

reduce frequency of progress output

5231216

kmnip requested a review from SaberHQ March 31, 2022 21:46

sample from KDE only when necessary

afe3cbe

fix bug where transcripts without a "ENS" name prefix cannot be simulated

92f0af0

kmnip linked an issue Apr 4, 2022 that may be closed by this pull request

Reads simulation stuck in an infinite loop or something... #156

Closed

add error messages for ill-formatted expression profile

5fc8468

kmnip changed the title ~~Optimize transcriptome similation~~ Optimize transcriptome simulation Apr 5, 2022

SaberHQ mentioned this pull request Apr 8, 2022

Update dependent package requirements #159

Closed

SaberHQ added 2 commits April 8, 2022 00:05

Update dependant package requirements

f192c2e

Error message when transcript IDs in the expression file does not match those from reference transcriptome

07f4bec

SaberHQ approved these changes Apr 8, 2022

SaberHQ merged commit fc5a67b into master Apr 8, 2022

This was referenced Apr 9, 2022

Issue with sklearn.neighbors.kde #131

Closed

Simulation stops and hangs #112

Open

Issues with simulator.py and sklearn modules #120

Closed

kmnip deleted the opt_sim branch April 11, 2022 21:20

SaberHQ mentioned this pull request Apr 19, 2022

NanoSim installation failure #162

Open
