Speed up spot reading #302

Merged
merged 4 commits into from
Nov 26, 2024

jpjarnoux
Member

When reading a large pangenome with numerous spots, PPanGGOLiN took a long time to read the spots.

This was because, for each line read, all families in the spot were associated with the spot again, even if they were already associated.

This issue was fixed here.
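
To illustrate the kind of repeated work described above, here is a minimal sketch. It does not reproduce PPanGGOLiN's actual classes, HDF5 table layout, or the code changed in this PR: `Spot`, `link_families`, `read_spots_slow`, `read_spots_fast`, and the toy data are all hypothetical. It only assumes that each line read pairs a spot with one region, and that linking families to a spot walks every region currently in that spot.

```python
# Toy stand-ins; these are NOT PPanGGOLiN's real classes or table layout.
class Spot:
    def __init__(self, spot_id):
        self.spot_id = spot_id
        self.regions = []      # each region is a list of family names
        self.families = set()
        self.link_calls = 0    # counts individual family-to-spot links

    def link_families(self):
        """Associate every family of every region with this spot."""
        for region in self.regions:
            for family in region:
                self.families.add(family)
                self.link_calls += 1


def read_spots_slow(rows, region_families):
    """Re-link all families after every line (the slow pattern)."""
    spots = {}
    for spot_id, region_name in rows:
        if spot_id not in spots:
            spots[spot_id] = Spot(spot_id)
        spot = spots[spot_id]
        spot.regions.append(region_families[region_name])
        spot.link_families()   # repeats work for regions already handled
    return spots


def read_spots_fast(rows, region_families):
    """Collect all regions first, then link families once per spot."""
    spots = {}
    for spot_id, region_name in rows:
        if spot_id not in spots:
            spots[spot_id] = Spot(spot_id)
        spots[spot_id].regions.append(region_families[region_name])
    for spot in spots.values():
        spot.link_families()   # a single pass per spot
    return spots


# Hypothetical data: one spot holding 200 regions of 5 families each.
region_families = {
    f"rgp_{i}": [f"fam_{i}_{j}" for j in range(5)] for i in range(200)
}
rows = [(0, name) for name in region_families]

slow = read_spots_slow(rows, region_families)
fast = read_spots_fast(rows, region_families)

# Both variants end with the same families; only the amount of work differs.
print(slow[0].families == fast[0].families)
print(slow[0].link_calls, fast[0].link_calls)
```

In this sketch the slow variant's link count grows quadratically with the number of regions in a spot, while the fast variant's grows linearly, which is consistent with the gap becoming large on pangenomes with many spots and regions.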

Benchmark:

Dataset of 3083 E. coli genomes, with 2036 spots.
Before: reading spots took 25 minutes; total time 36 minutes.
Now: reading spots takes 3.5 seconds; total time 9 minutes 38 seconds.

Member

@axbazin axbazin left a comment

Indeed, nice catch!

:param pangenome: Pangenome object without spot
:param h5f: Pangenome HDF5 file with spot computed
:param disable_bar: Disable the progress bar
Args:
Member

Are we changing format in the comment lines?

@axbazin axbazin merged commit d5df368 into dev Nov 26, 2024
6 checks passed
@axbazin axbazin deleted the readFastSpot branch November 26, 2024 14:34