gather/fastgather benchmarks - feb 15, 2024 #214

Closed
ctb opened this issue Feb 15, 2024 · 10 comments

@ctb
Collaborator

ctb commented Feb 15, 2024

on SRR606249 at k=31, scaled=1000

Summary

hackmd version

[Screenshot: benchmark summary, Feb 19, 2024]

fastgather with zip file

v0.9.0-pre, as of #197/#211 merge (using sourmash core v0.12.1)

  • 2m 5s
  • 14 GB RAM
        Command being timed: "sourmash scripts fastgather SRR606249.trim.k31.sig.zip /group/ctbrowngrp/sourmash-db/gtdb-rs214/gtdb-rs214-k31.zip -o SRR606249.x.gtdb-rs214.fastgather.csv"
        User time (seconds): 6023.69
        System time (seconds): 31.83
        Percent of CPU this job got: 4837%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:05.17
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 14118520
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 974
        Minor (reclaiming a frame) page faults: 2421007
        Voluntary context switches: 45047
        Involuntary context switches: 151667
        Swaps: 0
        File system inputs: 199312
        File system outputs: 56
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
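The timing blocks in this thread are GNU `/usr/bin/time -v` output, and their fields are internally consistent: "Percent of CPU this job got" is (user + system) / wall-clock time, so 4837% means roughly 48 cores busy on average over the 2m 5s run. A minimal Python sketch for pulling those numbers out of a saved log; the helper names (`parse_gnu_time`, `wall_seconds`, `summarize`) are hypothetical and not part of sourmash or the benchmark repo:

```python
def parse_gnu_time(text: str) -> dict[str, str]:
    """Parse `/usr/bin/time -v` output into a {label: raw value} dict."""
    fields = {}
    for line in text.splitlines():
        # Labels never contain ": " (the colons in "h:mm:ss" have no trailing
        # space), so the first ": " separates the label from its value.
        label, sep, value = line.strip().partition(": ")
        if sep:
            fields[label] = value
    return fields


def wall_seconds(value: str) -> float:
    """Convert an elapsed time like "2:05.17" (m:ss) or "5:01:47" (h:mm:ss) to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(fields: dict[str, str]) -> dict[str, float]:
    """Derive wall time, CPU usage, and peak memory from parsed fields."""
    cpu = float(fields["User time (seconds)"]) + float(fields["System time (seconds)"])
    wall = wall_seconds(fields["Elapsed (wall clock) time (h:mm:ss or m:ss)"])
    return {
        "wall_s": wall,
        # "Percent of CPU" is (user + system) / wall; divided by 100 it is
        # the average number of busy cores.
        "cpu_percent": 100 * cpu / wall,
        "avg_cores": cpu / wall,
        # Assumes "kbytes" means KiB, as for Linux getrusage() ru_maxrss.
        "max_rss_gib": int(fields["Maximum resident set size (kbytes)"]) / 1024**2,
    }


# Excerpt from the fastgather-with-zip run above.
sample = """\
        User time (seconds): 6023.69
        System time (seconds): 31.83
        Percent of CPU this job got: 4837%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:05.17
        Maximum resident set size (kbytes): 14118520
"""
print(summarize(parse_gnu_time(sample)))
```

For this run the sketch gives about 48.4 busy cores (4837%, matching the reported value) and a peak RSS of about 13.5 GiB, i.e. the ~14 GB quoted above.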
@ctb
Collaborator Author

ctb commented Feb 15, 2024

sourmash gather

as of v4.8.7

42m 26s
14.5 GB of RAM

        Command being timed: "sourmash gather -k 31 SRR606249.trim.k31.sig.gz /group/ctbrowngrp/sourmash-db/gtdb-rs214/gtdb-rs214-k31.zip -o SRR606249.x.gtdb-rs214.csv"
        User time (seconds): 2494.26
        System time (seconds): 58.98
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 42:26.62
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 14473280
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 95793
        Minor (reclaiming a frame) page faults: 10369270
        Voluntary context switches: 3938
        Involuntary context switches: 210962
        Swaps: 0
        File system inputs: 24523896
        File system outputs: 336
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096

@ctb
Collaborator Author

ctb commented Feb 16, 2024

fastgather with pathlist

36 min, 1.9 GB!?

        Command being timed: "sourmash scripts fastgather SRR606249.trim.k31.sig.zip gtdb-rs214-k31.d.file.list -o SRR606249.x.gtdb-rs214.fastgather.csv"
        User time (seconds): 7566.15
        System time (seconds): 176.87
        Percent of CPU this job got: 357%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 36:04.07
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1916736
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 788119
        Voluntary context switches: 2386015
        Involuntary context switches: 1690833
        Swaps: 0
        File system inputs: 25911264
        File system outputs: 56
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@ctb
Collaborator Author

ctb commented Feb 16, 2024

rocksdb indexing

from zip file, 5 hours, 14 GB RAM.

        Command being timed: "sourmash scripts index -o gtdb-rs214-k31.rocksdb /group/ctbrowngrp/sourmash-db/gtdb-rs214/gtdb-rs214-k31.zip"
        User time (seconds): 47382.74
        System time (seconds): 44254.80
        Percent of CPU this job got: 506%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 5:01:47
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 14553672
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 41
        Minor (reclaiming a frame) page faults: 46714832
        Voluntary context switches: 1483801815
        Involuntary context switches: 1785402
        Swaps: 0
        File system inputs: 142267440
        File system outputs: 243309352
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@ctb
Collaborator Author

ctb commented Feb 16, 2024

I wonder if the zip memory stuff is related to sourmash-bio/sourmash#2340?

@ctb
Collaborator Author

ctb commented Feb 16, 2024

fastmultigather with a rocksdb index

2m 8s, 600 MB RAM!

        Command being timed: "sourmash scripts fastmultigather SRR606249.trim.k31.sig.zip gtdb-rs214-k31.rocksdb"
        User time (seconds): 12.93
        System time (seconds): 21.23
        Percent of CPU this job got: 26%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:08.05
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 600260
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 5
        Minor (reclaiming a frame) page faults: 377146
        Voluntary context switches: 371919
        Involuntary context switches: 223163
        Swaps: 0
        File system inputs: 4682456
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@ctb
Collaborator Author

ctb commented Feb 16, 2024

old fastgather (v0.8.6) with a zip file

28m 34s!! but only 1.7 GB of RAM.

        Command being timed: "sourmash scripts fastgather SRR606249.trim.k31.sig.gz /group/ctbrowngrp/sourmash-db/gtdb-rs214/gtdb-rs214-k31.zip -o SRR606249.x.gtdb-rs214.fastgather.csv"
        User time (seconds): 6148.02
        System time (seconds): 231.95
        Percent of CPU this job got: 372%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 28:34.14
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1679044
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 17
        Minor (reclaiming a frame) page faults: 1791154
        Voluntary context switches: 36714
        Involuntary context switches: 102717
        Swaps: 0
        File system inputs: 24407184
        File system outputs: 25830288
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@ctb
Collaborator Author

ctb commented Feb 16, 2024

old fastgather (v0.8.6) with a pathlist

2m 24s, 1.6 GB of RAM

        Command being timed: "sourmash scripts fastgather SRR606249.trim.k31.sig.gz gtdb-rs214-k31.d.file.list -o SRR606249.x.gtdb-rs214.fastgather.csv"
        User time (seconds): 4591.34
        System time (seconds): 131.89
        Percent of CPU this job got: 3276%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:24.16
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1621440
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 14
        Minor (reclaiming a frame) page faults: 2114295
        Voluntary context switches: 1582072
        Involuntary context switches: 217324
        Swaps: 0
        File system inputs: 25911264
        File system outputs: 40
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

ctb changed the title from "some updated gather/fastgather benchmarks - feb 15, 2024" to "gather/fastgather benchmarks - feb 15, 2024" on Feb 17, 2024.
@ctb
Collaborator Author

ctb commented Feb 18, 2024

with #221 -

pathlists with parallel loading

2m 26s, 1.8 GB of RAM.

        Command being timed: "sourmash scripts fastgather SRR606249.trim.k31.sig.zip gtdb-rs214-k31.d.file.list -o SRR606249.x.gtdb-rs214.fastgather.csv"
        User time (seconds): 6771.21
        System time (seconds): 165.43
        Percent of CPU this job got: 4744%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:26.21
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1838548
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 38
        Minor (reclaiming a frame) page faults: 3288773
        Voluntary context switches: 1655356
        Involuntary context switches: 1765532
        Swaps: 0
        File system inputs: 81016
        File system outputs: 56
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@ctb
Collaborator Author

ctb commented Feb 18, 2024

scripts in /home/ctbrown/scratch3/SRR606249-feb13-2024

github repo for scripts: https://github.com/sourmash-bio/2024-benchmark-branchwater-plugin

srun command I used:

srun -p bmh --time=24:00:00 --nodes=1 --cpus-per-task=64 --mem=50GB --pty /bin/bash
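Putting the headline numbers from the runs in this thread side by side makes the trade-offs easier to compare. The sketch below hard-codes the wall-clock time, reported Percent of CPU, and max RSS of each run above, and assumes (this is not stated per run) that every run used the 64-CPU allocation from this `srun` command. The names `RUNS`, `speedup`, and `cpu_fraction` are hypothetical:

```python
# Wall-clock time (s), reported Percent of CPU, and max RSS (kbytes),
# copied from the /usr/bin/time -v output of each run in this thread.
RUNS = {
    "sourmash gather v4.8.7, zip": (2546.62, 100, 14473280),
    "fastgather v0.9.0-pre, zip": (125.17, 4837, 14118520),
    "fastgather v0.9.0-pre, pathlist": (2164.07, 357, 1916736),
    "fastgather v0.8.6, zip": (1714.14, 372, 1679044),
    "fastgather v0.8.6, pathlist": (144.16, 3276, 1621440),
    "fastgather with #221, pathlist": (146.21, 4744, 1838548),
    "fastmultigather, rocksdb": (128.05, 26, 600260),
}

BASELINE = "sourmash gather v4.8.7, zip"
ALLOCATED_CPUS = 64  # from --cpus-per-task=64; assumed to apply to every run


def speedup(name: str) -> float:
    """Wall-clock speedup of a run relative to plain `sourmash gather`."""
    return RUNS[BASELINE][0] / RUNS[name][0]


def cpu_fraction(name: str) -> float:
    """Fraction of the allocated CPUs kept busy on average."""
    return RUNS[name][1] / (100 * ALLOCATED_CPUS)


for name, (wall, pct, rss_kb) in RUNS.items():
    print(f"{name:34s} {wall:8.1f}s  x{speedup(name):5.1f}  "
          f"{cpu_fraction(name):6.1%} of CPUs  {rss_kb / 1024**2:5.1f} GiB")
```

With these numbers, fastgather on the zip file is about 20x faster than `sourmash gather` and keeps roughly three quarters of the 64 CPUs busy, while fastmultigather on the RocksDB index reaches a similar wall-clock time with about 0.6 GB of RAM.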

@ctb
Collaborator Author

ctb commented Jul 1, 2024

new benchmarks, for v0.9.5: sourmash-bio/sourmash#3232

ctb closed this as completed on Jul 1, 2024.