details and notes for updating manifest of sketches in wort-genomes #1965

Open
ctb opened this issue Apr 20, 2022 · 1 comment

ctb commented Apr 20, 2022

since I forgot most of this in the intervening week, I'm writing it down here ;).

Uses code in #1808.

in ~ctbrown/scratch/fromfile/test.apr20, run:

find /group/ctbrowngrp/irber/data/wort-data/wort-genomes/sigs \
     -type f > ../irber-genome-sigs.txt.apr20

../database-examples/sigs-to-manifest.py \
        --previous entire.2022-04-13.sqlmf.new --merge -F sql \
        -o entire.2022-04-20.sqlmf ../irber-genome-sigs.txt.apr20

with the commands above saved in ../update-script.txt, GNU time -v shows:

loading previous manifest from 'entire.2022-04-13.sqlmf.new'
loaded 4756209 rows with 1585403 distinct sig files from 'entire.2022-04-13.sqlmf.new'
Loading filenames from ../irber-genome-sigs.txt.apr20.
Loaded 33 manifest rows from files in '../irber-genome-sigs.txt.apr20'
merging previous rows into current.
saved 4756242 manifest rows to 'entire.2022-04-20.sqlmf'
        Command being timed: "bash ../update-script.txt"
        User time (seconds): 169.60
        System time (seconds): 20.85
        Percent of CPU this job got: 80%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:56.12
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 6748292

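so about 4 minutes of wall-clock time and under 7 GB of RAM (6748292 kB peak resident set size) to merge 33 new rows into a manifest of about 4.76 million rows.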
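A quick sanity check on the log above is that the previous row count plus the newly loaded rows equals the saved row count (4756209 + 33 = 4756242 in this run). Below is a minimal sketch of that check. `check_manifest_counts` is a hypothetical helper, not part of sigs-to-manifest.py, and it assumes the three log lines keep the wording shown above and that none of the new rows duplicate existing ones.

```shell
#!/bin/sh
# Hypothetical helper (not part of sigs-to-manifest.py): parse a captured log
# and check that previous rows + newly loaded rows == saved rows.
# Assumes the log wording shown in this issue and no duplicate rows on merge.
check_manifest_counts() {
    log=$1
    prev=$(sed -n 's/^loaded \([0-9][0-9]*\) rows with .*/\1/p' "$log")
    new=$(sed -n 's/^Loaded \([0-9][0-9]*\) manifest rows .*/\1/p' "$log")
    saved=$(sed -n 's/^saved \([0-9][0-9]*\) manifest rows .*/\1/p' "$log")
    if [ -z "$prev" ] || [ -z "$new" ] || [ -z "$saved" ]; then
        echo "could not parse counts from $log" >&2
        return 2
    fi
    if [ $((prev + new)) -eq "$saved" ]; then
        echo "ok: $prev + $new = $saved"
    else
        echo "mismatch: $prev + $new != $saved" >&2
        return 1
    fi
}

# Demo using the three relevant lines from the Apr 20 run.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
loaded 4756209 rows with 1585403 distinct sig files from 'entire.2022-04-13.sqlmf.new'
Loaded 33 manifest rows from files in '../irber-genome-sigs.txt.apr20'
saved 4756242 manifest rows to 'entire.2022-04-20.sqlmf'
EOF
check_manifest_counts "$demo_log"   # prints: ok: 4756209 + 33 = 4756242
rm -f "$demo_log"
```

If a later merge replaces rows instead of only appending them, the equality would not hold and the check would need to be loosened.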

@bluegenes

FYI for dib-lab folks, I've made a central spot on farm: /group/ctbrowngrp/sourmash-db/wort-manifests and included an update script using the code above. Note this version automatically adds a date to the new filenames.

Notes:

  1. downloading new wort signatures to farm is somewhat manual at the moment; check with luiz :)
  2. before running, set previous_manifest in the script to the most recent manifest.
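For note 2, one way to avoid editing previous_manifest by hand is to pick the manifest with the most recent date in its filename. This is only a sketch: `latest_manifest` is a hypothetical helper, and it assumes every manifest ends in `.sqlmf`, carries a `YYYY-MM-DD` date somewhere in its name (as in `entire.2022-04-26.sqlmf`), and has no tabs or newlines in its path.

```shell
#!/bin/sh
# Hypothetical helper: print the *.sqlmf file in directory $1 whose filename
# contains the most recent YYYY-MM-DD date. Files without a date are skipped.
latest_manifest() {
    dir=$1
    for f in "$dir"/*.sqlmf; do
        [ -e "$f" ] || continue
        # Pull the first date-like token out of the filename.
        d=$(basename "$f" | grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -n 1)
        [ -n "$d" ] && printf '%s\t%s\n' "$d" "$f"
    done | LC_ALL=C sort | tail -n 1 | cut -f 2-
}

# Example use inside the update script (directory taken from this thread):
# previous_manifest=$(latest_manifest /group/ctbrowngrp/sourmash-db/wort-manifests)
```

ISO-style dates sort correctly as plain strings, which is why a lexical sort is enough here. The function prints nothing if the directory holds no dated manifests, so the caller should check for an empty result before passing it to --previous.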

script: /group/ctbrowngrp/sourmash-db/wort-manifests/update-wort-manifest.sh, reproduced here:

NOW=$(date +"%Y-%m-%d")
updated_siglist="$NOW.wort.sigs.txt"
previous_manifest=./entire.2022-04-26.sqlmf
updated_manifest="./$NOW.wort.sqlmf"

# first, get an updated list of wort signatures
find /group/ctbrowngrp/irber/data/wort-data/wort-genomes/sigs \
     -type f > "$updated_siglist"

# now, build a manifest from this list of files
python /group/ctbrowngrp/sourmash-db/database-examples/sigs-to-manifest.py \
       --previous "$previous_manifest" --merge -F sql \
       -o "$updated_manifest" "$updated_siglist"

# make the new manifest read-only
chmod a-w "$updated_manifest"
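Since the script writes a dated signature list on every run, two of those lists can be compared to preview which signature files are new before rebuilding the manifest. A minimal sketch follows. `new_sigs` is a hypothetical helper, not something in database-examples, and it assumes one path per line in each list.

```shell
#!/bin/sh
# Hypothetical helper: print paths that appear in the newer list ($2)
# but not in the older list ($1). Assumes one path per line.
new_sigs() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # comm requires both inputs to be sorted in the same collation order.
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    # -13 suppresses lines unique to the old list and lines common to both.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Example (filenames follow the $NOW.wort.sigs.txt pattern from the script):
# new_sigs 2022-04-26.wort.sigs.txt 2022-05-03.wort.sigs.txt | wc -l
```

Note that the count of new files will not match the count of new manifest rows. In the Apr 20 run above there were 4756209 rows for 1585403 files, i.e. 3 rows per file, so the 33 new rows would correspond to about 11 new files if that ratio holds.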

Relies on code from https://github.com/sourmash-bio/database-examples.
