sig extract and sig grep differ on duplicated md5sum #2802

Open
bluegenes opened this issue Oct 6, 2023 · 0 comments

Some of our utilities may only keep *sig.gz files, leaving duplicates behind

sig extract drops signatures with a duplicated md5sum (428 loaded, 328 extracted):

sourmash sig extract -k 21 mm.dna.zip -o mm.dna-k21.zip

== This is sourmash version 4.8.4. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded 428 total that matched ksize & molecule type
extracted 328 signatures from 1 file(s)

However, mm.dna.zip was originally created via sig grep, which does keep duplicate sigs (428 extracted):

sourmash sig grep mammarenavirus spillover.dna.zip -o mm.dna.zip

== This is sourmash version 4.8.4. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

saving matching signatures to 'mm.dna.zip'
loaded 33566 total that matched ksize & molecule type
extracted 428 signatures from 1 file(s)

I confirmed this is not just a reporting difference by listing the files in the signatures/ directory of each zip.
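
For anyone who wants to repeat that check, here is a minimal sketch using only Python's standard `zipfile` module. It assumes the signatures are stored as entries under a `signatures/` prefix inside the zip, as described above. The helper names `list_signature_entries` and `count_signature_entries` are made up for this illustration and are not part of sourmash.

```python
import sys
import zipfile


def list_signature_entries(zip_path):
    """Return the names of all file entries under signatures/ in a zip.

    Assumption: signatures live under a 'signatures/' prefix inside the zip.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return [
            name
            for name in zf.namelist()
            # skip the directory entry itself, if the zip has one
            if name.startswith("signatures/") and not name.endswith("/")
        ]


def count_signature_entries(zip_path):
    """Count the file entries under signatures/ in a zip."""
    return len(list_signature_entries(zip_path))


if __name__ == "__main__":
    # Print one count per zip path given on the command line.
    for path in sys.argv[1:]:
        print(f"{path}: {count_signature_entries(path)} entries under signatures/")
```

Saved under a hypothetical name such as `count_sigs.py`, this could be run as `python count_sigs.py mm.dna.zip mm.dna-k21.zip` to compare the number of stored files against the counts that each command reported.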

ref #2774 #1501 #2749
