Homopolymer compression is not applied if the first read file is empty #31

Open
maickrau opened this issue Feb 20, 2023 · 1 comment

@maickrau

Running count compress with multiple read files and an empty file as the first file does not apply homopolymer compression. The following command creates an index without homopolymer compression:

meryl count compress k=21 threads=4 memory=32g empty.fa reads.fa output kmers_withempty

But putting the empty file anywhere other than first correctly creates a homopolymer-compressed index:

meryl count compress k=21 threads=4 memory=32g reads.fa empty.fa output kmers_withempty2

meryl print shows that the first index is not homopolymer compressed but the second is:

$ meryl print kmers_withempty/ | head

Found 1 command tree.

PROCESSING TREE #1 using 1 thread.
  opLessThan
    kmers_withempty/
    print to (stdout)
AAAAAAAAAAAAAAAAATAAG   1
AAAAAAAAAAAAAAAACTACA   1
AAAAAAAAAAAAAAAATAAGG   1
AAAAAAAAAAAAAAACAATAC   1
AAAAAAAAAAAAAAACTACAG   1
AAAAAAAAAAAAAAATAAGGA   1
AAAAAAAAAAAAAACAATACT   1
AAAAAAAAAAAAAACTACAGA   1
AAAAAAAAAAAAAATAAGGAG   1
AAAAAAAAAAAAAAGTACTTT   1

$ meryl print kmers_withempty2 | head

Found 1 command tree.

PROCESSING TREE #1 using 1 thread.
  opLessThan
    kmers_withempty2/
    print to (stdout)
ACACACACACACACACTACTA   1
ACACACACACACACTACTACT   1
ACACACACACACATCATATAC   1
ACACACACACACTACAGACAT   1
ACACACACACACTACAGATCA   1
ACACACACACACTACTACTAC   2
ACACACACACATCATATACAG   1
ACACACACACTACAGACATCA   1
ACACACACACTACAGATCATC   1
ACACACACACTACTACTACTA   4

$ meryl --version
meryl snapshot v1.4-development +29 changes (r969 97d5923dd69ebc3efed67fc466c21ed8c5e6670b)
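
A quick way to tell the two dumps above apart without reading them by eye: a homopolymer-compressed k-mer never contains the same base twice in a row. The sketch below assumes the k-mer lines look like the `meryl print` output shown here (k-mer, whitespace, count) and skips anything else; `is_hpc` is a hypothetical helper name, not part of meryl.

```shell
# Hypothetical helper (not part of meryl): reads "KMER COUNT" lines on stdin and
# exits 0 only if no k-mer contains two identical adjacent bases.
# Lines that are not a two-field k-mer/count pair (log messages etc.) are ignored.
is_hpc() {
  awk '
    NF == 2 && $1 ~ /^[ACGT]+$/ && $1 ~ /AA|CC|GG|TT/ { bad = 1; exit }
    END { exit bad + 0 }
  '
}

# Example with k-mers taken from the two dumps above.
if printf 'AAAAAAAAAAAAAAAAATAAG\t1\n' | is_hpc; then
  echo "compressed"
else
  echo "not compressed"
fi
# prints "not compressed"

if printf 'ACACACACACACACACTACTA\t1\n' | is_hpc; then
  echo "compressed"
else
  echo "not compressed"
fi
# prints "compressed"
```

With a real database this would be used as `meryl print kmers_withempty | head -n 1000 | is_hpc`, which checks only a sample of the k-mers.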
@brianwalenz
Member

Thanks, Mikko. It's not just an empty first file that causes trouble. The 'compress' flag is reset after EACH file. The workaround is simple but annoying: add 'compress' before each input file.

I remember debating if this flag should be reset or not. I'm a little embarrassed I left it in.
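
For reference, the workaround above applied to the original command might look like the sketch below. Placing 'compress' immediately before each file is a literal reading of the comment, not something checked against meryl's argument parser, and `hpc_count_cmd` is a hypothetical helper that only prints the command so it can be inspected before running. It does not handle file names containing spaces.

```shell
# Hypothetical helper: prints a meryl command line with 'compress' repeated
# before every input file. Usage: hpc_count_cmd OUTPUT_DB INPUT...
# The k, threads and memory values are copied from the original report.
hpc_count_cmd() {
  out=$1
  shift
  cmd="meryl count k=21 threads=4 memory=32g"
  for f in "$@"; do
    cmd="$cmd compress $f"
  done
  printf '%s output %s\n' "$cmd" "$out"
}

hpc_count_cmd kmers_withempty empty.fa reads.fa
# prints: meryl count k=21 threads=4 memory=32g compress empty.fa compress reads.fa output kmers_withempty
```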
