database file parameters used in count output are too large #30

Open
brianwalenz opened this issue Jan 17, 2023 · 0 comments

@brianwalenz
Member

The `prefixSize` used when writing count output is too large when the inputs are also large.

https://github.com/marbl/meryl/blob/master/src/meryl/merylOp-countThreads.C#L404

That line sets the output prefix size to the 'optimal' prefix size chosen for counting. This works fine for moderate k-mer sizes (e.g., k = 22), but for larger ones (e.g., k = 28) the resulting database chunks are too big for merging.

Example:

```
prefix     # of   struct   kmers/    segs/      min     data    total
  bits   prefix   memory   prefix   prefix   memory   memory   memory
------  -------  -------  -------  -------  -------  -------  -------
    14    16 kP    66 MB    98 kM   130  S    64 MB  8320 MB  8386 MB
    15    32 kP   117 MB    49 kM    64  S   128 MB  8192 MB  8309 MB
    16    64 kP   217 MB    24 kM    31  S   256 MB  7936 MB  8153 MB  Best Value!
    17   128 kP   420 MB    12 kM    16  S   512 MB  8192 MB  8612 MB
    18   256 kP   824 MB  6314  M     8  S  1024 MB  8192 MB  9016 MB
```

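The "Best Value!" row appears to be the prefix size with the smallest total memory, where total = struct memory + data memory (217 + 7936 = 8153 MB for 16 bits). A minimal Python sketch, assuming that selection rule (the `best_prefix_bits` helper is hypothetical, not meryl's code), reproduces the choice from the numbers in the table:

```python
# Rows transcribed from the `meryl count` table: (prefix bits, struct MB, data MB).
ROWS = [
    (14, 66, 8320),
    (15, 117, 8192),
    (16, 217, 7936),
    (17, 420, 8192),
    (18, 824, 8192),
]


def total_mb(row):
    """Total memory = struct memory + data memory (the last table column)."""
    _, struct_mb, data_mb = row
    return struct_mb + data_mb


def best_prefix_bits(rows):
    """Hypothetical selector: the prefix size with the smallest total memory."""
    return min(rows, key=total_mb)[0]


for bits, struct_mb, data_mb in ROWS:
    # 2**bits prefixes: 16384 ("16 kP") for 14 bits, 65536 ("64 kP") for 16 bits, ...
    print(f"{bits} bits: {2 ** bits} prefixes, {struct_mb + data_mb} MB total")
print("best prefix bits:", best_prefix_bits(ROWS))
```

The selected value, 16, is the same `prefixSize` that `dumpIndex` reports for the counted database.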
```
> meryl dumpIndex 001.meryl
Opened '001.meryl'.
  magic          0x646e496c7972656d33302e765f5f7865 'merylIndex__v.03'
  prefixSize     16
  suffixSize     40
  numFilesBits   6 (64 files)
  numBlocksBits  10 (1024 blocks)
```
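Here `prefixSize` equals `numFilesBits + numBlocksBits` (6 + 10 = 16), and `prefixSize + suffixSize` is 56 bits, i.e. 28 bases at 2 bits per base, matching the k = 28 case. A minimal sketch of how those fields would partition a k-mer, assuming the prefix is the high bits of a 2-bit-encoded k-mer and the file index is the high bits of the prefix (this layout and the `split_kmer` helper are hypothetical, not taken from meryl's source):

```python
# Parameters reported by `meryl dumpIndex 001.meryl`.
PREFIX_SIZE, SUFFIX_SIZE = 16, 40
NUM_FILES_BITS, NUM_BLOCKS_BITS = 6, 10


def split_kmer(kmer, prefix_size=PREFIX_SIZE, suffix_size=SUFFIX_SIZE,
               num_files_bits=NUM_FILES_BITS):
    """Hypothetical split of a 2-bit-encoded k-mer into (file, block, suffix).

    Assumes the prefix is the high `prefix_size` bits of the k-mer and the
    file index is the high `num_files_bits` bits of that prefix.
    """
    if kmer >> (prefix_size + suffix_size):
        raise ValueError("k-mer has more bits than prefixSize + suffixSize")
    suffix = kmer & ((1 << suffix_size) - 1)    # low 40 bits
    prefix = kmer >> suffix_size                # high 16 bits
    block_bits = prefix_size - num_files_bits   # 10, matching numBlocksBits
    file_index = prefix >> block_bits           # one of 64 files
    block_index = prefix & ((1 << block_bits) - 1)  # one of 1024 blocks
    return file_index, block_index, suffix


kmer = (3 << 50) | (7 << 40) | 42   # file 3, block 7, suffix 42
print(split_kmer(kmer))             # (3, 7, 42)
```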

After merging, though, the prefix size is more reasonable (iirc this is a fixed, hardcoded size). Merging seems to want around 1 GB of memory per input database; I'm not sure why.

```
> meryl dumpIndex 00x.meryl/
Opened '00x.meryl/'.
  magic          0x646e496c7972656d33302e765f5f7865 'merylIndex__v.03'
  prefixSize     12
  suffixSize     44
  numFilesBits   6 (64 files)
  numBlocksBits  6 (64 blocks)
```
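Both dumps describe 56-bit k-mers (16 + 40 and 12 + 44, i.e. k = 28 assuming 2 bits per base), but the merged database has 2^(6+6) = 4096 blocks in total instead of 2^(6+10) = 65536. A small sketch that computes this from the reported fields (the dictionaries just transcribe the `dumpIndex` output; the 2-bits-per-base encoding is an assumption):

```python
# Fields transcribed from the two `meryl dumpIndex` outputs.
BEFORE = {"prefixSize": 16, "suffixSize": 40, "numFilesBits": 6, "numBlocksBits": 10}  # 001.meryl
AFTER = {"prefixSize": 12, "suffixSize": 44, "numFilesBits": 6, "numBlocksBits": 6}    # 00x.meryl


def kmer_size(index):
    """k, assuming 2 bits per base so that prefix + suffix bits == 2k."""
    return (index["prefixSize"] + index["suffixSize"]) // 2


def total_blocks(index):
    """Files x blocks per file = 2**numFilesBits * 2**numBlocksBits."""
    return 1 << (index["numFilesBits"] + index["numBlocksBits"])


for name, idx in (("001.meryl", BEFORE), ("00x.meryl", AFTER)):
    print(name, "k =", kmer_size(idx), "blocks =", total_blocks(idx))
print("reduction:", total_blocks(BEFORE) // total_blocks(AFTER), "x")
```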