Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check that reads don't exceed the maximal length #44

Merged
merged 5 commits into from
Apr 3, 2023

Conversation

csoneson
Copy link
Collaborator

@csoneson csoneson commented Apr 2, 2023

Also allow reads up to 65534 nt

@csoneson csoneson requested a review from mbstadler April 2, 2023 12:21
@mbstadler
Copy link
Collaborator

I think the check for having read a read that is too long is great!

Maybe we have to reduce the default read-length, though, or indeed expose it as a parameters, to avoid memory issues.
For example, the re-building of the vignette block starting at L174 fails under Linux on GHA with std::bad_alloc, indicating that it could not allocate requested memory. My guess is that this happened here:

FastqBuffer *chunkBuffer = new FastqBuffer((size_t)chunkSize, fastqReverseVect[0].compare("") != 0);

which will try to allocate a FastqBuffer instance with BUFFER_SIZE * chunksize * x bytes (x is 2 for single-red and 4 for paired-end data), which will here be around 64kb * 1e5 * 4 =~ 26GB (probably not available on the GitHub action runner). We could reduce sequence length, chunksize, or both. What about BUFFER_SIZE = 4096 and keep the current chunksize = 100000? That would allocate about 1.6GB for a paired end experiment.

@csoneson
Copy link
Collaborator Author

csoneson commented Apr 2, 2023

Yeah, seems I went a bit over the top (worked fine locally 🙂). A read length of 4096 will not be enough for the ONT data, so maybe we indeed do need to expose it as a parameter.

@mbstadler
Copy link
Collaborator

Right, in that case an exposed parameter and a lower default value would probably be best.

@codecov-commenter
Copy link

codecov-commenter commented Apr 2, 2023

Codecov Report

Merging #44 (aa3a249) into master (2359969) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master      #44   +/-   ##
=======================================
  Coverage   96.96%   96.96%           
=======================================
  Files          20       20           
  Lines        2369     2376    +7     
=======================================
+ Hits         2297     2304    +7     
  Misses         72       72           
Impacted Files Coverage Δ
R/digestFastqs.R 100.00% <100.00%> (ø)
src/FastqBuffer_utils.h 72.41% <100.00%> (+0.98%) ⬆️
src/digestFastqs.cpp 95.73% <100.00%> (+0.01%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

NEWS.md Outdated
# mutscan 0.2.36

* Check that reads don't exceed the maximal allowed length
* Allow read lengths up to 65534 nt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess here we could say * Add parameter to specify maximal read length instead of the second item

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, definitely - forgot to update this.

Copy link
Collaborator

@mbstadler mbstadler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a neat solution that leaves open all options, and the defaults are likely to cover the most common cases without creating a too large memory footprint. Apart from the comment on the NEWS.md file, I have no suggestions. Thank you!

@mbstadler
Copy link
Collaborator

The current failing checks on windows might be related to r-lib/actions#713 and should resolve itself soon

@csoneson csoneson merged commit b833ba0 into master Apr 3, 2023
@csoneson csoneson deleted the allow-long-reads branch April 3, 2023 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants