mpileup error messages #1652
Comments
I'm guessing this is a bug in the Is it possible to do this on successively smaller bam files (easiest done in SAM) to find a small test case that triggers it? I'll explore some more at my end, but getting public data with mods in it is challenging, so most of my testing was with made up values rather than real data. If you're confident, it's also possible to build with more debugging information which would spot memory corruptions. Not for production usage, but this can pinpoint errors more precisely (given a sufficiently modern compiler; can use clang too):
|
This attempted to grow memory by the maximum amount of space a base modification would take up, but due to a misunderstanding of kstring it kept adding this to the original size rather than actually growing the allocated size. (Probably) fixes samtools/samtools#1652
I've found a bug in htslib which likely triggers this problem; see samtools/htslib#1430. It may undergo other improvements yet, but it would be helpful if you could confirm whether this change cures your issue. It's a small two-line change to htslib/sam.c:
|
Thank you so much for the quick response! To confirm, the function I need to change is bam_plp_insertion_mod in ./htslib-1.15/sam.c, correct? My current version of the function is below, so we can make sure we are on the same page. If possible, would you please post the entire updated bam_plp_insertion_mod function? Thank you very much! int bam_plp_insertion_mod(const bam_pileup1_t *p,
} |
The easiest way of editing it is simply to cut and paste the quoted diff above and save to e.g. file
You can also see the whole function in the sam.c file from the git PR with the fix in it: |
Thank you @jkbonfield for the instructions! I fixed the function as you suggested, but still get the errors (including some new errors):
Just FYI, I'm processing some long nanopore sequencing reads. Maybe that causes the problem? |
Is any of your data public? It's hard to diagnose and fix the issue without data that triggers it. It may only need a single read. How soon does it die? Is it possible to use head/tail on a SAM version of your input file to gradually reduce it to a minimal data set that triggers the issue? |
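The head/tail reduction suggested above could be sketched roughly as follows. This is only an illustration: `sam_first_n` is a hypothetical helper (not part of samtools), the file names are placeholders, and it assumes the SAM file has all of its header lines (those starting with "@") before the alignment records.

```shell
# Hypothetical helper (not part of samtools): print the SAM header plus the
# first N alignment records of a SAM file. Header lines start with "@".
sam_first_n() {
    sam_file=$1
    n_records=$2
    awk -v n="$n_records" '
        /^@/     { print; next }      # always keep header lines
        seen < n { print; seen++ }    # keep only the first n records
    ' "$sam_file"
}

# Usage sketch (needs samtools; file names are placeholders):
#   samtools view -h input.mapped.bam > input.sam
#   sam_first_n input.sam 1000 > subset.sam
#   samtools view -b subset.sam > subset.bam
#   samtools mpileup --output-mods --positions dna.bed subset.bam > subset.pileup
```

If the subset still crashes, halve the record count and repeat; if it does not, the triggering read is further down the file, and the same idea applies with the tail of the records instead of the head.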
Hey @jkbonfield, the data I analyzed is private. Could I upload it to Google Drive and share it with you? May I have your Gmail address? Since almost all the reads give error messages and the messages differ, maybe we should look into the merged bam? Any thoughts? P.S. It seems that mpileup gives error messages when analyzing reads with abundant modifications (Mm tag). |
The messages being different is probably irrelevant, as any malloc-related error message is most likely the same bug showing up in different ways. If you can reproduce it with a single read then that makes for an easy-to-debug test case. You can email me directly using my gmail.com email, which has the same username as my github handle. Thanks |
Just emailed you the example read:) |
It seems the file is too big and Gmail cannot deliver it. I've just shared it using Google Drive.
On Wed, May 11, 2022 at 09:19, Hongxu Ding ***@***.***> wrote:
Here you go :)
The error message I got is:
free(): invalid next size (fast)
Aborted
To reproduce the error message, you might run:
samtools view -h -b -F 4 $file > $file.mapped.bam
samtools mpileup --output-mods --positions dna.bed $file.mapped.bam > $file.pileup
Thank you!
|
I've downloaded your test data (thank you) and can reproduce the issue before I apply samtools/htslib#1430, but not after. I also tried running it under Valgrind and built using "gcc -fsanitize=address". Neither tool identifies an issue. So I'm confident that the bug I fixed was the same one you are seeing. Did you recompile both htslib and samtools after applying the fix? Are you certain you are running the correct samtools binary? |
Ahhh I see! I didn't re-compile samtools and that might be the problem. Will try and keep you posted! |
Yes it works! I did extensive testing and got no error! Thank you so much for the help @jkbonfield ! |
Pleased to have resolved it. Thanks for the aid in debugging this. |
Sure! BTW, will the fix be included in the next release? When would be the next release? Thank you very much! |
It's not merged in yet, but as it's an obvious bug with a simple fix I don't see any blockers to it going through review and merge, so I'm sure it'll be in the next release. Our typical release cycle is every 3-4 months, but it can vary a little. Based on that, "summer" sounds sufficiently broad to be an accurate guess for the release. I can't say more precisely. |
Got it! Thank you so much @jkbonfield ! |
Are you using the latest version of samtools and HTSlib? If not, please specify.
(run samtools --version)
(built as a singularity container)
samtools 1.15
Using htslib 1.15
Copyright (C) 2022 Genome Research Ltd.
Please describe your environment.
(run uname -m on Linux/Mac OS or wmic os get OSArchitecture on Windows)
x86_64 (architecture of the singularity container)
(run gcc --version or clang --version)
gcc (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
(gcc of the singularity container)
Please specify the steps taken to generate the issue, the command you are running and the relevant output.
Greetings!
As I am running the mpileup module across multiple bam files, some bam files might produce one of the following errors:
The code I ran:
for file in *.bam; do
    # Quote "$file" so names containing spaces are handled correctly
    samtools view -h -b -F 4 "$file" > "$file.mapped.bam"
    samtools mpileup --output-mods --positions dna.bed "$file.mapped.bam" > "$file.pileup"
done
It seems that mpileup breaks when there is an error, and produces a truncated pileup file. Any insights on what's going on? Thank you very much in advance for your help!
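One way to keep a truncated pileup from being mistaken for a complete one is to write to a temporary name and only rename it when the command exits successfully. The sketch below is an assumption-laden illustration, not a samtools feature: `run_to_file` and the `.partial` suffix are hypothetical names, and it relies on the crashing command returning a non-zero exit status (as an abort does).

```shell
# Hypothetical wrapper (not a samtools feature): run a command with stdout
# going to a temporary file, and only move it into place on success.
run_to_file() {
    out=$1
    shift
    if "$@" > "$out.partial"; then
        mv "$out.partial" "$out"
    else
        status=$?
        echo "command failed with status $status; kept $out.partial" >&2
        return "$status"
    fi
}

# Usage sketch inside the loop (needs samtools):
#   run_to_file "$file.pileup" \
#       samtools mpileup --output-mods --positions dna.bed "$file.mapped.bam" \
#       || echo "mpileup failed on $file" >&2
```

With this, any file named `*.pileup` came from a run that finished cleanly, and any leftover `*.pileup.partial` marks an input worth investigating.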