
Epigenetic Analysis Pipeline Issue: "Failed to Get Modbase Info AUX Data Not Found" #1080

Open
priyanagpal25 opened this issue Oct 13, 2024 · 5 comments
Labels
mods For issues related to modified base calling

Comments

@priyanagpal25

Issue Report

Please describe the issue:

I am working on epigenetic analysis for bacterial samples using Oxford Nanopore sequencing (FLO-MIN114), and I’m transitioning from Tombo to Dorado for modified basecalling since Tombo is now deprecated. I’m encountering an issue with processing the output files, and I’m unsure if my pipeline is set up correctly.

Sequencing Setup:

Sequencing chemistry: FLO-MIN114 (R10.4.1 flow cell)
Raw file format: POD5
Basecalling: Dorado (sup, m6A)
Software: MinKNOW
Alignment reference genome: FASTA

Steps Taken:

Basecalling and Demultiplexing:
On the MinKNOW interface, I first performed basecalling using sup, m6A and demultiplexed the samples. This produced both .fastq and .bam files.

File Information:
    .bam files seem to contain modified base information.
    .fastq files do not have modified base information.

Alignment:
I used the .bam files for alignment in MinKNOW, with a reference genome in FASTA format. This produced multiple .bam files and corresponding .bam.bai index files.

Merging:
I merged all .bam files using samtools merge:

samtools merge merged_output.bam *.bam
Then I indexed the merged .bam file.

Modkit Pileup:
I ran the following command to generate modified base information:

modkit pileup merged_output.bam > /modkitoutput/pileup.bed

However, I encountered the following error:

Failed to get modbase info AUX data not found

Issue:

It appears that the modified base information is not being recognized by Modkit during the pileup process. The error suggests that the reads are missing the auxiliary (AUX) tags that carry the modification calls (the MM and ML tags).

Questions:

What is the correct pipeline for modified basecalling for bacterial samples using Dorado?
Is there a specific step I am missing to ensure that modified base information is included in the .bam file?
Should I adjust my approach to alignment or demultiplexing to resolve this issue?

Any help with understanding the pipeline and resolving this error would be appreciated.
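For readers running the same workflow from the command line rather than in MinKNOW, here is a minimal sketch of one possible ordering of the steps. It folds in suggestions made later in this thread (6mA rather than m6A for DNA, and -Y for dorado aligner) and assumes dorado's comma-separated `simplex,modification` model syntax and a modkit release whose `pileup` takes the output BED path as a second positional argument. The helper name `print_mod_pipeline` and all paths are hypothetical, demultiplexing is left out, and the function only prints the commands so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) one possible command sequence
# for 6mA calling on DNA reads. All arguments are placeholder paths.
print_mod_pipeline() {
  local pod5_dir="$1" ref="$2" out="$3"
  # 6mA is the DNA adenine modification; m6A is the RNA one
  echo "dorado basecaller sup,6mA ${pod5_dir} > ${out}/calls.bam"
  # -Y (soft-clipping) so secondary/supplementary records keep their mod tags
  echo "dorado aligner --mm2-opts \"-Y\" ${ref} ${out}/calls.bam > ${out}/aligned.bam"
  # Sort and index: modkit pileup works on an aligned, indexed BAM
  echo "samtools sort -o ${out}/aligned.sorted.bam ${out}/aligned.bam"
  echo "samtools index ${out}/aligned.sorted.bam"
  # Output BED given as the second positional argument (assumed modkit CLI)
  echo "modkit pileup ${out}/aligned.sorted.bam ${out}/pileup.bed"
}
```

For example, `print_mod_pipeline pod5/ reference.fasta results` prints five commands that can be checked against the installed dorado, samtools and modkit versions before use.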

@priyanagpal25
Author

For an individual .bam file, I ran:

samtools view -H fastq_runid_e77a6fda925a5796b8b74964b42548a9fe2be7ec_6_1.bam | grep -E "MM:|ML:"

No output was observed.

@HalfPhoton
Collaborator

Hi @priyanagpal25,

> for an individual bam file: samtools view -H fastq_runid_e77a6fda925a5796b8b74964b42548a9fe2be7ec_6_1.bam | grep -E "MM:|ML:"

The issue here is that you have the samtools view -H flag set, so grep only searches the header and not the read tags.

samtools view --help
... 
-H, --header-only          Print SAM header only (no alignments)

Can you check again that the bam file has mods tags?
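To make that check concrete: SAM optional tags such as MM and ML sit in column 12 onwards of each alignment record, so they only show up when samtools view is run without -H. The sketch below (the function name `mod_tag_summary` is hypothetical) reads headerless SAM text on stdin and counts how many records carry both tags; it also accepts the Mm/Ml spellings, on the assumption that some older files use them.

```shell
#!/usr/bin/env bash
# Sketch: summarise how many alignment records carry base-modification tags.
# Expects SAM text on stdin, e.g. from `samtools view reads.bam` (no -H).
mod_tag_summary() {
  awk -F'\t' '
    # Skip header lines (they start with @) if any are present
    /^@/ { next }
    {
      total++
      has_mm = 0; has_ml = 0
      # Optional tags start at column 12 of a SAM record
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^(MM|Mm):Z:/) has_mm = 1
        if ($i ~ /^(ML|Ml):B:C/) has_ml = 1
      }
      if (has_mm && has_ml) tagged++
    }
    END { printf "records=%d with_MM_ML=%d\n", total + 0, tagged + 0 }
  '
}
```

Usage: `samtools view reads.bam | mod_tag_summary`, with `reads.bam` as a placeholder. A result of `with_MM_ML=0` means the records really contain no modification calls, independent of what the header says.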

Kind regards,
Rich

@HalfPhoton added the mods label (for issues related to modified base calling) on Nov 4, 2024
@skranz0

skranz0 commented Dec 5, 2024

The sequencing setup states "Basecalling: Dorado (sup, m6A)", but shouldn't it be 6mA, or doesn't it matter? If there is no model for the stated modification (because of the swapped characters), there would be no modification tags in the results, though there should be a warning or error in that case.

@HalfPhoton
Collaborator

@skranz0,

From the list of supported modifications, we see that 6mA and m6A are both supported by dorado but are exclusive to DNA and RNA respectively, so this subtlety does matter.

What command was used to run dorado basecalling, so that we can look into this further?

Did you use an m6A RNA model on DNA strands to generate 6mA outputs?
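The DNA/RNA distinction above can be checked mechanically against the model string passed to dorado. This is a minimal sketch assuming the comma-separated `simplex,modification` form that `dorado basecaller` accepts (e.g. `sup,6mA`); the function name `check_model_complex` is hypothetical, it only knows the two adenine codes discussed here, and it does not query dorado's actual model list.

```shell
#!/usr/bin/env bash
# Hypothetical check: flags 6mA/m6A used on the wrong molecule type.
# Usage: check_model_complex <dna|rna> "<simplex>,<mod>[,<mod>...]"
check_model_complex() {
  local molecule="$1" complex="$2"
  local IFS=','
  local -a parts
  local mod status=0
  # Split on commas: first field is the simplex model, the rest are mods
  read -r -a parts <<< "$complex"
  for mod in "${parts[@]:1}"; do
    mod="${mod// /}"  # tolerate "sup, m6A" written with a space
    case "${molecule}:${mod}" in
      dna:m6A) echo "m6A: RNA modification requested for DNA (use 6mA)"; status=1 ;;
      rna:6mA) echo "6mA: DNA modification requested for RNA (use m6A)"; status=1 ;;
    esac
  done
  return "$status"
}
```

Run on the setup reported in this issue, `check_model_complex dna "sup, m6A"` prints the DNA/RNA mismatch and returns a non-zero status.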

Kind regards,
Rich

@malton-ont
Collaborator

dorado aligner will strip modification tags from secondary/supplementary reads unless soft-clipping is enabled:

dorado aligner --mm2-opts "-Y" ...

Could this be the cause of the modkit failure?
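One way to test this hypothesis is to tally which records lack MM/ML tags by alignment type, using the SAM FLAG bits 0x100 (secondary) and 0x800 (supplementary). If the missing tags are concentrated in secondary/supplementary records, that points at the aligner; if primary records lack them as well, the tags were probably never written at basecalling. The function name `missing_mods_by_type` is hypothetical, and it reads headerless SAM text on stdin.

```shell
#!/usr/bin/env bash
# Sketch: count records lacking MM/ML tags, split by alignment type.
# SAM FLAG bits: 0x100 (256) = secondary, 0x800 (2048) = supplementary.
# Unmapped records (0x4) fall into the "primary" bucket here.
missing_mods_by_type() {
  awk -F'\t' '
    /^@/ { next }
    {
      flag = $2 + 0
      # Portable bit tests without gawk-only and()
      if (int(flag / 2048) % 2) type = "supplementary"
      else if (int(flag / 256) % 2) type = "secondary"
      else type = "primary"
      total[type]++
      has_mm = 0; has_ml = 0
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^(MM|Mm):Z:/) has_mm = 1
        if ($i ~ /^(ML|Ml):B:C/) has_ml = 1
      }
      if (!(has_mm && has_ml)) missing[type]++
    }
    END {
      n = split("primary secondary supplementary", types, " ")
      for (k = 1; k <= n; k++) {
        t = types[k]
        printf "%s total=%d missing_MM_ML=%d\n", t, total[t] + 0, missing[t] + 0
      }
    }
  '
}
```

Usage: `samtools view merged_output.bam | missing_mods_by_type` prints one line per alignment type.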


4 participants