CIGAR strings in "SA" tags differ from the CIGAR strings in the corresponding supplementary alignment records #724

Closed
fedarko opened this issue Apr 3, 2021 · 3 comments
@fedarko
Contributor

fedarko commented Apr 3, 2021

Hi, and thank you for developing minimap2! I have two questions about an issue with supplementary alignments.

Problem description

For records that are part of a supplementary alignment in SAM files generated by minimap2, the CIGAR strings listed in the SA tags are different from the CIGAR strings listed for these supplementary alignments' records in the same file. It is not immediately clear which of the two CIGAR strings should be interpreted as the "canonical" CIGAR string for this alignment, or why the CIGAR strings are different.

It looks like this was brought up previously in #524 (comment) and in #287, which imply that the SA CIGAR string should not be relied on in these cases.

Questions

  1. As a general rule, does it make sense to ignore the CIGAR string listed in the SA tag -- and to always use the CIGAR string located on that alignment's own line in the SAM file instead?

  2. If so, do you think it would make sense to update the documentation to clarify this? From reading the description of the SA tag, it was not clear to me that CIGAR strings listed for these alignments were expected to be incorrect. I am happy to submit a PR that updates the README or the FAQ accordingly, in order to help future minimap2 users.

My apologies if I am misunderstanding anything!

Example showing the problem

This file, supplemental_alignment.sam.txt, is a subset of a SAM file generated by minimap2. This SAM file has been filtered to just the two lines originating from a read named m54033_180919_161442/4194410/ccs.

Line 1 describes the primary alignment of this read to a reference sequence named edge_25034, and includes a reference to a supplementary alignment of this read in a different reference sequence named edge_34620. The CIGAR string listed on Line 1 in the SA: tag for the edge_34620 supplementary alignment is 1136M 3I 5180S (spaces added for clarity).

Line 2 describes the edge_34620 supplementary alignment in detail. However, the CIGAR string listed on this line for the supplementary alignment is 23M 1I 537M 1I 64M 1I 512M 5180H.

The counts of each operation generally match up (e.g. in both CIGAR strings there are 1,136 M operations) but the alignments represented by these strings are still slightly different.
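As a quick check of the claim above, here is a minimal Python sketch that totals the length of each operation in the two CIGAR strings from this example (with the clarifying spaces removed). The helper name `cigar_op_totals` is made up for illustration and is not part of minimap2 or any library.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by a single operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_op_totals(cigar):
    """Return the total length per operation, e.g. {'M': 1136, 'I': 3, ...}."""
    totals = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] += int(length)
    return totals

# CIGAR strings quoted in this issue (spaces removed).
sa_tag_cigar = "1136M3I5180S"               # from the SA tag on Line 1
record_cigar = "23M1I537M1I64M1I512M5180H"  # from the record on Line 2

sa_totals = cigar_op_totals(sa_tag_cigar)
rec_totals = cigar_op_totals(record_cigar)

print(sa_totals)
print(rec_totals)
```

Both strings total 1,136 bases of M and 3 bases of I, and both clip 5,180 bases (soft-clipped in the SA tag, hard-clipped in the record). What differs is where the insertions are placed: the SA tag lumps them into a single 3I after the matches, while the record interleaves three separate 1I operations.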

The exact minimap2 command used to generate the full SAM file was minimap2 -ax asm20 [reference FASTA file] [reads FASTQ file] > alignment.sam; the data is derived from this BioProject.

Software versions

minimap2 version: 2.17-r941 (installed using linuxbrew)
Running on Ubuntu version 16.04.7

@lh3 lh3 added the question label Apr 5, 2021
@lh3
Owner

lh3 commented Apr 5, 2021

In minimap2, the SA tag mainly tells you the start and end coordinates of other alignments. It is not intended to keep detailed CIGARs.
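To illustrate using the SA tag only for coordinates, here is a rough Python sketch that parses an SA tag value and derives the reference interval of each listed alignment. It assumes the field order given in the SAM specification (rname,pos,strand,CIGAR,mapQ,NM, entries separated by semicolons), that reference names contain neither commas nor semicolons, and that the simplified CIGAR still spans the correct interval, as the comment above suggests. The names `SAEntry` and `parse_sa_tag` are hypothetical, and the position, mapQ, and NM values in the example are placeholders rather than values from the attached file.

```python
import re
from typing import NamedTuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")

class SAEntry(NamedTuple):
    rname: str
    start: int   # 1-based leftmost reference position
    end: int     # 1-based inclusive reference end
    strand: str
    mapq: int

def parse_sa_tag(value):
    """Parse the value of an SA:Z tag into reference intervals.

    The CIGAR is used only to compute the reference span; it is not
    treated as an exact description of the alignment.
    """
    entries = []
    for field in value.rstrip(";").split(";"):
        rname, pos, strand, cigar, mapq, _nm = field.split(",")
        ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                      if op in REF_CONSUMING)
        start = int(pos)
        entries.append(SAEntry(rname, start, start + ref_len - 1, strand, int(mapq)))
    return entries

# Placeholder example: only the reference name and CIGAR come from this issue.
example = "edge_34620,1000,+,1136M3I5180S,60,25;"
entries = parse_sa_tag(example)
print(entries[0])
```

For base-level detail such as the exact positions of insertions, the CIGAR on the supplementary alignment's own record remains the one to read.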

@lh3 lh3 closed this as completed Apr 5, 2021
@CharlesARoy

Hi @lh3, I am also very grateful for your work in developing Minimap2!

I think that Marcus makes some good points, and I'd suggest updating Minimap2 so that it reports identical CIGAR strings in the SA tag and in the corresponding supplementary alignment record. This would be helpful for a few reasons:

  1. To my knowledge, the precedent set by other aligners (such as BWA) is that the CIGAR strings in the SA tags match those in the supplementary alignment records. This undocumented change can be confusing.
  2. Existing tools developed under the precedent in point 1 will need to be partially rewritten to account for this change.
  3. It's somewhat misleading to include discordant information in the alignment records. This ambiguity wastes users' time by forcing them to carefully (re)read the documentation and track down issues such as this one to determine whether it is a bug. It would be better to leave the CIGAR field of the SA tag blank if it's expected to be inaccurate.

At a minimum, it would be helpful if the documentation were updated to clarify this point.

@fedarko
Contributor Author

fedarko commented Nov 22, 2023

Thank you both! I agree with Charles' points; if keeping the SA tag in the slightly inaccurate "shortened" format is necessary, then I propose that we update minimap2's man page to explain this. Maybe the following line:

SA	Z	List of other supplementary alignments

could be changed to something like

SA	Z	List of other supplementary alignments (using approximate CIGAR strings)

This should make the situation much clearer. (If you'd prefer, we could also just add a link to this issue or to #287.)

If these changes seem reasonable, I would be happy to file a PR that updates the man page -- so that other users don't get stuck on this issue like we did.

fedarko added a commit to fedarko/minimap2 that referenced this issue May 18, 2024
lh3 pushed a commit that referenced this issue May 22, 2024

* Mention approx CIGAR strings in man page #724

* Fix FAQ typo