In the RetroSeq VCF file, the positions of TE insertions relative to the reference are given in 1-based coordinates in the POS column. In addition, there is a set of two consecutive coordinates in the INFO field, the first of which corresponds to the POS column and the second to the next base in the genome. Does this imply that the predicted insertion would integrate between the first and second positions in the INFO field? In other words, to convert RetroSeq predictions to 0-based coordinates, do we (i) use the two coordinates in the INFO field, or (ii) subtract 1 from the POS column to make a new start position in 0-based coordinates?
Yes, that is correct. But to be honest, I never consider the breakpoints to be accurate to the exact bp. Some mini local assembly and realignment could get them to bp accuracy, I just never got around to implementing that.
We are assuming that "that is correct" refers to "Does this imply that the predicted insertion would integrate between the first and second positions in the INFO field?".
This means that RetroSeq is using the INFO field to represent the TE insertion location (which is in reality inter-base) on 1-based coordinates by annotating a consecutive span of 2 nucleotides, with the insertion site being between the first and second nucleotide. This 2-nucleotide span cannot be represented directly in the POS column of the VCF file, which only allows a 1-based single nucleotide feature to be annotated.
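To convert RetroSeq output to 0-based BED format in https://github.com/bergmanlab/mcclintock, we will maintain the 2-nucleotide framework, and thus annotate POS-1 for the start and POS+1 for the end of the 2-nucleotide interval. Because BED intervals are 0-based and half-open, the 1-based span from POS to POS+1 becomes start POS-1 and end POS+1.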
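For anyone following along, the conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used in mcclintock; the function names `retroseq_pos_to_bed` and `vcf_line_to_bed` are hypothetical, and the only assumption about the input is the standard VCF layout with CHROM in the first tab-separated column and POS in the second.

```python
def retroseq_pos_to_bed(chrom, pos):
    """Convert a 1-based VCF POS into a 0-based, half-open BED interval.

    The insertion is taken to lie between POS and POS+1 (1-based), so the
    2-nucleotide span [POS, POS+1] becomes BED start=POS-1, end=POS+1.
    Hypothetical helper for illustration only.
    """
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    return chrom, pos - 1, pos + 1


def vcf_line_to_bed(line):
    """Turn one VCF data line into a 3-column BED line (chrom, start, end)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])  # standard VCF: CHROM, POS
    return "\t".join(str(x) for x in retroseq_pos_to_bed(chrom, pos))


# A call at POS=100 covers 1-based bases 100 and 101.
print(retroseq_pos_to_bed("2L", 100))  # ('2L', 99, 101)
```

The resulting BED interval always has length 2, which keeps the "insertion sits between the two annotated bases" convention intact after the coordinate change.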