VCFv4.5 RC2 #770

Merged
merged 2 commits into from
Jun 6, 2024

d-cameron
Contributor

VCF 4.5 Release Candidate 2 changes:

  • Added FORMAT Type=M to enable custom/implementation-defined base modification tags
  • Added key aliases that correspond to SAM MM tag abbreviations
  • Added DP* base modification fields
  • Added AD* base modification fields
    • Note that due to the encoding of M fields, AD is essentially a combined ADF and ADR tag
    • This design does not support reporting both AD and ADF/ADR: AD is inferred when the negative strand information is MISSING. Please comment or raise an issue if this is a concern.

Changed PDFs as of 2b64c97: VCFv4.5.draft (diff).

github-actions bot pushed a commit that referenced this pull request May 29, 2024

@d-cameron d-cameron left a comment

  • Do we need better support for use cases such as subsetting a 5mC assay to just CpG methylation? Do we need to support (optional) caching of sequence context?
    -- The current model requires a reference genome to know context.
  • Do we need a header/tag to explicitly state stranded/unstranded CpG?
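
To illustrate why context currently needs a reference genome, here is a minimal sketch of a CpG lookup. The function name `cpg_context` and the in-memory `reference` string are hypothetical and not part of the spec; a real tool would more likely read from an indexed FASTA.

```python
def cpg_context(reference, pos):
    """Classify a 1-based position by CpG context (illustrative only).

    Returns "+" if the base is the C of a forward-strand CpG,
    "-" if it is the G of a CpG (the C sits on the reverse strand),
    and None otherwise.
    """
    i = pos - 1  # VCF POS is 1-based; Python strings are 0-based
    base = reference[i].upper()
    next_base = reference[i + 1].upper() if i + 1 < len(reference) else ""
    prev_base = reference[i - 1].upper() if i > 0 else ""
    if base == "C" and next_base == "G":
        return "+"
    if base == "G" and prev_base == "C":
        return "-"
    return None


# Hypothetical reference sequence: position 2 is the C of a CpG.
print(cpg_context("ACGT", 2))
```

None of this information is available from the VCF record alone, which is the motivation for the caching question above.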

M5hmC & . & Float & Alias for M76792 5-(hydroxymethyl)cytosine \\
M6mA & . & Float & Alias for M28871 6-methyladenine \\
M[0-9]+[ACGTUN] & M & Float & Fraction of bases modified with the given ChEBI ID. \\
DPM[0-9]+[ACGTUN] & M & Float & Total read depth for reads able to detect the base modification with the given ChEBI ID. \\

This should be Integer rather than Float.

M6mA & . & Float & Alias for M28871 6-methyladenine \\
M[0-9]+[ACGTUN] & M & Float & Fraction of bases modified with the given ChEBI ID. \\
DPM[0-9]+[ACGTUN] & M & Float & Total read depth for reads able to detect the base modification with the given ChEBI ID. \\
ADM[0-9]+[ACGTUN] & M & Float & Read depth for reads with the base modification with the given ChEBI ID. \\

This should be Integer rather than Float.
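
For readers following the key patterns in these hunks, a rough sketch of how a parser might classify them. The names `ALIASES`, `KEY_RE` and `classify_key` are hypothetical, the alias table contains only the two rows quoted here, and the base for an alias is left as `None` because the quoted rows do not state it.

```python
import re

# Aliases taken from the two table rows quoted in this hunk (ChEBI IDs only).
ALIASES = {"M5hmC": "76792", "M6mA": "28871"}

# Optional DP/AD prefix, then "M", a ChEBI ID, and the unmodified base.
KEY_RE = re.compile(r"(DP|AD)?M([0-9]+)([ACGTUN])")

KINDS = {None: "fraction", "DP": "depth", "AD": "modified_depth"}


def classify_key(key):
    """Return (kind, chebi_id, base) for a base modification key, else None."""
    if key in ALIASES:
        return ("fraction", ALIASES[key], None)
    match = KEY_RE.fullmatch(key)
    if match is None:
        return None
    prefix, chebi_id, base = match.groups()
    return (KINDS[prefix], chebi_id, base)


for key in ("M76792C", "DPM28871A", "M5hmC", "GT"):
    print(key, classify_key(key))
```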

@@ -637,70 +714,42 @@ \subsubsection{Genotype fields}
\item LPL: is a list of $n \choose \mathrm{Ploidy}$ integers giving phred-scaled genotype likelihoods (rounded to the closest integer; as per PL) for all possible genotypes given the set of alleles defined in the LAA local alleles.
The precise ordering is defined in the GL paragraph.

\item M[0-9]+ (Float): DNA or RNA base modification abundance for the modification with the given ChEBI ID.
\item M[0-9]+[ACGTN] (Float): Fraction of DNA or RNA bases modified with the given ChEBI ID.

Should this also allow U, i.e. [ACGTUN]?

@@ -196,8 +196,56 @@ \subsubsection{Individual format field format}
\item LR: Identical to R except the only alternate alleles defined in the $LAA$ field are considered present.
\item LG: Identical to G except the only alternate alleles defined in the $LAA$ field are considered present.
\item P: The field has one value for each allele value defined in $GT$.
\item M: The field has one value for each possible base modification for the corresponding ChEBI ID.
\end{itemize}

Add gVCF expansion

\vspace{0.5em}
\begin{tabular}{ l l l l l l l l l l}
\#CHROM & POS & REF & ALT & FORMAT & SAMPLE\\
chr & $10$ & C & A & GT:M5mC & \tt{0/1:0.95}\\

From this record alone, it is unknown whether these values cover all 5mC or just CpG methylation, and whether they are stranded.
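
A minimal sketch of what a reader can actually recover from the example row, assuming the usual colon-separated pairing of FORMAT keys and sample values. The helper name `parse_sample` is hypothetical.

```python
def parse_sample(format_field, sample_field):
    """Pair colon-separated FORMAT keys with the sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


# Values copied from the example row in the diff above.
record = parse_sample("GT:M5mC", "0/1:0.95")
fraction = float(record["M5mC"])
print(record["GT"], fraction)
```

The result holds a genotype and a single fraction; nothing in it says which sequence context or strand the fraction refers to.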

@d-cameron

  • Do we need any additional/different fields that encode the DPM*/ADM* ambiguity when doing bisulfite sequencing on het C/T fields?

@d-cameron

Arbitrary co-methylation is out of scope but do we want special fields for CpG co-methylation?

github-actions bot commented Jun 6, 2024

Changed PDFs as of f1e0634: VCFv4.5.draft (diff).

github-actions bot pushed a commit that referenced this pull request Jun 6, 2024
@d-cameron d-cameron merged commit 44aedcf into master Jun 6, 2024
1 check passed
@jkbonfield jkbonfield added the vcf label Jun 18, 2024