methylation bias with tagmentation based WGBS library #564

Closed
docatherine opened this issue Feb 2, 2023 · 4 comments

Comments

@docatherine

Hi Felix,
Have you ever seen methylation bias in tagmentation-based WGBS libraries?
We performed WGBS using the EZ DNA Methylation Kit, which uses Tn5, and observed a strange M-bias profile on both reads, both for the % methylation and for the total number of CHG and CHH calls:
[Image: M-bias plot (% methylation and total CHG/CHH calls by read position)]
(Of note, we performed a GpC methyltransferase treatment, which methylates the C of GpC sites in open chromatin regions (the NOMe-seq approach). This explains the higher level of non-CpG methylation, but I don't think it explains this bias, especially at position 5.)
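An M-bias profile like the one described above is the percentage of methylated calls at each read position, computed separately per context. Bismark's methylation extractor already writes an M-bias report, so the following is only a minimal sketch of the calculation, assuming each read's calls are available as a Bismark XM-style call string (`z`/`Z` CpG, `x`/`X` CHG, `h`/`H` CHH, lowercase = unmethylated, uppercase = methylated, `.` = not a cytosine). The function name `mbias` is hypothetical.

```python
from collections import defaultdict

# Bismark XM-style call letters: lowercase = unmethylated, uppercase = methylated;
# '.' marks positions that are not cytosines (other letters are ignored here)
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH"}


def mbias(call_strings, context="CpG"):
    """Return {1-based read position: (methylated, total, percent)} for one context."""
    meth = defaultdict(int)
    total = defaultdict(int)
    for calls in call_strings:
        for pos, call in enumerate(calls, start=1):
            if CONTEXTS.get(call.lower()) != context:
                continue  # not a cytosine in the requested context
            total[pos] += 1
            if call.isupper():
                meth[pos] += 1
    return {
        pos: (meth[pos], total[pos], 100.0 * meth[pos] / total[pos])
        for pos in sorted(total)
    }


if __name__ == "__main__":
    # Three toy reads with CpG calls at read positions 3 and 6
    for pos, (m, n, pct) in mbias(["..z..Z", "..Z..Z", "..z..z"]).items():
        print(pos, m, n, f"{pct:.1f}%")
```

Plotting the percent column against position for CpG, CHG and CHH separately gives the kind of profile shown in the figure; a flat line is expected in the absence of positional bias.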

FastQC showed a bias within the first 10 bp, which seems consistent with the known bias caused by Tn5's preferred cut sites. So at first I did not worry and simply ignored the methylation calls for those bases, although I am not sure why the G content is also lower. I did not include read 2, but it looks exactly the same as read 1, with lower G and C levels, which is also strange.
[Image: FastQC per-base sequence content (WGBS read 1; genomic DNA showing the Tn5 bias)]
However, when I compared the CpG methylation levels with RRBS data generated from the same sample (after excluding GCG sites, whose methylation calls are ambiguous because of the GpC methylation, and excluding methylation calls within the first 15 bp), I noticed that the highly methylated peak "disappeared":
[Image: density plot of CpG methylation levels (blue: RRBS, red: WGBS)]
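The GCG exclusion mentioned above follows from NOMe-seq logic: the C of a GCG is in both CpG and GpC context, so its methylation cannot be attributed unambiguously to endogenous CpG methylation. A minimal sketch of separating unambiguous HCG sites (H = A, C or T) from GCG sites in a reference sequence; the helper `split_cpgs` is hypothetical and only classifies the forward-strand cytosine.

```python
def split_cpgs(seq):
    """Split forward-strand CpG cytosines (0-based positions) into HCG and GCG.

    In NOMe-seq the C of a GCG is also in GpC context, so its methylation
    cannot be attributed unambiguously to endogenous CpG methylation.
    Only the forward-strand C is classified; the reverse-strand C of the same
    CpG would need the mirror-image check (forward sequence CGC).
    """
    seq = seq.upper()
    hcg, gcg = [], []
    for i in range(1, len(seq) - 1):
        if seq[i] != "C" or seq[i + 1] != "G":
            continue
        prev = seq[i - 1]
        if prev == "G":
            gcg.append(i)  # GCG: ambiguous in NOMe-seq
        elif prev in "ACT":
            hcg.append(i)  # HCG: unambiguous CpG call
        # anything else (e.g. N) leaves the context unknown, so it is skipped
    return hcg, gcg


if __name__ == "__main__":
    print(split_cpgs("ACGTGCGTTCG"))
```

A CpG at the very start of the sequence is skipped because its upstream base, and therefore its context, is unknown.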

I really don't think this is due to the GpC methylation: non-specific activity of the GpC methyltransferase would tend to artificially increase CpG methylation, not decrease it. Incomplete bisulfite conversion would also look like hypermethylation, not hypomethylation.
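The argument above can be made explicit with a toy model (an assumption for illustration, not something measured in this thread): a truly unmethylated C is misread as methylated either through off-target methylation with probability p or, failing that, through failed conversion with probability 1 − c. The expected observed level is then m + (1 − m)·[p + (1 − p)(1 − c)], which can never fall below the true level m. The function name is hypothetical.

```python
def observed_methylation(true_meth, conversion_efficiency=1.0, off_target_rate=0.0):
    """Toy model: expected observed methylation fraction at a CpG site.

    A truly unmethylated C is (mis)read as methylated if the GpC methyltransferase
    methylated it off-target (off_target_rate) or, otherwise, if it escaped
    bisulfite conversion (1 - conversion_efficiency). Events are assumed independent.
    """
    false_meth = off_target_rate + (1.0 - off_target_rate) * (1.0 - conversion_efficiency)
    return true_meth + (1.0 - true_meth) * false_meth


if __name__ == "__main__":
    for m in (0.2, 0.5, 0.9):
        print(m, round(observed_methylation(m, conversion_efficiency=0.98, off_target_rate=0.05), 3))
```

Both error sources only add false "methylated" calls, so under this model neither can produce the loss of highly methylated sites seen in the WGBS density plot.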

I was wondering whether the Tn5 preference for a T at position 5 (seen even on genomic DNA) could cause a bias toward unmethylated CpGs at this position (and, overall, on the CpGs covered by the same read). Do you think this is possible? I looked in the literature, and nobody has mentioned a potential methylation bias for tagmentation-based methyl-seq libraries. I found a paper showing that methylation does not affect Tn5 cut sites, but it does not say whether the Tn5 cut-site bias could affect methylation calls.
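The hypothesis above can be sketched as a sampling bias, assuming tagmentation happens after bisulfite conversion so that a CpG cytosine at the preferred position reads as C when methylated and as T when unmethylated. If fragments with a T there are sampled w times as often, the observed level becomes m / (m + (1 − m)·w), which is below m for any w > 1. This is a hypothetical toy model of a single position; correlated methylation of neighbouring CpGs on the same fragment, which could extend the effect along the read, is ignored.

```python
def biased_methylation(true_meth, t_preference):
    """Toy model of a sampling bias toward fragments with a T at one read position.

    Assumes tagmentation after bisulfite conversion, so a CpG cytosine at that
    position reads as C if methylated and as T if unmethylated. Fragments with a
    T there are sampled t_preference times as often as fragments with a C.
    """
    methylated = true_meth
    unmethylated = (1.0 - true_meth) * t_preference
    return methylated / (methylated + unmethylated)


if __name__ == "__main__":
    for w in (1.0, 1.5, 2.0):
        print(w, round(biased_methylation(0.8, w), 3))
```

Under this model fully methylated and fully unmethylated sites are unaffected; the bias is strongest for intermediate-to-high methylation levels.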

Thank you for your input and expertise.

@docatherine
Author

Sorry, the graph labels did not show up in the post:
the first FastQC plot is WGBS read 1; the second is a FastQC plot from genomic DNA showing the Tn5 bias.
In the density plot, blue is RRBS and red is WGBS.

@FelixKrueger
Owner

Hi @docatherine

Thanks for sharing these details. I am not sure I was explicitly aware of biases arising from tagmentation experiments, but I'm not very surprised to learn that they do exist. We have seen such biases, both at the sequence-composition and the methylation-bias level, for a variety of applications, e.g. PBAT and single-cell applications: https://sequencing.qcfail.com/applications/pbat/. In our cases, it proved much better to remove the biased positions altogether by hard-clipping the affected residues before mapping (rather than just ignoring the methylation calls), as the alignment rates were often much worse due to additional errors and InDels at the biased positions. A command like:

trim_galore --clip_R1 15 --clip_R2 15 --paired *fastq.gz

should do the job. Maybe you could compare the mapping efficiencies with and without clipping?
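To illustrate what hard-clipping means (the bases and their qualities are removed from the read before alignment, rather than being aligned and then ignored during methylation calling), here is a minimal sketch. It is not Trim Galore's implementation; in practice the `trim_galore` command above is the way to do this. The helper names `clip_records` and `clip_fastq_gz` are hypothetical.

```python
import gzip
from itertools import islice


def clip_records(lines, n=15):
    """Yield FASTQ lines (without newlines) with the first n bases and qualities removed."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = (line.rstrip("\n") for line in record)
        # hard-clip: drop the bases and their qualities so the aligner never sees them
        yield header
        yield seq[n:]
        yield plus
        yield qual[n:]


def clip_fastq_gz(in_path, out_path, n=15):
    """Hard-clip every record of a gzipped FASTQ file (paths are up to the caller)."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for line in clip_records(fin, n):
            fout.write(line + "\n")
```

For paired-end data, R1 and R2 would each be clipped separately; record order is preserved, so the mates stay in sync.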

Regarding the methylation values themselves, did you use the --nome-seq option of coverage2cytosine? Depending on the methylase used, there might also be star activity for Cs in GCC context: http://felixkrueger.github.io/Bismark/bismark/methylation_extraction/#optional-genome-wide-cytosine-report-output
In a fairly recent issue we added a context summary report that should make it easy to detect whether any context biases occur in your data; it should be produced automatically when you run coverage2cytosine.
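Bismark's context summary report does this aggregation for you, and its exact layout is not reproduced here. As a rough illustration of the idea only, assuming per-cytosine records that already carry a context label (the labels HCG/GCH/GCG below are hypothetical examples) plus methylated and unmethylated counts:

```python
from collections import defaultdict


def context_summary(records):
    """Percent methylation per context label.

    records: iterable of (context, n_methylated, n_unmethylated) tuples, e.g.
    one per cytosine. Returns {context: percent methylated}, skipping
    contexts with no coverage.
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for context, n_meth, n_unmeth in records:
        meth[context] += n_meth
        total[context] += n_meth + n_unmeth
    return {ctx: 100.0 * meth[ctx] / total[ctx] for ctx in sorted(total) if total[ctx] > 0}


if __name__ == "__main__":
    print(context_summary([("HCG", 8, 2), ("HCG", 2, 8), ("GCH", 1, 3)]))
```

Unexpected differences between closely related contexts in such a table are a quick hint that a context-specific artefact (such as off-target methylase activity) is present.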

Furthermore, I agree that it is puzzling to see the regions with high methylation levels disappear when aggregating all data. A sequence preference of Tn5 for T might indeed explain this phenomenon. To understand whether Tn5 might preferentially target such unmethylated cytosines, it would be important to know whether Tn5 is used after the bisulfite conversion, i.e. is there a chance that unmethylated cytosines are converted to Ts, which then get cleaved? If the conversion takes place afterwards (which tends to occur on single-stranded fragments), it might not explain the preference as straightforwardly.
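The order-of-operations question above matters because, if conversion happens first, any sequence preference of Tn5 acts on the converted sequence, in which every unmethylated C already reads as T. A minimal sketch of top-strand bisulfite conversion; the helper `bisulfite_convert` is hypothetical and ignores the bottom strand and incomplete conversion.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate complete bisulfite conversion of the forward (top) strand.

    Unmethylated C -> T; cytosines whose 0-based index is in
    methylated_positions are protected and stay C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


if __name__ == "__main__":
    print(bisulfite_convert("ACGTCCG", {1}))
```

With only the CpG cytosine at index 1 methylated, that C survives while the other two become Ts, so a T-preferring insertion site could favour fragments carrying the unmethylated cytosines.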

@docatherine
Author

Hi Felix,
Thank you for your answer!
I will try the hard-clipping and check the context bias. I was not aware that Bismark has a --nome-seq option (I should have checked the manual...) and had extracted the cytosine contexts "by hand", but I will definitely rerun the cytosine report with Bismark, since I trust your code much more than mine...

Yes, in the Zymo WGBS library kit the bisulfite conversion occurs first, so unmethylated Cs should already be converted to Ts before cleavage. We will sequence this NOMe-seq DNA using Nanopore, which will, I hope, tell us whether the issue is related to the library prep.

Catherine
PS: I would like to thank you for the incredible work you did with Bismark. Not only is the pipeline great and easy to use, but the user manual and resources you provide also helped me so much to understand methyl-seq analyses when I started working on DNA methylation.

@FelixKrueger
Owner

Thanks very much for these nice comments, they are very much appreciated!

I shall go ahead and close this issue for the time being; you can always re-open it when you have additional information available. Best wishes, Felix
