coverage2cytosine #382

Closed
ShouWenWang opened this issue Oct 9, 2020 · 3 comments

@ShouWenWang

Hi Felix,
I am having some trouble with coverage2cytosine. I am running the following command:
coverage2cytosine --nome-seq --genome_folder genome_folder -o report input_data.deduplicated.bismark.cov.gz

I had expected the CpG reports to contain only the context of ACG and TCG, but this is what I get:
1 34741246 - 1 0 CG CGC
1 35403302 - 1 0 CG CGG
1 35403394 - 1 0 CG CGG
1 36471854 - 0 1 CG CGA
1 36471885 - 0 1 CG CGA
1 36471891 - 0 1 CG CGG
1 36471936 - 0 1 CG CGC
1 36471966 - 0 1 CG CGG
1 36471975 - 0 1 CG CGT

Also, I had expected the GpC reports to contain only the context of GCA, GCT, GCC, but this is what I get:
1 4445691 - 0 1 CHH CCA
1 4445750 - 0 1 CHH CCC
1 4523988 + 0 1 CHH CCC
1 4523994 + 0 1 CHH CAT
1 4523999 + 0 1 CHH CCT
1 4524014 + 0 1 CHH CTA
1 4524034 + 0 1 CHH CAA
1 4524044 + 0 1 CHH CTA
1 4524054 + 0 1 CHH CAA
1 4524066 + 0 1 CHH CAT
1 4548184 - 0 1 CHG CTG
1 4548197 - 0 1 CHG CTG
1 4548224 - 0 1 CHH CCT
1 4548227 - 0 1 CHG CTG

What is going on here? How can I get the report I expected?

@FelixKrueger
Owner

Hi @ascendancy09

To be honest, so far nothing looks weird to me. For CpG reports, only CpGs in the context of ACG and TCG are shown. However, the CpG report format shows only the downstream context, not the upstream base (which I would expect to always be A or T).
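
To illustrate the point about upstream versus downstream context, here is a minimal Python sketch. It is not Bismark code: `revcomp` and `c_context` are hypothetical helper names, and the toy sequence is made up. It assumes 1-based positions given on the forward strand, and that the context of a cytosine on the `-` strand is read 5'->3' on that strand.

```python
# Illustrative sketch only, not Bismark code. `revcomp` and `c_context`
# are hypothetical helpers; the toy sequence below is made up.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def c_context(chrom_seq, pos, strand):
    """Return (upstream_base, trinucleotide) for the cytosine at 1-based `pos`.

    Both values are read 5'->3' on the strand that carries the cytosine.
    The trinucleotide (C plus two downstream bases) is what the report shows;
    the upstream base is the part the report does not show.
    """
    i = pos - 1  # 0-based index on the forward strand
    if strand == "+":
        start, end = i - 1, i + 3  # one base upstream, C, two downstream
    elif strand == "-":
        start, end = i - 2, i + 2  # mirrored window on the forward strand
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("position too close to the end of the sequence")
    window = chrom_seq[start:end].upper()
    if strand == "-":
        window = revcomp(window)
    if window[1] != "C":
        raise ValueError("no cytosine at this position on this strand")
    return window[0], window[1:]


toy = "AACGCTT"
print(c_context(toy, 3, "+"))  # ('A', 'CGC'): a CG preceded by A
print(c_context(toy, 4, "-"))  # ('G', 'CGT'): a CG preceded by G
```

Both toy cytosines have a trinucleotide starting with CG, but only the first has A or T upstream.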

In a recent issue (#321) we added a cytosine context summary, which is now always printed when you run coverage2cytosine. This summary does report the upstream base, so you should be able to see the GpC methylation in there. If you clone the current development version (a release is probably due next week or the week after), you should benefit from this report.

For the GpC reports, you want to see more or less any context except CG, which seems to be exactly what you got. Again, the upstream base does not show in this report, but if you check in a genome browser you should see that each C is preceded by a G, hence GpC methylation.
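
The filtering described in this thread can be sketched in Python as follows. This is an assumption-laden illustration, not the coverage2cytosine implementation: `nome_report` is a hypothetical function, and it takes the upstream base as an input because the report itself does not contain it.

```python
# Illustrative sketch only, not the coverage2cytosine implementation.
# `nome_report` is a hypothetical name; the rules follow this thread:
#   CpG report: CG preceded by A or T (ACG, TCG)
#   GpC report: C preceded by G and not followed by G (GCA, GCC, GCT)
def nome_report(upstream, trinucleotide):
    """Return 'CpG', 'GpC', or None for a cytosine given its contexts."""
    upstream = upstream.upper()
    tri = trinucleotide.upper()
    if len(upstream) != 1 or len(tri) != 3 or tri[0] != "C":
        raise ValueError("expected one upstream base and a C-led trinucleotide")
    is_cg = tri[1] == "G"
    if is_cg and upstream in ("A", "T"):
        return "CpG"
    if not is_cg and upstream == "G":
        return "GpC"
    return None  # e.g. GCG or CCG: ambiguous or excluded


# Upstream bases below are assumed for illustration; the reports do not list them.
print(nome_report("A", "CGC"))  # CpG
print(nome_report("G", "CCA"))  # GpC
print(nome_report("G", "CGC"))  # None
```

Under these rules a GpC report line such as `CHH CCA` is consistent with GpC methylation as long as the base before the C is a G.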

I hope this clears things up?

@ShouWenWang
Author

Thanks! Now it makes sense to me!

@FelixKrueger
Owner

Excellent, best of luck!
