Add methyl probe annotations liftover process description #378

Merged: 2 commits, Jun 8, 2023
2 changes: 1 addition & 1 deletion analyses/methylation-summary/01-calculate-tpm-medians.R
@@ -21,7 +21,7 @@ option_list <- list(
help = "OpenPedCan rnaseq tpm gene or isoform matrix file",
metavar = "character"),
make_option(opt_str = "--methyl_probe_annot", type = "character", default = NULL,
- help = "Methyl gencode array probe annotation results file",
+ help = "Methyl gencode array probe annotations",
metavar = "character"),
make_option(opt_str = "--methyl_independent_samples", type = "character", default = NULL,
help = "OpenPedCan methyl independent biospecimen list file",

@@ -20,7 +20,7 @@ option_list <- list(
help = "OPenPedCan methyl beta-values or m-values matrix file",
metavar = "character"),
make_option(opt_str = "--methyl_probe_annot", type = "character", default = NULL,
- help = "Methyl gencode array probe annotation results file",
+ help = "Methyl gencode array probe annotations",
metavar = "character"),
make_option(opt_str = "--independent_samples", type = "character", default = NULL,
help = "OpenPedCan methyl independent biospecimen list file",

@@ -35,7 +35,7 @@ def read_parameters():
p.add_argument('METHYL_INDEPENDENT_SAMPLES', type=str, default=None, help="OPenPedCan methyl independent biospecimen list file\n\n")
p.add_argument('METHLY_MATRIX', type=str, default=None, help="OpenPedCan methyl beta-values or m-values matrix file\n\n")
p.add_argument('EXP_MATRIX', type=str, default=None, help="OPenPedCan expression matrix file\n\n")
- p.add_argument('PROBE_ANNOT', type=str, default=None, help="Methylation aaray probe gencode annotation results file\n\n")
+ p.add_argument('PROBE_ANNOT', type=str, default=None, help="Methyl gencode array probe annotations\n\n")
p.add_argument('-m', '--methyl_values', type=str, default='beta', choices=METHLY_VALUES, help="OpenPedCan methly matrix values: beta (default) and m\n\n")
p.add_argument('-e', '--exp_values', type=str, default='gene', choices=EXP_TYPE, help="OpenPedCan expression matrix values: gene (default) and isoform\n\n")
p.add_argument('-v', '--version', action='version', version="03-methyl-tpm-correlation.py version {} ({})".format(__version__, __date__), help="Print the current 03-methyl-tpm-correlation.py version and exit\n\n")

@@ -31,7 +31,7 @@ def read_parameters():
p.add_argument('METHYL_INDEPENDENT_SAMPLES', type=str, default=None, help="OPenPedCan methyl independent biospecimen list file\n\n")
p.add_argument('GENE_EXP_MATRIX', type=str, default=None, help="OPenPedCan gene expression matrix file\n\n")
p.add_argument('ISOFORM_EXP_MATRIX', type=str, default=None, help="OPenPedCan isoform expression matrix file\n\n")
- p.add_argument('PROBE_ANNOT', type=str, default=None, help="Methylation array probe gencode annotation results file\n\n")
+ p.add_argument('PROBE_ANNOT', type=str, default=None, help="Methyl gencode array probe annotations\n\n")
p.add_argument('-v', '--version', action='version', version="04-tpm-transcript-representation.py version {} ({})".format(__version__, __date__), help="Print the current 04-tpm-transcript-representation.py version and exit\n\n")
return p.parse_args()


@@ -25,7 +25,7 @@ option_list <- list(
help = "Methyl array probe beta/m-values quantiles results file",
metavar = "character"),
make_option(opt_str = "--methyl_probe_annot", type = "character", default = NULL,
- help = "Methyl gencode array probe annotation results file",
+ help = "Methyl gencode array probe annotations",
metavar = "character"),
make_option(opt_str = "--rnaseq_tpm_medians", type = "character", default = NULL,
help = "RNA-Seq gene-level or isoform-level tmp median expression results file",
15 changes: 9 additions & 6 deletions analyses/methylation-summary/README.md
@@ -2,7 +2,10 @@

## Purpose

- Summarize preprocessed `Illumina Infinium HumanMethylation` array measurements produced by the [OpenPedCan methylation-preprocessing module](https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/dev/analyses/methylation-preprocessing) and [Illumina infinium methylation array CpG probe coordinates](https://support.illumina.com/array/array_kits/infinium-methylationepic-beadchip-kit/downloads.html) lifted-over from GRCh37 to GRCh38 build and annotated with GENCODE v39 release that is currently utilized in the OpenPedCan data analyses.
+ Summarize preprocessed `Illumina Infinium Human Methylation` array measurements produced by the [OpenPedCan methylation-preprocessing module](https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/dev/analyses/methylation-preprocessing) and [Illumina infinium methylation array CpG probe coordinates](https://support.illumina.com/array/array_kits/infinium-methylationepic-beadchip-kit/downloads.html) lifted-over from `GRCh37` to `GRCh38` build and annotated with [GENCODE v39 release](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/) that is currently utilized in the OpenPedCan data analyses.
+
+ ## Methylation array CpG probe coordinates liftover
+ The 450K and EPIC Illumina Infinium methylation array CpG probe coordinates are based on the `Human Build 37 (GRCh37/hg19)` genome assembly. Probe coordinates were converted to `Human Build 38 (GRCh38/hg38)` using the [ENSEMBL Assembly Converter tool](https://useast.ensembl.org/Homo_sapiens/Tools/AssemblyConverter). A probe annotation file, `infinium.gencode.v39.probe.annotations.tsv`, currently used in the module analyses, was created by annotating all lifted-over probes with their associated gene features (i.e., `promoter`, `5' UTR`, `exon`, `intron`, `3'UTR`, and `intergenic`) based on the `GENCODE v39` release. Intron coordinates, typically not included in the GFF3/GTF genome annotation formats, were added to the GENCODE annotations file using [GenomeTools](http://genometools.org/). Probe locations were then assigned their intersecting gene annotation features using [bedtools](https://bedtools.readthedocs.io/en/latest/content/bedtools-suite.html).
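
The final annotation step described above (assigning each lifted-over probe the gene features it intersects) can be illustrated with a minimal pure-Python sketch. This is not the module's implementation, which used bedtools; it only shows the interval-overlap idea. The function name `annotate_probes`, the probe IDs, the coordinates, and the gene name are all hypothetical, and coordinates are assumed to be 0-based, half-open (BED-style).

```python
from collections import defaultdict


def annotate_probes(probes, features):
    """Assign each probe the gene features whose intervals overlap it.

    probes:   iterable of (probe_id, chrom, start, end), 0-based half-open
    features: iterable of (chrom, start, end, feature_type, gene)
    Returns a dict mapping probe_id to a sorted list of (feature_type, gene);
    probes overlapping no feature are labeled ("intergenic", None).
    """
    # Group features by chromosome so each probe is only compared
    # against features on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, ftype, gene in features:
        by_chrom[chrom].append((start, end, ftype, gene))

    annotations = {}
    for probe_id, chrom, p_start, p_end in probes:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = {
            (ftype, gene)
            for f_start, f_end, ftype, gene in by_chrom.get(chrom, [])
            if f_start < p_end and p_start < f_end
        }
        annotations[probe_id] = sorted(hits) if hits else [("intergenic", None)]
    return annotations


# Hypothetical GRCh38 feature intervals for a made-up gene.
features = [
    ("chr1", 1000, 1500, "promoter", "GENE_A"),
    ("chr1", 1500, 1700, "exon", "GENE_A"),
    ("chr1", 1700, 2500, "intron", "GENE_A"),
]

# Hypothetical lifted-over CpG probe coordinates.
probes = [
    ("cg_demo_1", "chr1", 1499, 1501),  # straddles the promoter/exon boundary
    ("cg_demo_2", "chr1", 5000, 5002),  # no overlapping feature
    ("cg_demo_3", "chr2", 100, 102),    # chromosome with no features
]

result = annotate_probes(probes, features)
for probe_id, hits in result.items():
    print(probe_id, hits)
```

The linear scan over features per probe keeps the sketch short; for hundreds of thousands of array probes, a sorted or indexed interval structure (which is what bedtools provides) would be the practical choice.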


## Analysis scripts
@@ -19,7 +22,7 @@ Options:
OpenPedCan rnaseq tpm gene or isoform matrix file

--methyl_probe_annot=CHARACTER
- Methyl gencode array probe annotation results file
+ Methyl gencode array probe annotations

--methyl_independent_samples=CHARACTER
OpenPedCan methyl independent biospecimen list file
@@ -46,7 +49,7 @@ Options:
OPenPedCan methyl beta-values or m-values matrix file

--methyl_probe_annot=CHARACTER
- Methyl gencode array probe annotation results file
+ Methyl gencode array probe annotations

--independent_samples=CHARACTER
OpenPedCan methyl independent biospecimen list file
@@ -78,7 +81,7 @@ positional arguments:

EXP_MATRIX OPenPedCan expression matrix file

- PROBE_ANNOT Methylation aaray probe gencode annotation results file
+ PROBE_ANNOT Methyl gencode array probe annotations

optional arguments:
-h, --help show this help message and exit
@@ -129,7 +132,7 @@ positional arguments:

ISOFORM_EXP_MATRIX OPenPedCan isoform expression matrix file

- PROBE_ANNOT Methylation aaray probe gencode annotation results file
+ PROBE_ANNOT Methyl gencode array probe annotations

-v, --version Print the current 04-tpm-transcript-representation.py version and exit
```
@@ -168,7 +171,7 @@ Options:
Methyl array probe beta/m-values quantiles results file

--methyl_probe_annot=CHARACTER
- Methyl gencode array probe annotation results file
+ Methyl gencode array probe annotations

--rnaseq_tpm_medians=CHARACTER
RNA-Seq gene-level or isoform-level tmp median expression results file