Updated analysis: use `annotator` API in the `rna-seq-expression-summary-stats` module #122

logstar · 2021-07-20T16:50:33Z

What analysis module should be updated and why?

The rna-seq-expression-summary-stats module should be updated to use the annotator API at analyses/long-format-table-utils/annotator/annotator-api.R, once the PR "[Long-format table annotation Part 2] annotator R API" (OpenPedCan-analysis#56) is merged.

What changes need to be made? Please provide enough detail for another participant to make the update.

Use the long-format table annotator API in an analysis module with the following steps:

1. Set the working directory of the analysis module to OpenPedCan-analysis or one of its subdirectories. This allows the API function annotate_long_format_table to locate the annotation data files.
2. Source the long-format-table-utils/annotator/annotator-api.R file.
3. If the class of the table to be annotated is not tibble::tbl_df, convert the table to tibble::tbl_df with tibble::as_tibble. After conversion, carefully check rownames, colnames, column classes (especially factors), and other properties that may affect the correctness of your code.
4. If c("Gene_symbol", "Gene_Ensembl_ID", "Disease") are not all present in the colnames of the table to be annotated, add new columns or rename existing ones so that all of these required columns are present.
5. Call annotate_long_format_table to add one or more of the available annotation columns by specifying its columns_to_add parameter. Read the function's documentation comment for usage.
6. Rename, select, and reorder the columns of the annotated table for output in TSV, JSON, or JSONL format.
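Steps 3 and 4 can be checked before calling the API. The following is only a minimal sketch: `prepare_for_annotation` is a hypothetical helper that is not part of annotator-api.R, the toy table is made up for illustration, and the tibble package is assumed to be installed.

```r
# Hypothetical helper (not part of annotator-api.R): convert a table to a
# tibble if needed and check that the required columns are present.
prepare_for_annotation <- function(tbl) {
  required_cols <- c("Gene_symbol", "Gene_Ensembl_ID", "Disease")
  if (!inherits(tbl, "tbl_df")) {
    # tibble::as_tibble() drops row names by default; pass
    # rownames = "<column name>" if the row names carry information.
    tbl <- tibble::as_tibble(tbl)
  }
  missing_cols <- setdiff(required_cols, colnames(tbl))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # Factor columns are a common source of surprises after conversion.
  factor_cols <- colnames(tbl)[vapply(tbl, is.factor, logical(1))]
  if (length(factor_cols) > 0) {
    warning("Factor columns found: ", paste(factor_cols, collapse = ", "))
  }
  tbl
}

# Toy input for illustration only.
toy_df <- data.frame(
  Gene_symbol = c("TP53", "MYCN"),
  Gene_Ensembl_ID = c("ENSG00000141510", "ENSG00000134323"),
  Disease = c("Neuroblastoma", "Neuroblastoma"),
  stringsAsFactors = FALSE
)
toy_tbl <- prepare_for_annotation(toy_df)
```

The helper stops on missing columns rather than renaming anything, because the correct mapping (for example gene_id to Gene_Ensembl_ID) depends on the module.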

The following is an example of this usage in 01-tpm-summary-stats.R of the rna-seq-expression-summary-stats module.

> getwd()
[1] "/home/rstudio/OpenPedCan-analysis/analyses/rna-seq-expression-summary-stats"
> source("../long-format-table-utils/annotator/annotator-api.R")
> class(m_tpm_ss_long_tbl)
[1] "tbl_df"     "tbl"        "data.frame"
> colnames(m_tpm_ss_long_tbl)
 [1] "gene_symbol"                          "gene_id"                             
 [3] "cancer_group"                         "cohort"                              
 [5] "tpm_mean"                             "tpm_sd"                              
 [7] "tpm_mean_cancer_group_wise_zscore"    "tpm_mean_gene_wise_zscore"           
 [9] "tpm_mean_cancer_group_wise_quantiles" "n_samples"                           
> renamed_m_tpm_ss_long_tbl <- dplyr::rename(
+   m_tpm_ss_long_tbl, Gene_symbol = gene_symbol, Gene_Ensembl_ID = gene_id,
+   Disease = cancer_group)
> annotated_renamed_m_tpm_ss_long_tbl <- annotate_long_format_table(
+   renamed_m_tpm_ss_long_tbl, columns_to_add = c("MONDO", "RMTL", "EFO"))
> m_tpm_ss_long_tbl <- dplyr::rename(
+   annotated_renamed_m_tpm_ss_long_tbl,
+   gene_symbol = Gene_symbol, gene_id = Gene_Ensembl_ID,
+   cancer_group = Disease)
> m_tpm_ss_long_tbl <- dplyr::select(
+   m_tpm_ss_long_tbl, gene_symbol, RMTL, gene_id,
+   cancer_group, EFO, MONDO, n_samples, cohort,
+   tpm_mean, tpm_sd,
+   tpm_mean_cancer_group_wise_zscore, tpm_mean_gene_wise_zscore,
+   tpm_mean_cancer_group_wise_quantiles)
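The session above stops before the output step (step 6). One possible way to write the table in the three formats is sketched below. It assumes the readr and jsonlite packages are installed; the file names, the temporary output directory, and the toy table are hypothetical and are not taken from the module.

```r
# Toy stand-in for the annotated table; values are illustrative only.
out_tbl <- tibble::tibble(
  gene_symbol = c("TP53", "MYCN"),
  gene_id = c("ENSG00000141510", "ENSG00000134323"),
  n_samples = c(10L, 12L)
)

# Hypothetical output location and file names.
out_dir <- tempdir()
tsv_path <- file.path(out_dir, "example-summary-stats.tsv")
json_path <- file.path(out_dir, "example-summary-stats.json")
jsonl_path <- file.path(out_dir, "example-summary-stats.jsonl")

# TSV: tab-separated values with a header row.
readr::write_tsv(out_tbl, tsv_path)

# JSON: a single array with one object per row.
jsonlite::write_json(out_tbl, json_path)

# JSONL: one JSON object per line; stream_out() opens and closes the
# unopened file connection itself.
jsonlite::stream_out(out_tbl, con = file(jsonl_path), verbose = FALSE)
```

JSONL keeps one record per line, which is convenient when the long-format table is large and consumers read it as a stream.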

What input data should be used? Which data were used in the version being updated?

data/gene-expression-rsem-tpm-collapsed.rds
data/histologies.tsv
analyses/independent-samples/results/independent-specimens.rnaseq.primary.eachcohort.tsv

When do you expect the revised analysis will be completed?

1 day.

Who will complete the updated analysis?

@logstar

logstar · 2021-07-30T02:01:58Z

Closed with PR d3b-center/OpenPedCan-analysis#64 merged.

logstar added the blocked label Jul 20, 2021

logstar self-assigned this Jul 20, 2021

logstar removed the blocked label Jul 23, 2021

This was referenced Jul 23, 2021

Annotate SNV table with mutation frequencies d3b-center/OpenPedCan-analysis#45

Merged

[Update rna-seq-expression-summary-stats module for v7 Part 1] use annotator API d3b-center/OpenPedCan-analysis#64

Merged

logstar closed this as completed Jul 30, 2021
