Processing in R

Step 1 - Preprocessing

Create a folder for the dataset, e.g. TCGA, and within this folder create a folder for each project.

Run the R script Preprocessing.R, specifying the phenotypical trait and project, and check that the paths point to the created data folder.

Save each modality's processed data using the naming convention modality_processed.RData, e.g. mRNA_processed.RData.
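
As a rough illustration of the folder layout and naming convention described above, the following sketch builds the structure in a temporary directory. The variable names, the helper `processed_path`, and the stand-in object `datExpr` are assumptions made for this example and are not taken from Preprocessing.R; the `TCGA-BRCA` folder name follows the example directory structure at the end of this document.

```r
# Minimal sketch of the Step 1 folder layout, built under tempdir() so that
# no real working tree is touched. All names here are illustrative.
base_dir <- file.path(tempdir(), "step1_demo", "data")  # assumed data folder
dataset  <- "TCGA"
project  <- "BRCA"

# One folder per project (e.g. data/TCGA-BRCA) with a processed subfolder
project_dir   <- file.path(base_dir, paste(dataset, project, sep = "-"))
processed_dir <- file.path(project_dir, "processed")
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical helper applying the modality_processed.RData convention
processed_path <- function(modality) {
  file.path(processed_dir, paste0(modality, "_processed.RData"))
}

# Stand-in object; in practice this would be the output of Preprocessing.R
datExpr <- matrix(0, nrow = 2, ncol = 2)
save(datExpr, file = processed_path("mRNA"))
```

Keeping the naming rule in one helper makes it harder for the file names to drift between modalities.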

The options are:
BRCA :
project = 'BRCA'
trait = 'paper_paper_BRCA_Subtype_PAM50'

LGG :
project = 'LGG'
trait = 'paper_Grade'

KIPAN :
project = 'KIPAN'
trait = 'subtype'
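
One way to avoid mismatching a project with the wrong trait is to keep the pairs listed above in a single lookup. This is only a sketch: `trait_options` and `get_trait` are hypothetical names, and the scripts themselves may simply expect `project` and `trait` to be set by hand.

```r
# Project/trait pairs copied from the options listed above
trait_options <- list(
  BRCA  = "paper_paper_BRCA_Subtype_PAM50",
  LGG   = "paper_Grade",
  KIPAN = "subtype"
)

# Hypothetical helper: return the trait for a project, or fail loudly
get_trait <- function(project) {
  if (!project %in% names(trait_options)) {
    stop("Unknown project: ", project)
  }
  trait_options[[project]]
}

project <- "LGG"
trait   <- get_trait(project)
```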

Step 2 - Graph Generation

Point knn_graph_generation.R to the project folder containing the processed modalities.

Create a folder called raw. This is the folder from which MOGDx will be run.

Run the R script knn_graph_generation.R, specifying the phenotypical trait and project, and listing the downloaded modalities in the for loop.
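
Before running the graph generation, it can help to confirm that every modality named in the for loop has a processed file in place. The sketch below is not the contents of knn_graph_generation.R; it only illustrates such a check, using temporary folders, an empty stand-in file, and assumed variable names.

```r
# Illustrative pre-flight check for Step 2, run under tempdir()
data_dir      <- file.path(tempdir(), "step2_demo", "data")  # assumed data folder
processed_dir <- file.path(data_dir, "TCGA-BRCA", "processed")
raw_dir       <- file.path(data_dir, "raw")  # folder from which MOGDx will be run
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)

# Empty stand-in for one processed modality so the check has something to find
file.create(file.path(processed_dir, "mRNA_processed.RData"))

modalities <- c("mRNA", "miRNA")
found <- logical(0)
for (mod in modalities) {
  path <- file.path(processed_dir, paste0(mod, "_processed.RData"))
  found[mod] <- file.exists(path)
  message(mod, ": ", if (found[mod]) "found" else "missing", " (", path, ")")
}
```

Here `found` is a named logical vector, so missing modalities can be listed with `names(found)[!found]`.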

Step 3 - SNF

Create a folder called Network outside data.

Copy each modality's modality_graph.csv to this folder.

Specify the modalities of interest in the list mod_list

Point the SNF script to the new Network folder

Run the R script SNF.R
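
The copy step above can be sketched in R as follows. The source folder `graph_src`, the stand-in CSV contents, and all variable names other than `mod_list` are assumptions for illustration; the combined file name only mirrors the mRNA_miRNA_graph.csv pattern shown in the example directory structure and is not confirmed to be how SNF.R names its output.

```r
# Illustrative Step 3 setup under tempdir(); paths are assumptions
root        <- file.path(tempdir(), "step3_demo")
graph_src   <- file.path(root, "graphs")   # hypothetical location of per-modality graphs
network_dir <- file.path(root, "Network")
dir.create(graph_src, recursive = TRUE, showWarnings = FALSE)
dir.create(network_dir, recursive = TRUE, showWarnings = FALSE)

# Modalities of interest
mod_list <- c("mRNA", "miRNA")

# Stand-in graph files with made-up columns, so the copy has something to move
for (mod in mod_list) {
  write.csv(data.frame(from = 1, to = 2),
            file.path(graph_src, paste0(mod, "_graph.csv")),
            row.names = FALSE)
}

# Copy each modality_graph.csv into the Network folder
graph_files <- file.path(graph_src, paste0(mod_list, "_graph.csv"))
copied <- file.copy(graph_files, network_dir, overwrite = TRUE)
stopifnot(all(copied))

# File name following the mRNA_miRNA_graph.csv pattern
fused_name <- paste0(paste(mod_list, collapse = "_"), "_graph.csv")
```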

Example of directory structure for TCGA

  • data
    • TCGA-BRCA
      • mRNA
        • mRNA.rda
      • miRNA
        • miRNA.rda
      • processed
        • mRNA_processed.RData
        • miRNA_processed.RData
    • raw
      • datExpr_mRNA.csv
      • datMeta_mRNA.csv
      • datExpr_miRNA.csv
      • datMeta_miRNA.csv
    • Network
      • mRNA_graph.csv
      • miRNA_graph.csv
      • mRNA_miRNA_graph.csv