
ttdtrang/data-rnaseq-sarcoma


Data package for sarcoma RNA-seq data from PRJNA282597

Sources

  • Experimental data were generated by Lesluyes et al. Original citations:
    • Lesluyes T, Pérot G, Largeau MR, Brulard C et al. RNA sequencing validation of the Complexity INdex in SARComas prognostic signature. Eur J Cancer 2016 Apr;57:104-11. PMID: 26916546
    • Lesluyes T, Baud J, Pérot G, Charon-Barra C et al. Genomic and transcriptomic comparison of post-radiation versus sporadic sarcomas. Mod Pathol 2019 Dec;32(12):1786-1794. PMID: 31243333
  • Processing:
    • Sequencing reads were downloaded from SRA (BioProject PRJNA282597).
    • Quantification was performed with two alternative workflows:
      1. Kallisto 0.45.0, with an index built from the human GRCh38.99 transcriptome plus 92 ERCC spike-in sequences.
      2. STAR 2.7.1a to align reads against the human genome GRCh38.p10 (GENCODE v27) plus 92 ERCC spike-in sequences, followed by RSEM to estimate gene- and isoform-level abundances.
  • Metadata: compiled from SRA and the GEO SOFT-formatted file, plus information extracted from the sequence identifiers in the FASTQ files.

Usage

Install the package, load Biobase (which provides the ExpressionSet class and its accessors), then load the ExpressionSet of interest, for example:

devtools::install_github('ttdtrang/data-rnaseq-sarcoma')
library(Biobase)
data(sarcoma.rnaseq.gene.kallisto, package='data.rnaseq.sarcoma')
dim(exprs(sarcoma.rnaseq.gene.kallisto))

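The package includes four data sets:

  • sarcoma.rnaseq.gene.kallisto: gene-level quantification, Kallisto workflow
  • sarcoma.rnaseq.transcript.kallisto: transcript-level quantification, Kallisto workflow
  • sarcoma.rnaseq.gene.star_rsem: gene-level quantification, STAR + RSEM workflow
  • sarcoma.rnaseq.transcript.star_rsem: transcript-level quantification, STAR + RSEM workflow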
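To work with all four data sets at once, they can be loaded into a named list and their dimensions tabulated. This is a minimal sketch, assuming the package installs as data.rnaseq.sarcoma (as in the usage example) and that Biobase is installed; the helpers load_sarcoma_esets and summarize_esets are hypothetical, not part of the package.

```r
# Names of the ExpressionSets listed in this README
sarcoma_datasets <- c(
  "sarcoma.rnaseq.gene.kallisto",
  "sarcoma.rnaseq.transcript.kallisto",
  "sarcoma.rnaseq.gene.star_rsem",
  "sarcoma.rnaseq.transcript.star_rsem"
)

# Hypothetical helper: load several data sets into a private environment
# and return them as a named list (assumes package name 'data.rnaseq.sarcoma').
load_sarcoma_esets <- function(datasets = sarcoma_datasets,
                               package = "data.rnaseq.sarcoma") {
  env <- new.env()
  # data() accepts a character vector of data set names via `list =`
  data(list = datasets, package = package, envir = env)
  mget(datasets, envir = env)
}

# Hypothetical helper: one row per data set with the number of
# features (genes or transcripts) and samples in its expression matrix.
summarize_esets <- function(esets) {
  dims <- t(vapply(esets, function(e) dim(Biobase::exprs(e)), integer(2)))
  data.frame(dataset = names(esets),
             features = dims[, 1],
             samples = dims[, 2],
             row.names = NULL)
}
```

Typical use would be esets <- load_sarcoma_esets() followed by summarize_esets(esets); sample metadata for any one set is then available through Biobase::pData(esets[[1]]). Loading into a private environment keeps the global workspace clean.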

Steps to reproduce the data curation

  1. cd data-raw
  2. Download all necessary raw data files.
  3. Set the environment variable DBDIR to the directory containing these files. The files are assumed to be organized into one subdirectory per workflow, e.g.:
├── kallisto
│   ├── feature_attributes.tsv
│   ├── matrix.est_counts.RDS
│   ├── matrix.gene.est_counts.RDS
│   ├── matrix.gene.tpm.RDS
│   └── matrix.tpm.RDS
├── PRJNA282597_metadata_cleaned.tsv
├── fastq_metadata.tsv
└── star-rsem
    ├── feature_attrs.rsem.transcripts.tsv
    ├── matrix.gene.expected_count.RDS
    ├── matrix.gene.tpm.RDS
    ├── matrix.transcripts.expected_count.RDS
    ├── matrix.transcripts.tpm.RDS
    └── starLog.final.tsv
  4. Run the R notebook make-data-package.Rmd to assemble the parts into ExpressionSet objects.
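Before running the notebook, it may help to confirm that DBDIR matches the example layout. The base-R sketch below assumes the notebook reads its inputs from the paths shown in the tree, relative to Sys.getenv("DBDIR"); the file list is copied from that tree, and the helper missing_raw_files is hypothetical.

```r
# Expected files, relative to DBDIR, copied from the example tree
expected_files <- c(
  "PRJNA282597_metadata_cleaned.tsv",
  "fastq_metadata.tsv",
  file.path("kallisto", c("feature_attributes.tsv",
                          "matrix.est_counts.RDS",
                          "matrix.gene.est_counts.RDS",
                          "matrix.gene.tpm.RDS",
                          "matrix.tpm.RDS")),
  file.path("star-rsem", c("feature_attrs.rsem.transcripts.tsv",
                           "matrix.gene.expected_count.RDS",
                           "matrix.gene.tpm.RDS",
                           "matrix.transcripts.expected_count.RDS",
                           "matrix.transcripts.tpm.RDS",
                           "starLog.final.tsv"))
)

# Hypothetical helper: return the expected files that are absent under dbdir
missing_raw_files <- function(dbdir = Sys.getenv("DBDIR")) {
  if (!nzchar(dbdir)) stop("DBDIR is not set")
  expected_files[!file.exists(file.path(dbdir, expected_files))]
}

# Usage from within data-raw/ (requires the rmarkdown package):
# missing <- missing_raw_files()
# if (length(missing) == 0) rmarkdown::render("make-data-package.Rmd")
```

If DBDIR was not exported from the shell, calling Sys.setenv(DBDIR = "/path/to/raw") in the same R session before rmarkdown::render() should have the same effect, since render() evaluates the notebook in the current R process by default.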

About

Data package for RNA-seq on sarcoma samples
