Update merge files code (2/11) #105

Merged
merged 4 commits into from
Aug 20, 2024

komalsrathi
Collaborator

@komalsrathi komalsrathi commented Jun 20, 2024

Quick update:

I first updated the histologies base file to add the two missing samples and soft-linked it under data/. Then I re-ran the modified scripts (i.e., with title case removed from the CNV file and the path updated to GENCODE v39) to generate the merged files for the v3 release.

In addition to the updated histologies file, I have updated and uploaded to s3 (v3 folder) the following merged files:

results
├── Hope-cnv-controlfreec-tumor-only.rds
├── Hope-cnv-controlfreec.rds
├── Hope-fusion-putative-oncogenic.rds
├── Hope-gene-counts-rsem-expected_count-collapsed.rds
├── Hope-gene-counts-rsem-expected_count.rds
├── Hope-gene-expression-rsem-tpm-collapsed.rds
├── Hope-gene-expression-rsem-tpm.rds
├── Hope-snv-consensus-plus-hotspots.maf.tsv.gz
├── Hope-tumor-only-snv-mutect2.maf.tsv.gz
└── md5sum.txt

In md5sum.txt, I have only updated the md5sums for the files listed above (i.e., those generated by my merge script).

Here is a comparison of sample counts between v2 and the merged files above (i.e., v3); each file's sample count has increased by 2:

> # Counts
> counts_file = readRDS("data/Hope-gene-counts-rsem-expected_count-collapsed.rds")
> length(colnames(counts_file))
[1] 85

> counts_file = readRDS("analyses/merge-files/results/Hope-gene-counts-rsem-expected_count-collapsed.rds")
> length(colnames(counts_file))
[1] 87

> # TPM
> tpm_file = readRDS("data/Hope-gene-expression-rsem-tpm-collapsed.rds")
> length(colnames(tpm_file))
[1] 85

> tpm_file = readRDS("analyses/merge-files/results/Hope-gene-expression-rsem-tpm-collapsed.rds")
> length(colnames(tpm_file))
[1] 87

> # SNV
> snv_file <- data.table::fread("data/Hope-snv-consensus-plus-hotspots.maf.tsv.gz")
> length(unique(snv_file$Tumor_Sample_Barcode))
[1] 71

> snv_file <- data.table::fread("analyses/merge-files/results/Hope-snv-consensus-plus-hotspots.maf.tsv.gz")
> length(unique(snv_file$Tumor_Sample_Barcode))
[1] 73

> # SNV tumor-only 
> snv_tumor_only_file <- data.table::fread("data/Hope-tumor-only-snv-mutect2.maf.tsv.gz")
> length(unique(snv_tumor_only_file$Tumor_Sample_Barcode))
[1] 88

> snv_tumor_only_file <- data.table::fread("analyses/merge-files/results/Hope-tumor-only-snv-mutect2.maf.tsv.gz")
> length(unique(snv_tumor_only_file$Tumor_Sample_Barcode))
[1] 90

> # CNV
> cnv_file <- readRDS("data/Hope-cnv-controlfreec.rds")
> length(unique(cnv_file$Kids_First_Biospecimen_ID))
[1] 71

> cnv_file <- readRDS("analyses/merge-files/results/Hope-cnv-controlfreec.rds")
> length(unique(cnv_file$Kids_First_Biospecimen_ID))
[1] 73

> # CNV tumor-only
> cnv_tumor_only_file <- readRDS("data/Hope-cnv-controlfreec-tumor-only.rds")
> length(unique(cnv_tumor_only_file$Kids_First_Biospecimen_ID))
[1] 88

> cnv_tumor_only_file <- readRDS("analyses/merge-files/results/Hope-cnv-controlfreec-tumor-only.rds")
> length(unique(cnv_tumor_only_file$Kids_First_Biospecimen_ID))
[1] 90

> # Fusions
> fusion_file <- readRDS("data/Hope-fusion-putative-oncogenic.rds")
> length(unique(fusion_file$Sample))
[1] 85

> fusion_file <- readRDS("analyses/merge-files/results/Hope-fusion-putative-oncogenic.rds")
> length(unique(fusion_file$Sample))
[1] 87
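The repeated checks above could be wrapped in a small helper. This is a minimal sketch in base R, not code from the merge script: `count_samples()` and `compare_releases()` are hypothetical names, and the toy data frames stand in for the real v2 and v3 files, which would be loaded with `readRDS()` or `data.table::fread()` as in the transcript.

```r
# Count samples in a merged object.
# - id_col = NULL: treat each column as one sample (e.g. expression matrices).
# - id_col = "<name>": count unique values in that column (long-format tables).
count_samples <- function(x, id_col = NULL) {
  if (is.null(id_col)) ncol(x) else length(unique(x[[id_col]]))
}

# Compare two releases and report the change in sample count.
compare_releases <- function(old, new, id_col = NULL) {
  n_old <- count_samples(old, id_col)
  n_new <- count_samples(new, id_col)
  c(old = n_old, new = n_new, added = n_new - n_old)
}

# Toy stand-ins for the real files (assumption: not actual project data).
old_expr <- data.frame(S1 = c(1, 2), S2 = c(3, 4),
                       row.names = c("GENE_A", "GENE_B"))
new_expr <- cbind(old_expr, S3 = c(5, 6), S4 = c(7, 8))

old_cnv <- data.frame(Kids_First_Biospecimen_ID = c("BS_1", "BS_1", "BS_2"))
new_cnv <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_1", "BS_2", "BS_3", "BS_4")
)

expr_cmp <- compare_releases(old_expr, new_expr)
cnv_cmp  <- compare_releases(old_cnv, new_cnv,
                             id_col = "Kids_First_Biospecimen_ID")
print(expr_cmp)
print(cnv_cmp)
```

For the real files, the `id_col` values would be the ones used above: `Tumor_Sample_Barcode` for the MAF files, `Kids_First_Biospecimen_ID` for the CNV files, and `Sample` for the fusion file.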

@komalsrathi komalsrathi self-assigned this Jun 20, 2024
@komalsrathi komalsrathi changed the title Update merge files code Update merge files code (2/11) Jul 15, 2024
@jharenza jharenza requested a review from naqvia August 16, 2024 16:57
@komalsrathi
Collaborator Author

komalsrathi commented Aug 16, 2024

Note: this PR creates the merged matrices and requires the raw files and manifests from the Cavatica HOPE project to be downloaded first. @naqvia you can skip running this one. We can also add this to .gitignore. The output files generated by this module are in the s3 bucket, so they should be available with the latest data release and can be downloaded using the data-download bash script. All downstream modules can be run using those files.

cc: @jharenza

@jharenza
Member

jharenza commented Aug 16, 2024

For this, @komalsrathi, can you update the release notes and the download script to include v3? https://github.com/d3b-center/hope-cohort-analysis/blob/master/doc/release-notes.md

@komalsrathi
Collaborator Author

komalsrathi commented Aug 16, 2024

  1. Tried to run the download script after updating to v3:
Checking MD5 hashes...
Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds: FAILED
Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz: FAILED
Hope-GBM-histologies.tsv: FAILED
Hope-methyl-beta-values.rds: FAILED
Hope-methyl-m-values.rds: FAILED
release-notes.md: FAILED
Hope-sv-manta.tsv.gz: FAILED
Hope-GBM-histologies-base.tsv: OK
Hope-cnv-controlfreec-tumor-only.rds: OK
Hope-cnv-controlfreec.rds: OK
Hope-fusion-putative-oncogenic.rds: OK
Hope-gene-counts-rsem-expected_count-collapsed.rds: OK
Hope-gene-counts-rsem-expected_count.rds: OK
Hope-gene-expression-rsem-tpm-collapsed.rds: OK
Hope-gene-expression-rsem-tpm.rds: OK
Hope-snv-consensus-plus-hotspots.maf.tsv.gz: OK
Hope-tumor-only-snv-mutect2.maf.tsv.gz: OK
md5sum: WARNING: 7 of 17 computed checksums did NOT match
  2. Unsure how the release notes numbers were originally computed, so I did the following:
> dat = read_tsv("data/v3/Hope-GBM-histologies.tsv")
> plyr::count(dat$experimental_strategy) %>% arrange(x)
                      x freq
1           Methylation   80
2    Phospho-Proteomics   91
3               RNA-Seq   87
4                   WGS  157
5 Whole Cell Proteomics   91
6             snRNA-Seq   27
7                  <NA>   99
  3. Added today's date as the v3 release date
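The checksum report in item 1 could also be reproduced from within R. The following is a rough sketch under stated assumptions, not part of the project's download script: `check_md5()` is a hypothetical helper, and it assumes md5sum.txt uses the usual `md5sum` layout of a 32-character hash followed by a file name. `tools::md5sum()` ships with base R. The demo runs on throwaway files in a temporary directory, one with a correct hash and one with a deliberately wrong hash.

```r
# Hypothetical helper: compare the hashes listed in an md5sum-style file
# against the files found in `dir`, reporting OK or FAILED per file.
check_md5 <- function(md5_file, dir = dirname(md5_file)) {
  lines <- readLines(md5_file)
  lines <- lines[nzchar(lines)]
  # Expected layout: "<32 hex chars><whitespace>[*]<file name>"
  pattern  <- "^([0-9a-fA-F]{32})[[:space:]]+\\*?(.*)$"
  expected <- tolower(sub(pattern, "\\1", lines))
  files    <- sub(pattern, "\\2", lines)
  # tools::md5sum() returns NA for files it cannot read
  observed <- unname(tools::md5sum(file.path(dir, files)))
  data.frame(
    file = files,
    status = ifelse(!is.na(observed) & observed == expected, "OK", "FAILED"),
    stringsAsFactors = FALSE
  )
}

# Demo on toy files (assumption: not the real release files).
demo_dir <- tempfile("md5-demo-")
dir.create(demo_dir)
writeLines("toy content", file.path(demo_dir, "a.txt"))
writeLines("other content", file.path(demo_dir, "b.txt"))

good_hash <- unname(tools::md5sum(file.path(demo_dir, "a.txt")))
bad_hash  <- strrep("0", 32)  # deliberately wrong hash for b.txt
writeLines(
  c(paste0(good_hash, "  a.txt"), paste0(bad_hash, "  b.txt")),
  file.path(demo_dir, "md5sum.txt")
)

result <- check_md5(file.path(demo_dir, "md5sum.txt"))
print(result)
```

Filtering the result on `status == "FAILED"` would list the files whose recorded hashes differ from the downloaded copies.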

Member

@jharenza jharenza left a comment

👍

@komalsrathi komalsrathi merged commit d7ef282 into master Aug 20, 2024
This pull request was closed.