Update HGG subtyping (4/11) #108

Merged: 10 commits, Oct 28, 2024
1 change: 1 addition & 0 deletions Dockerfile
@@ -53,6 +53,7 @@ RUN R -e 'BiocManager::install(c( \
"survival", \
"survminer", \
"sva", \
"txdbmaker", \
"WGCNA" \
))'

71 changes: 37 additions & 34 deletions analyses/molecular-subtyping-HGG/00-fusion-summary.nb.html

Large diffs are not rendered by default.

@@ -14,7 +14,7 @@ library(annoFuseData)

```{r directories}
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))

scratch_dir <- file.path(root_dir, "scratch")
analyses_dir <- hgg_subset_dir <- file.path(root_dir,
"analyses",
"molecular-subtyping-HGG")
@@ -59,10 +59,12 @@ keep_cols <- c("Chromosome",
# snv files
snv_tumor_maf <- data.table::fread(
file.path(root_dir, "data" , "Hope-tumor-only-snv-mutect2.maf.tsv.gz"),
tmpdir = scratch_dir,
select = keep_cols)

snv_consensus_hotspot_maf <- data.table::fread(
file.path(root_dir, "data" , "Hope-snv-consensus-plus-hotspots.maf.tsv.gz"),
tmpdir = scratch_dir,
select = keep_cols) %>%
bind_rows(snv_tumor_maf)

@@ -356,21 +356,21 @@ <h4 class="date">2023-08-02</h4>


<pre class="r"><code>library(tidyverse)</code></pre>
<pre><code>## ── Attaching core tidyverse packages ───────────────────────────────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
<pre><code>## ── Attaching core tidyverse packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
<pre class="r"><code>library(annoFuseData)</code></pre>
<div id="get-directories" class="section level2">
<h2>Get directories</h2>
<pre class="r"><code>root_dir &lt;- rprojroot::find_root(rprojroot::has_dir(&quot;.git&quot;))

scratch_dir &lt;- file.path(root_dir, &quot;scratch&quot;)
analyses_dir &lt;- hgg_subset_dir &lt;- file.path(root_dir,
&quot;analyses&quot;,
&quot;molecular-subtyping-HGG&quot;)
@@ -388,8 +388,8 @@ <h2>Get directories</h2>
</div>
<div id="load-histologies-files" class="section level2">
<h2>load histologies files</h2>
<pre><code>## Rows: 634 Columns: 94
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────
<pre><code>## Rows: 632 Columns: 94
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: &quot;\t&quot;
## chr (56): Kids_First_Biospecimen_ID, sample_id, Kids_First_Participant_ID, e...
## dbl (29): aliquot_id, age_at_diagnosis_days, age_at_event_days, age_at_chemo...
@@ -416,10 +416,12 @@ <h2>Defining lesions snv data</h2>
# snv files
snv_tumor_maf &lt;- data.table::fread(
file.path(root_dir, &quot;data&quot; , &quot;Hope-tumor-only-snv-mutect2.maf.tsv.gz&quot;),
tmpdir = scratch_dir,
select = keep_cols)

snv_consensus_hotspot_maf &lt;- data.table::fread(
file.path(root_dir, &quot;data&quot; , &quot;Hope-snv-consensus-plus-hotspots.maf.tsv.gz&quot;),
tmpdir = scratch_dir,
select = keep_cols) %&gt;%
bind_rows(snv_tumor_maf)

@@ -537,15 +539,15 @@ <h2>cn data</h2>
replace(is.na(.), &quot;Neutral&quot;) %&gt;%
distinct() %&gt;%
write_tsv(file.path(results_dir, &quot;HGG_cleaned_cnv.tsv&quot;))</code></pre>
<pre><code>## `summarise()` has grouped output by &#39;Kids_First_Biospecimen_ID&#39;. You can override using the `.groups`
## argument.</code></pre>
<pre><code>## `summarise()` has grouped output by &#39;Kids_First_Biospecimen_ID&#39;. You can
## override using the `.groups` argument.</code></pre>
</div>
<div id="mutations" class="section level2">
<h2>mutations</h2>
<pre class="r"><code>gencode_cds_bed &lt;- readr::read_tsv(file.path(root_dir, &quot;scratch&quot;, &quot;gencode.v39.primary_assembly.annotation.bed&quot;),
col_names = FALSE)</code></pre>
<pre><code>## Rows: 840001 Columns: 10
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: &quot;\t&quot;
## chr (7): X1, X4, X5, X6, X7, X8, X10
## dbl (3): X2, X3, X9
@@ -555,7 +557,7 @@ <h2>mutations</h2>
<pre class="r"><code>defining_lesions_df &lt;- readr::read_tsv(file.path(results_dir, &quot;Hope_HGG_defining_lesions.tsv&quot;)) %&gt;%
rename(Tumor_Sample_Barcode = Kids_First_Biospecimen_ID)</code></pre>
<pre><code>## Rows: 110 Columns: 14
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: &quot;\t&quot;
## chr (13): Kids_First_Participant_ID, sample_id, Kids_First_Biospecimen_ID, H...
## lgl (1): defining_lesion
@@ -680,11 +682,11 @@ <h1>Combine all together and save</h1>
<div id="fusion-subsetting" class="section level2">
<h2>fusion subsetting</h2>
<pre class="r"><code>fusion_hgg &lt;- read_tsv(file.path(analyses_dir, &quot;input&quot;, &quot;fusion_summary_hgg_foi.tsv&quot;)) </code></pre>
<pre><code>## Rows: 87 Columns: 35
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────
<pre><code>## Rows: 87 Columns: 36
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: &quot;\t&quot;
## chr (1): Kids_First_Biospecimen_ID
## dbl (34): BCL11A--ALK, BRAF--BRAF, CCDC88A--ALK, CHD7--MYBL1, CLIP1--ROS1, E...
## dbl (35): BCL11A--ALK, BRAF--BRAF, BRAF--WSB1, CCDC88A--ALK, CHD7--MYBL1, CL...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
@@ -773,13 +775,13 @@ <h3>Clean and Wrangle Expression Data</h3>
clean_wrangle_expression(expression_matrix,
output_file_path = file.path(results_dir, &quot;HGG_cleaned_expression.tsv&quot;))</code></pre>
<pre class="r"><code>sessionInfo()</code></pre>
<pre><code>## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
<pre><code>## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
@@ -789,35 +791,38 @@ <h3>Clean and Wrangle Expression Data</h3>
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] annoFuseData_0.1.0 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
## [5] dplyr_1.1.1 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
## [9] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
## [1] annoFuseData_0.1.0 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
## [5] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
## [9] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 xfun_0.38 bslib_0.4.2
## [4] colorspace_2.1-0 vctrs_0.6.2 generics_0.1.3
## [7] htmltools_0.5.5 stats4_4.2.3 yaml_2.3.7
## [10] utf8_1.2.3 rlang_1.1.0 R.oo_1.25.0
## [13] jquerylib_0.1.4 pillar_1.9.0 glue_1.6.2
## [16] withr_2.5.0 R.utils_2.12.2 BiocGenerics_0.44.0
## [19] bit64_4.0.5 GenomeInfoDbData_1.2.9 lifecycle_1.0.3
## [22] zlibbioc_1.44.0 munsell_0.5.0 gtable_0.3.3
## [25] R.methodsS3_1.8.2 evaluate_0.20 knitr_1.42
## [28] IRanges_2.32.0 tzdb_0.3.0 fastmap_1.1.1
## [31] GenomeInfoDb_1.34.9 parallel_4.2.3 fansi_1.0.4
## [34] scales_1.2.1 cachem_1.0.7 S4Vectors_0.36.2
## [37] XVector_0.38.0 vroom_1.6.1 jsonlite_1.8.4
## [40] bit_4.0.5 hms_1.1.3 digest_0.6.31
## [43] stringi_1.7.12 GenomicRanges_1.50.2 grid_4.2.3
## [46] rprojroot_2.0.3 bitops_1.0-7 cli_3.6.1
## [49] tools_4.2.3 magrittr_2.0.3 sass_0.4.5
## [52] RCurl_1.98-1.12 crayon_1.5.2 pkgconfig_2.0.3
## [55] data.table_1.14.8 timechange_0.2.0 rmarkdown_2.21
## [58] R6_2.5.1 compiler_4.2.3</code></pre>
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3
## [4] stringi_1.8.4 hms_1.1.3 digest_0.6.35
## [7] magrittr_2.0.3 evaluate_0.24.0 grid_4.4.0
## [10] timechange_0.3.0 fastmap_1.2.0 R.oo_1.26.0
## [13] rprojroot_2.0.4 jsonlite_1.8.8 R.utils_2.12.3
## [16] GenomeInfoDb_1.40.1 httr_1.4.7 fansi_1.0.6
## [19] UCSC.utils_1.0.0 scales_1.3.0 jquerylib_0.1.4
## [22] cli_3.6.2 rlang_1.1.4 crayon_1.5.2
## [25] XVector_0.44.0 R.methodsS3_1.8.2 bit64_4.0.5
## [28] munsell_0.5.1 withr_3.0.0 cachem_1.1.0
## [31] yaml_2.3.8 tools_4.4.0 parallel_4.4.0
## [34] tzdb_0.4.0 colorspace_2.1-0 GenomeInfoDbData_1.2.12
## [37] BiocGenerics_0.50.0 vctrs_0.6.5 R6_2.5.1
## [40] stats4_4.4.0 lifecycle_1.0.4 zlibbioc_1.50.0
## [43] IRanges_2.38.0 S4Vectors_0.42.0 bit_4.0.5
## [46] vroom_1.6.5 pkgconfig_2.0.3 pillar_1.9.0
## [49] bslib_0.7.0 gtable_0.3.5 glue_1.7.0
## [52] data.table_1.15.4 GenomicRanges_1.56.1 xfun_0.44
## [55] tidyselect_1.2.1 knitr_1.47 htmltools_0.5.8.1
## [58] rmarkdown_2.27 compiler_4.4.0</code></pre>
</div>
</div>
</div>