Merge pull request #163 from PediatricOpenTargets/pedc-names
PedCBio formatted sample ID column
runjin326 committed Apr 18, 2022
2 parents 05a08b0 + f50fea3 commit 9a1f438
Showing 8 changed files with 28,447 additions and 0 deletions.
41 changes: 41 additions & 0 deletions analyses/pedcbio-sample-name/README.md
@@ -0,0 +1,41 @@
## Add formatted sample id column for PedCBio upload

**Module author:** Run Jin ([@runjin326](https://github.com/runjin326))

For some samples, multiple DNA or RNA specimens are associated with the same sample, and there is currently
no column that distinguishes between the different aliquots while still tying DNA and RNA together.
This module adds a column called `formatted_sample_id`. Its base name is the sample ID (or the participant ID for non-PBTA cohorts), and a tie breaker is appended when multiple RNA or DNA specimens share that base name.

For PBTA samples, the `sample_id` column is used as the base name:
- The `sample_id` column ties all DNA and RNA samples together.
- The `formatted_sample_id` column distinguishes among multiple DNA or RNA samples.
- Multiple DNA samples associated with the same sample use `aliquot_id` as the tie breaker.
- Multiple RNA samples associated with the same sample use `RNA_library` as the tie breaker.
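
The PBTA rule above can be sketched in a few lines of Python. This is only an illustration, not the module's R implementation: the `analyte` key, the `-` separator, and the example values are assumptions made for the sketch, while `sample_id`, `aliquot_id`, and `RNA_library` are the column names listed above.

```python
from collections import defaultdict


def format_pbta_ids(rows, sep="-"):
    """Return one formatted ID per row, in input order.

    Each row is a dict with `sample_id`, `aliquot_id`, and `RNA_library`
    (column names from the README) plus a hypothetical `analyte` key
    holding "DNA" or "RNA". The separator is also an assumption.
    """
    # Count how many specimens share the same (sample_id, analyte) pair.
    counts = defaultdict(int)
    for row in rows:
        counts[(row["sample_id"], row["analyte"])] += 1

    formatted = []
    for row in rows:
        base = row["sample_id"]
        if counts[(base, row["analyte"])] == 1:
            # Only one specimen of this analyte: the base name is enough.
            formatted.append(base)
        else:
            # Tie breaker: aliquot_id for DNA, RNA_library for RNA.
            tie = row["aliquot_id"] if row["analyte"] == "DNA" else row["RNA_library"]
            formatted.append(f"{base}{sep}{tie}")
    return formatted


# Made-up rows: two DNA aliquots and one RNA library for the same sample.
rows = [
    {"sample_id": "S1", "analyte": "DNA", "aliquot_id": "A1", "RNA_library": ""},
    {"sample_id": "S1", "analyte": "DNA", "aliquot_id": "A2", "RNA_library": ""},
    {"sample_id": "S1", "analyte": "RNA", "aliquot_id": "A3", "RNA_library": "stranded"},
]
print(format_pbta_ids(rows))  # ['S1-A1', 'S1-A2', 'S1']
```

In this sketch the two DNA rows receive a tie breaker because they share a `sample_id`, while the single RNA row keeps the bare base name, so all three rows still tie together through `sample_id`.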

For TARGET, TCGA, and GTEx samples, the `Kids_First_Participant_ID` column is used as the base name:
- The `Kids_First_Participant_ID` column ties all DNA and RNA samples together.
- The `formatted_sample_id` column distinguishes among multiple DNA or RNA samples.
- For TARGET, the formatted sample ID is `Kids_First_Participant_ID` + the last 7 digits of `Kids_First_Specimen_ID`.
- For TCGA, the formatted sample ID is `Kids_First_Participant_ID` + `sample_id` + `aliquot_id`.
- For GTEx, the formatted sample ID is `Kids_First_Participant_ID` + `aliquot_id`.
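
As a rough illustration of the three concatenation rules above, the following Python sketch builds the ID for a single row. The `cohort` key, the `-` separator, and the placeholder values are assumptions for the example; the README does not state how the parts are joined, and the module itself is written in R.

```python
def format_external_id(row, sep="-"):
    """Build a formatted sample ID for a TARGET, TCGA, or GTEx row.

    `cohort` is a hypothetical key naming the data source; the other keys
    are column names taken from the README. The separator is an assumption.
    """
    base = row["Kids_First_Participant_ID"]
    cohort = row["cohort"]
    if cohort == "TARGET":
        # Participant ID + last 7 characters of the specimen ID.
        parts = [base, row["Kids_First_Specimen_ID"][-7:]]
    elif cohort == "TCGA":
        # Participant ID + sample ID + aliquot ID.
        parts = [base, row["sample_id"], row["aliquot_id"]]
    elif cohort == "GTEx":
        # Participant ID + aliquot ID.
        parts = [base, row["aliquot_id"]]
    else:
        raise ValueError(f"unsupported cohort: {cohort!r}")
    return sep.join(parts)


# Placeholder values, not real identifiers.
target_row = {
    "cohort": "TARGET",
    "Kids_First_Participant_ID": "P1",
    "Kids_First_Specimen_ID": "SPECIMEN-0001234",
}
gtex_row = {"cohort": "GTEx", "Kids_First_Participant_ID": "P2", "aliquot_id": "AL9"}

print(format_external_id(target_row))  # P1-0001234
print(format_external_id(gtex_row))    # P2-AL9
```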

Usage:
```
Rscript -e "rmarkdown::render('pedcbio_sample_name_col.Rmd', clean = TRUE)"
```
or
```
bash run_add_name.sh
```

Input:
- `input/cbtn_cbio_sample.csv`
- `input/oligo_nation_cbio_sample.csv`
- `input/dgd_cbio_sample.csv`
- `input/x01_fy16_nbl_maris_cbio_sample.csv`
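
The `oligo_nation_cbio_sample.csv` input added in this commit stores `specimen_id` and `analyte_types` as brace-delimited lists such as `{BS_6EH1T1S8,BS_PQY28YXA}` and `{DNA,RNA}`. A minimal sketch of expanding those cells into one record per specimen is shown below. It assumes the two lists are positionally aligned, which the file does not state explicitly, and `parse_array` is a hypothetical helper rather than part of the module.

```python
import csv
import io


def parse_array(cell):
    """Split a brace-delimited cell such as "{DNA,RNA}" into a list."""
    inner = cell.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    return [item for item in inner.split(",") if item]


# Two rows copied from oligo_nation_cbio_sample.csv, reduced to four columns.
sample = io.StringIO(
    '"participant_id","formatted_sample_id","specimen_id","analyte_types"\n'
    '"PT_BSR2P67X","16510-1","{BS_6EH1T1S8,BS_PQY28YXA}","{DNA,RNA}"\n'
    '"PT_88F5PD21","7316-6188","{BS_P1AZ9H9A}","{RNA}"\n'
)

pairs = []
for row in csv.DictReader(sample):
    specimens = parse_array(row["specimen_id"])
    analytes = parse_array(row["analyte_types"])
    # Assumption: the i-th specimen corresponds to the i-th analyte type.
    for specimen, analyte in zip(specimens, analytes):
        pairs.append((row["formatted_sample_id"], specimen, analyte))

for pair in pairs:
    print(pair)
```

Each resulting tuple pairs a `formatted_sample_id` with one specimen ID and its analyte type, which is a convenient shape for joining against a per-specimen table.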

Output:
- `results/histologies-formatted-id-added.tsv`

The output files are directly uploaded to S3 buckets for loading into PedCBio.
1,049 changes: 1,049 additions & 0 deletions analyses/pedcbio-sample-name/input/cbtn_cbio_sample.csv

Large diffs are not rendered by default.

936 changes: 936 additions & 0 deletions analyses/pedcbio-sample-name/input/dgd_cbio_sample.csv

Large diffs are not rendered by default.

62 changes: 62 additions & 0 deletions analyses/pedcbio-sample-name/input/oligo_nation_cbio_sample.csv
@@ -0,0 +1,62 @@
"participant_id","sequencing_center_ids","collection_event_id","formatted_sample_id","specimen_id","analyte_types","normal_bs_id","normal_sample_id"
"PT_BSR2P67X","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-1","16510-1","{BS_6EH1T1S8,BS_PQY28YXA}","{DNA,RNA}","BS_BMYBS3EX","16510-1"
"PT_NKGQYCKZ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-10","16510-10","{BS_E4VCDN6S,BS_JAT12TX2}","{DNA,RNA}","BS_XZMQZMY9","16510-10"
"PT_3B27053H","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-11","16510-11","{BS_AQYC2SXT,BS_17SAB1QN}","{DNA,RNA}","BS_WSS7TGJX","16510-11"
"PT_VK94C42M","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-12","16510-12","{BS_ZK7DMJQS,BS_XFWH05PB}","{DNA,RNA}","BS_NAKGZ1F0","16510-12"
"PT_4VYW4AAZ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-13","16510-13","{BS_S1RSJCAM,BS_9M84M36J}","{DNA,RNA}","BS_K33VBFJV","16510-13"
"PT_D7GN92VR","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-14","16510-14","{BS_T7PQR658,BS_AS4K2YQW}","{DNA,RNA}","BS_FRY5QDDQ","16510-14"
"PT_1PVVBG1B","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-15","16510-15","{BS_P867VK77,BS_M8A8KHD8}","{DNA,RNA}","BS_1VKXE4H5","16510-15"
"PT_74Y53WN8","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-16","16510-16","{BS_7S2CDYVX,BS_AYAMBZHS}","{DNA,RNA}","BS_E6AVH26R","16510-16"
"PT_F2CG3ZZB","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-17","16510-17","{BS_91P6RMCF,BS_QV09SGSJ}","{DNA,RNA}","BS_FATPVK66","16510-17"
"PT_G24FN04H","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-18","16510-18","{BS_V90NNZ0D,BS_JT5BXPHZ}","{DNA,RNA}","BS_YZDD793H","16510-18"
"PT_PMFS3A49","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-19","16510-19","{BS_0J57HJ1S,BS_ZYPMR035}","{DNA,RNA}","BS_HP0BWRY0","16510-19"
"PT_DN84REQ3","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-2","16510-2","{BS_ZK8DW7Y1,BS_8ZTEXH90}","{DNA,RNA}","BS_X9GKMCXV","16510-2"
"PT_28A07ZDV","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-20","16510-20","{BS_FHJP8HE0,BS_WTVQNPE4}","{DNA,RNA}","BS_5A11TKW2","16510-20"
"PT_JBCAXNK2","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-21","16510-21","{BS_K6BDQS4D,BS_M9CH91QV}","{DNA,RNA}","BS_1B8B622B","16510-21"
"PT_F1MDWB5W","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-22","16510-22","{BS_1SV5KG4Y,BS_2K3FVT86}","{DNA,RNA}","BS_6CXJKHG6","16510-22"
"PT_4X7VW8WT","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-24","16510-24","{BS_PPAGQPMZ,BS_SRD8TFGV}","{DNA,RNA}","BS_AXJ00JPW","16510-24"
"PT_MKWZC1QY","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-25","16510-25","{BS_QBPZQF73,BS_D5MT05YK}","{DNA,RNA}","BS_F3K6K7R6","16510-25"
"PT_STE26Y8Z","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-26","16510-26","{BS_YRNJ7Z3E,BS_NGMQF1B5}","{DNA,RNA}","BS_1HQACYRD","16510-26"
"PT_XQ001FC5","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-3","16510-3","{BS_5PTCMTJ0,BS_A8V685RR}","{DNA,RNA}","BS_VRHRMEW1","16510-3"
"PT_VN63HHEN","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-4","16510-4","{BS_Z45QE3VE,BS_6NJRE82B}","{DNA,RNA}","BS_E758QJDD","16510-4"
"PT_TG2YJS5J","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-5","16510-5","{BS_C4WJZS6W,BS_WQNXWJXE}","{DNA,RNA}","BS_T5QB3H8B","16510-5"
"PT_G9D39RG0","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-6","16510-6","{BS_14PEVAHC,BS_V0JFAS12}","{DNA,RNA}","BS_M9J949F5","16510-6"
"PT_B694AQYE","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-7","16510-7","{BS_FEX3A4SE,BS_1739JAKK}","{DNA,RNA}","BS_9DKCQN5E","16510-7"
"PT_401GQ0VV","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-8","16510-8","{BS_A8P99605,BS_1HHS9AFK}","{DNA,RNA}","BS_KPVPQYD0","16510-8"
"PT_R4JE89YX","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-9","16510-9","{BS_EYFC3E6B,BS_BGTPAACY}","{DNA,RNA}","BS_QJ8EE8WP","16510-9"
"PT_88F5PD21","{SC_2ZBAMKK0}","7316-6188","7316-6188","{BS_P1AZ9H9A}","{RNA}","",""
"PT_9CSACR9N","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1136","7316UP-1136","{BS_NV67MSZN,BS_GKWT3J9X}","{DNA,RNA}","BS_54HFWZ7Q","7316UP-1136"
"PT_TC4ZC42Z","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1333","7316UP-1333","{BS_AX1ACH6G,BS_28CTRAS5}","{DNA,RNA}","BS_YEEKXSJD","7316UP-1333"
"PT_P40S0HQK","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1641","7316UP-1641","{BS_WH6C5Z6S,BS_YF44RR88}","{DNA,RNA}","BS_CNHA65ED","7316UP-1641"
"PT_GKSTR4C1","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1975","7316UP-1975","{BS_5AZW4DKB,BS_5FRQVRRE}","{DNA,RNA}","BS_M5R9Q04Z","7316UP-1975"
"PT_7Y81E8DQ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2013","7316UP-2013","{BS_5ARWRQ8E,BS_TYQ0N4YP}","{DNA,RNA}","BS_CN4VY1RE","7316UP-2013"
"PT_TYWS8PXY","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2064","7316UP-2064","{BS_HTXZA5CH,BS_D432YMV4}","{DNA,RNA}","BS_PQTX4827","7316UP-2064"
"PT_KCDPF85V","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2066","7316UP-2066","{BS_EPX4DZN8,BS_8SQARNQW}","{DNA,RNA}","BS_Q16PX3HA","7316UP-2066"
"PT_FAZEJ3VR","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2238","7316UP-2238","{BS_GF83ZWD3,BS_CV9QNWN7}","{DNA,RNA}","BS_8K1Y6NE6","7316UP-2238"
"PT_6QCSWBBQ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2243","7316UP-2243","{BS_S8M4XW3Z,BS_NW3NGVR8}","{DNA,RNA}","BS_VFK2ACZ4","7316UP-2243"
"PT_8YS7JD5V","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2398","7316UP-2398","{BS_P2377HHG,BS_4Q3NRFBD}","{DNA,RNA}","BS_16KWXER9","7316UP-2398"
"PT_3NQ55S57","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2413","7316UP-2413","{BS_6FZTHTXK,BS_2QGRF15R}","{DNA,RNA}","BS_5HPHMEF3","7316UP-2413"
"PT_SNTWW5V0","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2448","7316UP-2448","{BS_77XZNGVC,BS_ZD6XFSVJ}","{DNA,RNA}","BS_MT98YTT4","7316UP-2448"
"PT_DF9T1B5G","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2477","7316UP-2477","{BS_NKSZBDY8,BS_MKGP8P5A}","{DNA,RNA}","BS_4C6204BS","7316UP-2477"
"PT_JE9RX5DR","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2508","7316UP-2508","{BS_6AFWGT6Z,BS_ZMA11A2B}","{DNA,RNA}","BS_96TGJVMN","7316UP-2508"
"PT_Q9VRRAR8","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2515","7316UP-2515","{BS_0Q7HKPD2,BS_R98ZGVVW}","{DNA,RNA}","BS_CFYBAG3K","7316UP-2515"
"PT_3FH27J1X","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2620","7316UP-2620","{BS_FZPNPDQQ,BS_FA3JYABA}","{DNA,RNA}","BS_60BP98YA","7316UP-2620"
"PT_5SPSW3YF","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2722","7316UP-2722","{BS_3R4N22YY,BS_TPTEE9Y9}","{RNA,DNA}","BS_29AM0104","7316UP-2722"
"PT_6SYP6A7X","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2838","7316UP-2838","{BS_SATYBH1F,BS_B47WQMHK}","{DNA,RNA}","BS_B77EHCCP","7316UP-2838"
"PT_FFSCZ7CA","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2877","7316UP-2877","{BS_J5R6HX4M,BS_74GHJS7N}","{DNA,RNA}","BS_8X5V1WPA","7316UP-2877"
"PT_3G8J8D4M","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2900","7316UP-2900","{BS_XKCDEJZ2,BS_TQJDGF04}","{DNA,RNA}","BS_HSPX2BV8","7316UP-2900"
"PT_GWBKEDF7","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2960","7316UP-2960","{BS_4KHK5ZVS,BS_B3516CQJ}","{DNA,RNA}","BS_GQ8X5D7D","7316UP-2960"
"PT_JVZ4T317","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3083","7316UP-3083","{BS_KHF6DPKN,BS_3XVSTN5S}","{DNA,RNA}","BS_42XKWTEF","7316UP-3083"
"PT_7G7G1B27","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3249","7316UP-3249","{BS_D4844MVS,BS_SQ6ZN982}","{DNA,RNA}","BS_SHEX782D","7316UP-3249"
"PT_N0KJ1CR2","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3296","7316UP-3296","{BS_G753Z42N,BS_8AA254XK}","{DNA,RNA}","BS_3QM87V4D","7316UP-3296"
"PT_C9647EKZ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3675","7316UP-3675","{BS_HED4167A,BS_7PXAWDC9}","{DNA,RNA}","BS_31NJR956","7316UP-3675"
"PT_XMYN334C","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3705","7316UP-3705","{BS_A4G7F9CH,BS_0Y1BP5A4}","{DNA,RNA}","BS_331NCNGJ","7316UP-3705"
"PT_33EK2MN8","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-474","7316UP-474","{BS_8EMJ3PZY,BS_CF5HC0PR}","{DNA,RNA}","BS_X70XT4GF","7316UP-474"
"PT_9BJWHYZP","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-528","7316UP-528","{BS_6AYC4A6W,BS_73GEX9CY}","{DNA,RNA}","BS_F48CQZ0E","7316UP-528"
"PT_TJ74G94K","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-580","7316UP-580","{BS_2Y7H85D0,BS_RDBNX7YJ}","{DNA,RNA}","BS_9ZM2J7V4","7316UP-580"
"PT_DRVY2FQQ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-584","7316UP-584","{BS_0KP6JYPD,BS_QNE4NG2X}","{DNA,RNA}","BS_NFHNG4MV","7316UP-584"
"PT_V469B27Z","{SC_2ZBAMKK0}","7316UP-646","7316UP-646","{BS_NDX7BHNT}","{RNA}","",""
"PT_A8YXZ42Q","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-716","7316UP-716","{BS_M9RTNS4X,BS_CSXE7SXN}","{DNA,RNA}","BS_MNF5CN7X","7316UP-716"
"PT_5JCY0JFD","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-756","7316UP-756","{BS_2EVAFHE0,BS_MJMZZ1EY}","{DNA,RNA}","BS_ZFNG7X82","7316UP-756"
"PT_D75EGJWB","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-857","7316UP-857","{BS_3A8D4FXB,BS_BSNW9PTR}","{DNA,RNA}","BS_HVY3C919","7316UP-857"
"PT_7S81MVTV","{SC_2ZBAMKK0}","7316UP-903","7316UP-903","{BS_0JCEEVZT}","{RNA}","",""