forked from AlexsLemonade/OpenPBTA-analysis
Commit
Merge pull request #163 from PediatricOpenTargets/pedc-names
PedCBio formatted sample ID column
Showing 8 changed files with 28,447 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
## Add formatted sample id column for PedCBio upload | ||
|
||
**Module author:** Run Jin ([@runjin326](https://github.com/runjin326)) | ||
|
||
Currently, when multiple DNA or RNA specimens are associated with the same sample, no column | ||
distinguishes between the different aliquots while still tying the DNA and RNA together. | ||
This module adds a column called `formatted_sample_id`, whose base name is the sample ID, with additional tie breakers appended when multiple RNA or DNA specimens are associated with the same sample or participant. | ||
|
||
For PBTA samples, the `sample_id` column is used as the base name: | ||
- Using `sample_id` column, we can tie all DNA and RNA samples together | ||
- Using `formatted_sample_id` column, we can distinguish amongst multiple DNA or RNA samples | ||
- Multiple DNA samples associated with the same sample would use `aliquot_id` as the tie breaker | ||
- Multiple RNA samples associated with the same sample would use `RNA_library` as the tie breaker | ||
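
The PBTA tie-break rule above can be sketched as follows. This is a minimal Python illustration, not the module's R Markdown implementation: the `analyte` field, the `-` separator, and the example IDs are assumptions made for the sketch, since the README does not state how the tie breaker is joined to the base name.

```python
from collections import Counter

# Assumption: the README does not state the separator; "-" is a placeholder.
SEP = "-"


def add_formatted_sample_id(rows):
    """Return copies of the rows with a formatted_sample_id key added.

    Each row is a dict with sample_id, aliquot_id, RNA_library, and a
    hypothetical analyte field holding "DNA" or "RNA".
    """
    # Count how many specimens of each analyte type share a sample_id.
    counts = Counter((r["sample_id"], r["analyte"]) for r in rows)
    out = []
    for r in rows:
        formatted = r["sample_id"]
        if counts[(r["sample_id"], r["analyte"])] > 1:
            # DNA ties are broken by aliquot_id, RNA ties by RNA_library.
            tiebreak = r["aliquot_id"] if r["analyte"] == "DNA" else r["RNA_library"]
            formatted = f"{formatted}{SEP}{tiebreak}"
        out.append({**r, "formatted_sample_id": formatted})
    return out


# Made-up rows: S1 has two DNA aliquots and one RNA library,
# S2 has two RNA libraries.
example_rows = [
    {"sample_id": "S1", "analyte": "DNA", "aliquot_id": "A1", "RNA_library": None},
    {"sample_id": "S1", "analyte": "DNA", "aliquot_id": "A2", "RNA_library": None},
    {"sample_id": "S1", "analyte": "RNA", "aliquot_id": "A3", "RNA_library": "stranded"},
    {"sample_id": "S2", "analyte": "RNA", "aliquot_id": "A4", "RNA_library": "polyA"},
    {"sample_id": "S2", "analyte": "RNA", "aliquot_id": "A5", "RNA_library": "stranded"},
]
formatted_ids = [r["formatted_sample_id"] for r in add_formatted_sample_id(example_rows)]
```

In this sketch the single RNA specimen of `S1` keeps the bare `sample_id`, so `sample_id` still ties DNA and RNA together while `formatted_sample_id` separates the duplicates.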
|
||
For TARGET, TCGA, and GTEx samples, the `Kids_First_Participant_ID` column is used as the base name: | ||
- Using `Kids_First_Participant_ID` column, we can tie all DNA and RNA samples together | ||
- Using `formatted_sample_id` column, we can distinguish amongst multiple DNA or RNA samples | ||
- For TARGET, `Kids_First_Participant_ID` + last 7 digits from the `Kids_First_Specimen_ID` is used as the formatted sample ID | ||
- For TCGA, `Kids_First_Participant_ID` + `sample_id` + `aliquot_id` is used as the formatted sample ID | ||
- For GTEx, `Kids_First_Participant_ID` + `aliquot_id` is used as the formatted sample ID | ||
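
The three cohort-specific rules above could be expressed as in the sketch below. It is a hedged Python illustration rather than the module's actual code: the `cohort` field used to pick the rule, the `-` separator, and all example values are assumptions, and "last 7 digits" is read here as the last 7 characters of `Kids_First_Specimen_ID`.

```python
# Assumption: the README does not state the separator; "-" is a placeholder.
SEP = "-"


def formatted_id_non_pbta(row):
    """Build a formatted sample ID for a TARGET, TCGA, or GTEx row.

    row is a dict; the cohort key is a hypothetical field naming the cohort.
    """
    base = row["Kids_First_Participant_ID"]
    cohort = row["cohort"]
    if cohort == "TARGET":
        # Participant ID plus the last 7 characters of the specimen ID.
        parts = [base, row["Kids_First_Specimen_ID"][-7:]]
    elif cohort == "TCGA":
        parts = [base, row["sample_id"], row["aliquot_id"]]
    elif cohort == "GTEx":
        parts = [base, row["aliquot_id"]]
    else:
        raise ValueError(f"no rule for cohort {cohort!r}")
    return SEP.join(parts)


# Made-up rows, one per cohort.
target_id = formatted_id_non_pbta({
    "cohort": "TARGET",
    "Kids_First_Participant_ID": "P1",
    "Kids_First_Specimen_ID": "SPECIMEN-1234567",
})
tcga_id = formatted_id_non_pbta({
    "cohort": "TCGA",
    "Kids_First_Participant_ID": "P2",
    "sample_id": "S2",
    "aliquot_id": "A2",
})
gtex_id = formatted_id_non_pbta({
    "cohort": "GTEx",
    "Kids_First_Participant_ID": "P3",
    "aliquot_id": "A3",
})
```

Unlike the PBTA rule, these rules are applied to every row in the sketch, because the README does not say whether the suffix is added only when duplicates exist.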
|
||
Usage: | ||
``` | ||
Rscript -e "rmarkdown::render('pedcbio_sample_name_col.Rmd', clean = TRUE)" | ||
``` | ||
or | ||
``` | ||
bash run_add_name.sh | ||
``` | ||
|
||
Input: | ||
- `input/cbtn_cbio_sample.csv` | ||
- `input/oligo_nation_cbio_sample.csv` | ||
- `input/dgd_cbio_sample.csv` | ||
- `input/x01_fy16_nbl_maris_cbio_sample.csv` | ||
|
||
Output: | ||
- `results/histologies-formatted-id-added.tsv` | ||
|
||
The output files are directly uploaded to S3 buckets for loading into PedCBio. |
1,049 changes: 1,049 additions & 0 deletions
analyses/pedcbio-sample-name/input/cbtn_cbio_sample.csv
Large diffs are not rendered by default.
62 changes: 62 additions & 0 deletions
analyses/pedcbio-sample-name/input/oligo_nation_cbio_sample.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
"participant_id","sequencing_center_ids","collection_event_id","formatted_sample_id","specimen_id","analyte_types","normal_bs_id","normal_sample_id" | ||
"PT_BSR2P67X","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-1","16510-1","{BS_6EH1T1S8,BS_PQY28YXA}","{DNA,RNA}","BS_BMYBS3EX","16510-1" | ||
"PT_NKGQYCKZ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-10","16510-10","{BS_E4VCDN6S,BS_JAT12TX2}","{DNA,RNA}","BS_XZMQZMY9","16510-10" | ||
"PT_3B27053H","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-11","16510-11","{BS_AQYC2SXT,BS_17SAB1QN}","{DNA,RNA}","BS_WSS7TGJX","16510-11" | ||
"PT_VK94C42M","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-12","16510-12","{BS_ZK7DMJQS,BS_XFWH05PB}","{DNA,RNA}","BS_NAKGZ1F0","16510-12" | ||
"PT_4VYW4AAZ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-13","16510-13","{BS_S1RSJCAM,BS_9M84M36J}","{DNA,RNA}","BS_K33VBFJV","16510-13" | ||
"PT_D7GN92VR","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-14","16510-14","{BS_T7PQR658,BS_AS4K2YQW}","{DNA,RNA}","BS_FRY5QDDQ","16510-14" | ||
"PT_1PVVBG1B","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-15","16510-15","{BS_P867VK77,BS_M8A8KHD8}","{DNA,RNA}","BS_1VKXE4H5","16510-15" | ||
"PT_74Y53WN8","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-16","16510-16","{BS_7S2CDYVX,BS_AYAMBZHS}","{DNA,RNA}","BS_E6AVH26R","16510-16" | ||
"PT_F2CG3ZZB","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-17","16510-17","{BS_91P6RMCF,BS_QV09SGSJ}","{DNA,RNA}","BS_FATPVK66","16510-17" | ||
"PT_G24FN04H","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-18","16510-18","{BS_V90NNZ0D,BS_JT5BXPHZ}","{DNA,RNA}","BS_YZDD793H","16510-18" | ||
"PT_PMFS3A49","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-19","16510-19","{BS_0J57HJ1S,BS_ZYPMR035}","{DNA,RNA}","BS_HP0BWRY0","16510-19" | ||
"PT_DN84REQ3","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-2","16510-2","{BS_ZK8DW7Y1,BS_8ZTEXH90}","{DNA,RNA}","BS_X9GKMCXV","16510-2" | ||
"PT_28A07ZDV","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-20","16510-20","{BS_FHJP8HE0,BS_WTVQNPE4}","{DNA,RNA}","BS_5A11TKW2","16510-20" | ||
"PT_JBCAXNK2","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-21","16510-21","{BS_K6BDQS4D,BS_M9CH91QV}","{DNA,RNA}","BS_1B8B622B","16510-21" | ||
"PT_F1MDWB5W","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-22","16510-22","{BS_1SV5KG4Y,BS_2K3FVT86}","{DNA,RNA}","BS_6CXJKHG6","16510-22" | ||
"PT_4X7VW8WT","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-24","16510-24","{BS_PPAGQPMZ,BS_SRD8TFGV}","{DNA,RNA}","BS_AXJ00JPW","16510-24" | ||
"PT_MKWZC1QY","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-25","16510-25","{BS_QBPZQF73,BS_D5MT05YK}","{DNA,RNA}","BS_F3K6K7R6","16510-25" | ||
"PT_STE26Y8Z","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-26","16510-26","{BS_YRNJ7Z3E,BS_NGMQF1B5}","{DNA,RNA}","BS_1HQACYRD","16510-26" | ||
"PT_XQ001FC5","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-3","16510-3","{BS_5PTCMTJ0,BS_A8V685RR}","{DNA,RNA}","BS_VRHRMEW1","16510-3" | ||
"PT_VN63HHEN","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-4","16510-4","{BS_Z45QE3VE,BS_6NJRE82B}","{DNA,RNA}","BS_E758QJDD","16510-4" | ||
"PT_TG2YJS5J","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-5","16510-5","{BS_C4WJZS6W,BS_WQNXWJXE}","{DNA,RNA}","BS_T5QB3H8B","16510-5" | ||
"PT_G9D39RG0","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-6","16510-6","{BS_14PEVAHC,BS_V0JFAS12}","{DNA,RNA}","BS_M9J949F5","16510-6" | ||
"PT_B694AQYE","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-7","16510-7","{BS_FEX3A4SE,BS_1739JAKK}","{DNA,RNA}","BS_9DKCQN5E","16510-7" | ||
"PT_401GQ0VV","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-8","16510-8","{BS_A8P99605,BS_1HHS9AFK}","{DNA,RNA}","BS_KPVPQYD0","16510-8" | ||
"PT_R4JE89YX","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-9","16510-9","{BS_EYFC3E6B,BS_BGTPAACY}","{DNA,RNA}","BS_QJ8EE8WP","16510-9" | ||
"PT_88F5PD21","{SC_2ZBAMKK0}","7316-6188","7316-6188","{BS_P1AZ9H9A}","{RNA}","","" | ||
"PT_9CSACR9N","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1136","7316UP-1136","{BS_NV67MSZN,BS_GKWT3J9X}","{DNA,RNA}","BS_54HFWZ7Q","7316UP-1136" | ||
"PT_TC4ZC42Z","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1333","7316UP-1333","{BS_AX1ACH6G,BS_28CTRAS5}","{DNA,RNA}","BS_YEEKXSJD","7316UP-1333" | ||
"PT_P40S0HQK","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1641","7316UP-1641","{BS_WH6C5Z6S,BS_YF44RR88}","{DNA,RNA}","BS_CNHA65ED","7316UP-1641" | ||
"PT_GKSTR4C1","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-1975","7316UP-1975","{BS_5AZW4DKB,BS_5FRQVRRE}","{DNA,RNA}","BS_M5R9Q04Z","7316UP-1975" | ||
"PT_7Y81E8DQ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2013","7316UP-2013","{BS_5ARWRQ8E,BS_TYQ0N4YP}","{DNA,RNA}","BS_CN4VY1RE","7316UP-2013" | ||
"PT_TYWS8PXY","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2064","7316UP-2064","{BS_HTXZA5CH,BS_D432YMV4}","{DNA,RNA}","BS_PQTX4827","7316UP-2064" | ||
"PT_KCDPF85V","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2066","7316UP-2066","{BS_EPX4DZN8,BS_8SQARNQW}","{DNA,RNA}","BS_Q16PX3HA","7316UP-2066" | ||
"PT_FAZEJ3VR","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2238","7316UP-2238","{BS_GF83ZWD3,BS_CV9QNWN7}","{DNA,RNA}","BS_8K1Y6NE6","7316UP-2238" | ||
"PT_6QCSWBBQ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2243","7316UP-2243","{BS_S8M4XW3Z,BS_NW3NGVR8}","{DNA,RNA}","BS_VFK2ACZ4","7316UP-2243" | ||
"PT_8YS7JD5V","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2398","7316UP-2398","{BS_P2377HHG,BS_4Q3NRFBD}","{DNA,RNA}","BS_16KWXER9","7316UP-2398" | ||
"PT_3NQ55S57","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2413","7316UP-2413","{BS_6FZTHTXK,BS_2QGRF15R}","{DNA,RNA}","BS_5HPHMEF3","7316UP-2413" | ||
"PT_SNTWW5V0","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2448","7316UP-2448","{BS_77XZNGVC,BS_ZD6XFSVJ}","{DNA,RNA}","BS_MT98YTT4","7316UP-2448" | ||
"PT_DF9T1B5G","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2477","7316UP-2477","{BS_NKSZBDY8,BS_MKGP8P5A}","{DNA,RNA}","BS_4C6204BS","7316UP-2477" | ||
"PT_JE9RX5DR","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2508","7316UP-2508","{BS_6AFWGT6Z,BS_ZMA11A2B}","{DNA,RNA}","BS_96TGJVMN","7316UP-2508" | ||
"PT_Q9VRRAR8","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2515","7316UP-2515","{BS_0Q7HKPD2,BS_R98ZGVVW}","{DNA,RNA}","BS_CFYBAG3K","7316UP-2515" | ||
"PT_3FH27J1X","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2620","7316UP-2620","{BS_FZPNPDQQ,BS_FA3JYABA}","{DNA,RNA}","BS_60BP98YA","7316UP-2620" | ||
"PT_5SPSW3YF","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2722","7316UP-2722","{BS_3R4N22YY,BS_TPTEE9Y9}","{RNA,DNA}","BS_29AM0104","7316UP-2722" | ||
"PT_6SYP6A7X","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2838","7316UP-2838","{BS_SATYBH1F,BS_B47WQMHK}","{DNA,RNA}","BS_B77EHCCP","7316UP-2838" | ||
"PT_FFSCZ7CA","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2877","7316UP-2877","{BS_J5R6HX4M,BS_74GHJS7N}","{DNA,RNA}","BS_8X5V1WPA","7316UP-2877" | ||
"PT_3G8J8D4M","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2900","7316UP-2900","{BS_XKCDEJZ2,BS_TQJDGF04}","{DNA,RNA}","BS_HSPX2BV8","7316UP-2900" | ||
"PT_GWBKEDF7","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-2960","7316UP-2960","{BS_4KHK5ZVS,BS_B3516CQJ}","{DNA,RNA}","BS_GQ8X5D7D","7316UP-2960" | ||
"PT_JVZ4T317","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3083","7316UP-3083","{BS_KHF6DPKN,BS_3XVSTN5S}","{DNA,RNA}","BS_42XKWTEF","7316UP-3083" | ||
"PT_7G7G1B27","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3249","7316UP-3249","{BS_D4844MVS,BS_SQ6ZN982}","{DNA,RNA}","BS_SHEX782D","7316UP-3249" | ||
"PT_N0KJ1CR2","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3296","7316UP-3296","{BS_G753Z42N,BS_8AA254XK}","{DNA,RNA}","BS_3QM87V4D","7316UP-3296" | ||
"PT_C9647EKZ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3675","7316UP-3675","{BS_HED4167A,BS_7PXAWDC9}","{DNA,RNA}","BS_31NJR956","7316UP-3675" | ||
"PT_XMYN334C","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-3705","7316UP-3705","{BS_A4G7F9CH,BS_0Y1BP5A4}","{DNA,RNA}","BS_331NCNGJ","7316UP-3705" | ||
"PT_33EK2MN8","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-474","7316UP-474","{BS_8EMJ3PZY,BS_CF5HC0PR}","{DNA,RNA}","BS_X70XT4GF","7316UP-474" | ||
"PT_9BJWHYZP","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-528","7316UP-528","{BS_6AYC4A6W,BS_73GEX9CY}","{DNA,RNA}","BS_F48CQZ0E","7316UP-528" | ||
"PT_TJ74G94K","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-580","7316UP-580","{BS_2Y7H85D0,BS_RDBNX7YJ}","{DNA,RNA}","BS_9ZM2J7V4","7316UP-580" | ||
"PT_DRVY2FQQ","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-584","7316UP-584","{BS_0KP6JYPD,BS_QNE4NG2X}","{DNA,RNA}","BS_NFHNG4MV","7316UP-584" | ||
"PT_V469B27Z","{SC_2ZBAMKK0}","7316UP-646","7316UP-646","{BS_NDX7BHNT}","{RNA}","","" | ||
"PT_A8YXZ42Q","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-716","7316UP-716","{BS_M9RTNS4X,BS_CSXE7SXN}","{DNA,RNA}","BS_MNF5CN7X","7316UP-716" | ||
"PT_5JCY0JFD","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-756","7316UP-756","{BS_2EVAFHE0,BS_MJMZZ1EY}","{DNA,RNA}","BS_ZFNG7X82","7316UP-756" | ||
"PT_D75EGJWB","{SC_2ZBAMKK0,SC_2ZBAMKK0}","7316UP-857","7316UP-857","{BS_3A8D4FXB,BS_BSNW9PTR}","{DNA,RNA}","BS_HVY3C919","7316UP-857" | ||
"PT_7S81MVTV","{SC_2ZBAMKK0}","7316UP-903","7316UP-903","{BS_0JCEEVZT}","{RNA}","","" |
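
The rows in `oligo_nation_cbio_sample.csv` store multi-valued fields such as `specimen_id` and `analyte_types` in brace-delimited form (for example `{DNA,RNA}`). The Python sketch below shows one way such fields might be split when reading the file. It assumes that the entries of `specimen_id` and `analyte_types` correspond position by position, which the diff does not confirm; the helper names are hypothetical, and the two data rows are copied from the file above.

```python
import csv
import io


def parse_brace_list(field):
    """Split a brace-delimited field such as "{DNA,RNA}" into a list."""
    inner = field.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    # An empty field yields an empty list rather than [""].
    return [item for item in inner.split(",") if item]


def specimens_by_analyte(row):
    """Map each specimen ID to its analyte type (assumed positional pairing)."""
    specimens = parse_brace_list(row["specimen_id"])
    analytes = parse_brace_list(row["analyte_types"])
    return dict(zip(specimens, analytes))


# Header and two rows taken from the CSV diff above.
sample_csv = (
    '"participant_id","sequencing_center_ids","collection_event_id",'
    '"formatted_sample_id","specimen_id","analyte_types","normal_bs_id","normal_sample_id"\n'
    '"PT_BSR2P67X","{SC_2ZBAMKK0,SC_2ZBAMKK0}","16510-1","16510-1",'
    '"{BS_6EH1T1S8,BS_PQY28YXA}","{DNA,RNA}","BS_BMYBS3EX","16510-1"\n'
    '"PT_88F5PD21","{SC_2ZBAMKK0}","7316-6188","7316-6188",'
    '"{BS_P1AZ9H9A}","{RNA}","",""\n'
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
paired = [specimens_by_analyte(r) for r in rows]
```

Rows with only an RNA specimen, such as `PT_88F5PD21`, have empty `normal_bs_id` and `normal_sample_id` fields, which `parse_brace_list` would return as an empty list.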