HCC1395 WGS Exome RNA Seq Data
If you wish to use the HCC1395 and HCC1395/BL whole genome (WGS), exome, and/or RNA-seq data, you are welcome to do so, but please cite the GMS manuscript and website.
Refer to the following ATCC links for details on the breast cancer cell line and its matched normal lymphoblastoid cell line: HCC1395 at ATCC and HCC1395/BL at ATCC.
All data are 2x100 bp reads generated on an Illumina HiSeq 2000 instrument. The exome data were generated with the NimbleGen SeqCap EZ Human Exome Library v3.0 reagent (download the annotation BED file here: NimbleGenExome_v3.bed). For precise details on how the libraries were isolated, please refer to the GMS manuscript.
Although they are stored in BAM format for efficiency, all files below contain ALL reads, including reads that will not align (the percentage depends on the alignment strategy). We recommend using Picard SamToFastq to convert these BAMs back to FASTQ format.
Data type | File name | Link |
---|---|---|
WGS Normal (lane 1) | gerald_D1VCPACXX_6.bam | download |
WGS Normal (lane 2) | gerald_D1VCPACXX_7.bam | download |
WGS Normal (lane 3) | gerald_D1VCPACXX_8.bam | download |
WGS Tumor (lane 1) | gerald_D1VCPACXX_1.bam | download |
WGS Tumor (lane 2) | gerald_D1VCPACXX_2.bam | download |
WGS Tumor (lane 3) | gerald_D1VCPACXX_3.bam | download |
WGS Tumor (lane 4) | gerald_D1VCPACXX_4.bam | download |
WGS Tumor (lane 5) | gerald_D1VCPACXX_5.bam | download |
Exome Normal (lane 1) | gerald_C1TD1ACXX_7_CGATGT.bam | download |
Exome Tumor (lane 1) | gerald_C1TD1ACXX_7_ATCACG.bam | download |
RNAseq Normal (lane 1) | gerald_C2DBEACXX_3.bam | download |
RNAseq Tumor (lane 1) | gerald_C1TD1ACXX_8_ACAGTG.bam | download |
Various downsampled versions of these BAMs are available here:
Data set | Description | Link |
---|---|---|
Full sized | All 12 complete WGS, exome, and RNA-seq BAMs | download |
1/100th | BAMs that were evenly downsampled to 1/100th original size | download |
1/1000th | BAMs that were downsampled to 1/1000th and then supplemented with extra coverage for some regions | download |
Exome only | 2 complete Exome BAMs | download |
After installing the GMS, prime the system with one of these data sets using a command like the following (run from the directory where you cloned the GMS repository):
./setup/prime-system.pl --data=hcc1395_1tenth_percent --sync=tarball --low_resources --memory=Xgb
To select one of the data sets above, set --data to one of: hcc1395, hcc1395_1percent, hcc1395_1tenth_percent, hcc1395_exome_only
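A mistyped data set name is easy to catch before priming. The following is a minimal sketch, not part of the GMS itself: the function names `is_valid_dataset` and `prime_cmd` are hypothetical, and the script only prints the command rather than running it. The memory value is passed through unchanged, so you still need to choose it yourself.

```shell
#!/bin/sh
# Sketch: validate the --data value against the names listed above and
# print the matching prime-system.pl command (nothing is executed).

# Return success only for the data set names documented on this page.
is_valid_dataset() {
    case "$1" in
        hcc1395|hcc1395_1percent|hcc1395_1tenth_percent|hcc1395_exome_only)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Print the priming command for a data set name and a memory setting.
# $1 = data set name, $2 = value for --memory (chosen by the user).
prime_cmd() {
    if ! is_valid_dataset "$1"; then
        echo "Unknown data set: $1" >&2
        return 1
    fi
    echo "./setup/prime-system.pl --data=$1 --sync=tarball --low_resources --memory=$2"
}
```

For example, `prime_cmd hcc1395_exome_only Xgb` prints the command for the exome-only data set, with `Xgb` standing in for your own memory setting as in the command above.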
Note: If you just want to use this data and need a complete version of it in FASTQ format, you can create one easily with Picard SamToFastq.
For example:
java -Xmx2g -jar picard-tools-1.118/SamToFastq.jar INPUT=gerald_D1VCPACXX_6.bam FASTQ=gerald_D1VCPACXX_6_R1.fastq SECOND_END_FASTQ=gerald_D1VCPACXX_6_R2.fastq
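To convert all downloaded BAMs rather than one, the same command can be wrapped in a loop. This is a sketch under a few assumptions: the BAMs sit in the current directory, `PICARD_DIR` and `DRY_RUN` are hypothetical variables introduced here, and the output names follow the `_R1.fastq` / `_R2.fastq` pattern of the example above. By default it only prints the commands so you can inspect them first.

```shell
#!/bin/sh
# Sketch: run Picard SamToFastq on every gerald_*.bam in the current directory.
# PICARD_DIR and DRY_RUN are illustrative variables, not part of Picard or the GMS.
PICARD_DIR="${PICARD_DIR:-picard-tools-1.118}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = actually run them

# Build (and optionally run) the SamToFastq command for one BAM file.
samtofastq() {
    bam="$1"
    base="${bam%.bam}"    # strip the .bam suffix to derive the FASTQ names
    set -- java -Xmx2g -jar "$PICARD_DIR/SamToFastq.jar" \
        "INPUT=$bam" "FASTQ=${base}_R1.fastq" "SECOND_END_FASTQ=${base}_R2.fastq"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

for f in gerald_*.bam; do
    [ -e "$f" ] || continue   # skip if the glob matched no files
    samtofastq "$f"
done
```

Run it once as-is to review the printed commands, then rerun with `DRY_RUN=0` to perform the conversions.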
Incidentally, storing your raw data in BAM format, as we do, is far more space efficient than storing FASTQ. Many tools (even some aligners) now take a BAM as input. To convert FASTQ to a BAM of unaligned reads, you can use Picard FastqToSam. Note that FastqToSam also requires a SAMPLE_NAME for the read group header; the value below is a placeholder:
java -Xmx2g -jar picard-tools-1.118/FastqToSam.jar FASTQ=gerald_D1VCPACXX_6_R1.fastq FASTQ2=gerald_D1VCPACXX_6_R2.fastq OUTPUT=gerald_D1VCPACXX_6.bam SAMPLE_NAME=my_sample
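Before converting a FASTQ pair back to BAM, it can be worth confirming that both files hold the same number of reads. The sketch below is one simple way to do that; the function names `fastq_read_count` and `check_pair` are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
#!/bin/sh
# Sketch: compare read counts of an R1/R2 FASTQ pair.
# Assumes uncompressed FASTQ with four lines per record (no wrapped sequences).

# Print the number of reads in one FASTQ file.
fastq_read_count() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Succeed if both files contain the same number of reads.
check_pair() {
    r1=$(fastq_read_count "$1")
    r2=$(fastq_read_count "$2")
    if [ "$r1" -eq "$r2" ]; then
        echo "OK: $r1 read pairs"
    else
        echo "MISMATCH: $1 has $r1 reads, $2 has $r2 reads" >&2
        return 1
    fi
}
```

For the files used above, this would be called as `check_pair gerald_D1VCPACXX_6_R1.fastq gerald_D1VCPACXX_6_R2.fastq`.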