Data Migration

@s-paquette s-paquette released this 15 Jun 18:21
· 5092 commits to isb-cgc-prod since this release

Associated Releases of ISB-CGC-Common: 3.0 and ISB-CGC-API: 3.0

Multiple Program Datasets

In collaboration with the GDC, we now have TARGET pediatric cancer data available for analysis in the user interface. You can now create cohorts and plot analyses using TARGET, TCGA, and CCLE data.

In addition, we have replaced the previous APIs with a new version that supports the new user interface.

We have also released the analyzed data types based on genome build GRCh38 for TCGA and TARGET data. GRCh37 (hg19) data is still available for the TCGA, TARGET, and CCLE datasets.

Workbooks, cohorts, and variable favorites lists created before the data structure migration are still available for analysis and have been labeled as legacy and version 1. If you have difficulty using version 1 workbooks, please contact us.

Please Note:

NOTE 1: The TCGA and CCLE case IDs listed below have been removed from all cohorts because they are no longer available from NCI’s Genomic Data Commons, and ISB-CGC is trying to mirror that data as closely as possible.

TCGA cases: TCGA-33-4579, TCGA-35-3621, TCGA-66-2746, TCGA-66-2747, TCGA-66-2750, TCGA-66-2751, TCGA-66-2752, TCGA-AN-A0FE, TCGA-AN-A0FG, TCGA-BH-A0B2, TCGA-BR-4186, TCGA-BR-4190, TCGA-BR-4194, TCGA-BR-4195, TCGA-BR-4196, TCGA-BR-4197, TCGA-BR-4199, TCGA-BR-4200, TCGA-BR-4205, TCGA-BR-4259, TCGA-BR-4260, TCGA-BR-4261, TCGA-BR-4263, TCGA-BR-4264, TCGA-BR-4265, TCGA-BR-4266, TCGA-BR-4270, TCGA-BR-4271, TCGA-BR-4272, TCGA-BR-4273, TCGA-BR-4274, TCGA-BR-4276, TCGA-BR-4277, TCGA-BR-4278, TCGA-BR-4281, TCGA-BR-4282, TCGA-BR-4283, TCGA-BR-4284, TCGA-BR-4285, TCGA-BR-4286, TCGA-BR-4288, TCGA-BR-4291, TCGA-BR-4298, TCGA-BR-4375, TCGA-BR-4376, TCGA-DM-A286, TCGA-E2-A1IP, TCGA-F4-6857, TCGA-GN-A261, TCGA-O2-A5IC, TCGA-PN-A8M9

CCLE cases: LS123, LS1034
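
If you keep your own lists of case IDs outside the web application, a small filter along these lines could flag the entries affected by this removal. This is a minimal sketch, not part of ISB-CGC: the helper name `split_cohort` and the sample cohort (including the placeholder ID `TCGA-XX-0000`) are hypothetical, and only the removed case IDs are taken from the lists above.

```python
# Case IDs copied from the TCGA and CCLE lists in NOTE 1.
REMOVED_TCGA_CASES = frozenset("""
TCGA-33-4579 TCGA-35-3621 TCGA-66-2746 TCGA-66-2747 TCGA-66-2750
TCGA-66-2751 TCGA-66-2752 TCGA-AN-A0FE TCGA-AN-A0FG TCGA-BH-A0B2
TCGA-BR-4186 TCGA-BR-4190 TCGA-BR-4194 TCGA-BR-4195 TCGA-BR-4196
TCGA-BR-4197 TCGA-BR-4199 TCGA-BR-4200 TCGA-BR-4205 TCGA-BR-4259
TCGA-BR-4260 TCGA-BR-4261 TCGA-BR-4263 TCGA-BR-4264 TCGA-BR-4265
TCGA-BR-4266 TCGA-BR-4270 TCGA-BR-4271 TCGA-BR-4272 TCGA-BR-4273
TCGA-BR-4274 TCGA-BR-4276 TCGA-BR-4277 TCGA-BR-4278 TCGA-BR-4281
TCGA-BR-4282 TCGA-BR-4283 TCGA-BR-4284 TCGA-BR-4285 TCGA-BR-4286
TCGA-BR-4288 TCGA-BR-4291 TCGA-BR-4298 TCGA-BR-4375 TCGA-BR-4376
TCGA-DM-A286 TCGA-E2-A1IP TCGA-F4-6857 TCGA-GN-A261 TCGA-O2-A5IC
TCGA-PN-A8M9
""".split())
REMOVED_CCLE_CASES = frozenset({"LS123", "LS1034"})
REMOVED_CASES = REMOVED_TCGA_CASES | REMOVED_CCLE_CASES


def split_cohort(case_ids):
    """Hypothetical helper: partition case IDs into (kept, removed), preserving order."""
    kept, removed = [], []
    for case_id in case_ids:
        (removed if case_id in REMOVED_CASES else kept).append(case_id)
    return kept, removed


# Illustrative cohort; TCGA-XX-0000 is a made-up placeholder, not a real case.
kept, removed = split_cohort(["TCGA-XX-0000", "TCGA-BR-4186", "LS123"])
print("kept:", kept)
print("removed:", removed)
```

The sets are frozen so the reference lists cannot be modified by accident, and the helper keeps the input order so the result can be compared line by line with the original list.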

NOTE 2: The number of cases and samples shown in the user interface differs from the number in the BigQuery tables for all three projects (TCGA, TARGET, and CCLE). The user interface reflects the data available at the Genomic Data Commons. The BigQuery data for TCGA and CCLE reflects data from the original TCGA data coordinating center, supplemented with Genomic Data Commons data. The BigQuery data for TARGET reflects data received from the TARGET data coordinating center, not from the Genomic Data Commons.

NOTE 3: We have removed Google Genomics functionality from the user interface. You can still access CCLE open access data in Google Genomics from the command line. We are open to adding Google Genomics controlled data back into the user interface if you have a use case for it. We are also restructuring how multiple Programs of data are handled. Please feel free to provide feedback: https://groups.google.com/a/isb-cgc.org/forum/#!newtopic/feedback

NOTE 4: For TARGET data, the clinical and gene expression files themselves are available in the system. The BAM files will be available soon!

New Enhancements

  • You will now receive a more detailed error message when uploading your own data.
  • The user interface now displays the same nomenclature as the Genomic Data Commons (GDC).

Bug Fixes

  • User data upload is enabled: users can now upload their own datasets and create cohorts from existing programs and their newly uploaded data.
  • You can now have multiple Google Cloud Projects associated with your account and use only one bucket and dataset on one project with no interference.

Known Issues in this Data Structure Migration Sprint as of 05/25/2017

  • Analysis Type: SeqPeek formatting is occasionally elongated.
  • The CCLE data in the GUI does not match the CCLE data in BigQuery.
  • When a user duplicates a worksheet and then tries to apply the log scale, it does not function properly.
  • On the existing cohorts list page, the ‘blue x’ button on the delete confirmation does not remove the selected cohort if you then select another option, e.g., Set Operation. The reverse also occurs: if you select the ‘blue x’ on the set operation confirmation, you can then select the delete button and still see the cohort on the confirmation panel.
  • On the cohort view files page there are capitalization bugs on the Platform filter.
  • Swap values is not working properly for the plot settings.
  • The complement set operation for existing cohorts is exceptionally slow.
  • Selecting the confirmation button multiple times while the page is loading during a Set Operation creates duplicates of the same cohort.
  • When working with a new or duplicated worksheet in a workbook, you can select the log option for categorical features (e.g., in a bar chart), even though the log option only applies to numerical values.
  • When working with workbooks, if you select the delete confirmation button multiple times while the page is loading, you will be sent to an error page.
  • On a scatter plot, when Tobacco Smoking is used as the legend, it is displayed as numerical values when it should be displayed as categorical values.
  • The character limit for a workbook title is currently not enforced. If you exceed the limit, you will be sent to an error page.
  • You currently cannot plot user uploaded data when working with workbooks.
  • Selecting a cohort from the worksheet “To Complete Analysis” section results in a 400 Bad Request error.
  • You will experience latency when working with the create a new cohort page.
  • When plotting, certain values are displayed as numerical when they should be categorical, e.g., Tobacco Smoking History.
  • On the File List page, you are currently unable to access the BAM files for the IGV Browser associated with build hg38 when working with TCGA data.