September 2024: Public Release Studies #2044

Open
wants to merge 6 commits into
base: master

Conversation

Rima-Waleed
Collaborator

Rima-Waleed commented Jul 8, 2024

@ritikakundra
Collaborator

ritikakundra commented Jul 30, 2024

https://private.cbioportal.mskcc.org/study/summary?id=mixed_msk_tcga_2021:

  • So this is a germline study from MSK-IMPACT? The IDs are all different. Can we confirm these are not part of the main IMPACT series?
    *The samples are not DMP samples so they would not be part of the main cohort.
  • Do all the samples have a RAD51B mutation? The description says so.
    *All cases have RAD51B germline alterations, but only somatic variants are reported in cBioPortal. Otherwise they would all have RAD51B mutations.
  • Need to confirm if the legalities are taken care of.
    *Yes, all the required approvals are handled (the senior author, Dr Mandelker, is part of the MSK clinical genetics team and handled that).
  • Confused about TCGA, were the TCGA samples sequenced on targeted panel?
    *No, they were not sequenced on a targeted panel. The TCGA samples are TCGA cases that were identified as having loss-of-function germline RAD51B mutations. The authors downloaded their BAM files and ran them through their pipeline. We then deposited the processed data on cBioPortal.
  • The paper is calling them with old TCGA IDs but we are not reporting the same way
    *The MSK IDs were de-identified and the TCGA IDs were reformatted in the portal. Confirmed by verifying the supplemental files.
Screenshot 2024-07-30 at 12 57 26 PM

@ritikakundra
Collaborator

ritikakundra commented Jul 30, 2024

https://private.cbioportal.mskcc.org/study/summary?id=gist_msk_2023:

  • Can institute source be normalized
  • Race is also all upper case
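
The casing requests above (institute source, race) could be handled with a small normalization pass over the clinical values. The sketch below is only an illustration: the helper name `normalize_value` and the `KEEP_UPPER` acronym list are hypothetical, the example input is made up, and the real curation may rely on a controlled vocabulary instead.

```python
# Hypothetical helper: title-case words that are entirely upper case,
# leaving known acronyms untouched. KEEP_UPPER is an assumed list.
KEEP_UPPER = {"MSK", "NA", "TCGA"}


def normalize_value(value: str) -> str:
    """Return the value with each all-caps word capitalized, except acronyms."""
    words = value.strip().split()
    fixed = [
        w if w in KEEP_UPPER or not w.isupper() else w.capitalize()
        for w in words
    ]
    return " ".join(fixed)
```

Words that already use mixed case pass through unchanged, so running the helper twice gives the same result as running it once.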

*The following points are pending author response. Followed up with the author and still no response for:

  • Can we figure out the CDKN2A issue?
  • The paper talks about RFS curves, can we get that?
  • There was a risk stratification done. Is that possible to curate?

@ritikakundra
Collaborator

ritikakundra commented Jul 30, 2024

https://private.cbioportal.mskcc.org/study/summary?id=brca_fuscc_2020:

  • The supp file has so much more clinical info. Why is that not part of the cohort?

@rmadupuri
Collaborator

rmadupuri commented Jul 31, 2024

Thanks for working on this Rima! A couple of things we could improve:

https://private.cbioportal.mskcc.org/study/summary?id=ptad_msk_2024

  • Update the Journal name to Acta Neuropathologica
  • There are both TMB nonsynonymous and IMPACT TMB score attributes and the scores shouldn't vary much between them, can we double check?
    *Recalculated both TMB attributes and removed IMPACT TMB score attributes.
  • Clinical:
    • Attributes to Remove: Archer panel, Impact TMB Percentile (Across All Tumor Types), Impact TMB Percentile (Within Tumor Type), Impact TMB Score attributes, MSI comment
    • Attributes to Update: Sample coverage -> Sample Coverage
    • metastatic site has Epidural as value
      *Confirmed epidural metastasis and metastasis to spine or epidural space.
    • Does the Age attribute here refer to the patient's current age? Is this required?
  • Timeline events:
    • Timeline Events Availability table has some events capitalized and others in camel case. Can we normalize this?
    • Can we check with the authors whether the timeline/treatment events are to be released? Don't see many discussed in the paper.
      *Confirmed with author, he asked to keep timeline events.
  • From Fig1 we can add the following clinical attributes
    • Cohort: Retrospective/Prospective
    • Subset: Treatment-refractory/Benign
    • PitNET shows 10 samples are metastatic, the Sample_Type attr on portal shows 11. Can we double check?
      *Confirmed with author: 10 patients have metastasis and some patients have >1 sample, therefore 11 samples are shown to have metastasis.
    • Lineage, Histologic features, Radiotherapy Status.
  • The variant counts in the portal are a bit different from Fig1. A couple of variants are missing, for example the USP8 missense mutation in TR-4. Can we double check?
    *Confirmed with author: The oncoprint in the paper only includes oncogenic/likely oncogenic mutations, while cBio study page displays all mutations in the oncoprint. Also, the USP8 row in the paper's oncoprint is based on WES, which will be uploaded to dbGAP. The USP8 row is NOT based on the data from cbioportal.
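
The 10 vs. 11 metastasis discrepancy in this review comes down to counting patients versus counting samples. A minimal sketch of that check follows; the rows, IDs, and the function name `count_metastatic` are invented for illustration and are not taken from the study files.

```python
# Illustrative rows only: (patient_id, sample_id, sample_type). IDs are made up.
ROWS = [
    ("P-01", "P-01-S1", "Metastasis"),
    ("P-01", "P-01-S2", "Metastasis"),
    ("P-02", "P-02-S1", "Metastasis"),
    ("P-03", "P-03-S1", "Primary"),
]


def count_metastatic(rows):
    """Return (metastatic sample count, count of patients with >= 1 metastatic sample)."""
    met = [(patient, sample) for patient, sample, kind in rows if kind == "Metastasis"]
    return len(met), len({patient for patient, _ in met})


samples, patients = count_metastatic(ROWS)
```

When one patient contributes more than one metastatic sample, the sample count exceeds the patient count, which matches the author's explanation.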

https://private.cbioportal.mskcc.org/study/summary?id=plmeso_msk_2024

  • Update genomic near haploidization to genomic near-haploidization. The paper defines it this way.
  • Missing Somatic Status.
  • Did the paper analyze 10 patients with GNH (from results section)? Portal shows 14. Can't verify as the full paper is not accessible.
    *10 GNH cases are from the study and 4 additional GNH validation IMPACT cases were added (study GNH cohort and validation GNH cohort) adding up to 14 samples.
  • Paper mentions SETDB1 mutations are present in all GNH tumors. But only IMPACT505 has this gene. Guess we only show it as profiled for impact505 patients.
    *Correct, not all samples were sequenced with impact505 (panel with SETDB1 gene) so only those sequenced with impact505 are showing this gene
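
To illustrate the SETDB1 point: a gene can only be reported as profiled in samples whose panel covers it. The sketch below assumes heavily truncated, illustrative panel contents and made-up sample IDs; `PANELS`, `SAMPLE_PANEL`, and `profiled_samples` are hypothetical names, and the only detail taken from this thread is that SETDB1 is on IMPACT505 and not on the other panel.

```python
# Assumed, truncated panel contents for illustration; real panels list hundreds of genes.
PANELS = {
    "IMPACT468": {"TP53", "NF2", "BAP1"},
    "IMPACT505": {"TP53", "NF2", "BAP1", "SETDB1"},
}

# Made-up sample-to-panel assignments.
SAMPLE_PANEL = {"S1": "IMPACT468", "S2": "IMPACT505", "S3": "IMPACT505"}


def profiled_samples(gene: str) -> list:
    """Return the sorted sample IDs whose panel includes the given gene."""
    return sorted(
        sample for sample, panel in SAMPLE_PANEL.items() if gene in PANELS[panel]
    )
```

Under these assumptions, a mutation frequency for SETDB1 would use only the IMPACT505 samples as its denominator, which is one way the paper and portal numbers can diverge.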

https://private.cbioportal.mskcc.org/study/summary?id=lms_msk_2024

  • Paper mentions 195 soft tissue LMS & 238 uterine LMS were analyzed (total 433). portal shows 435 patients.
    *2 extra patients were added for the private cohort, removed from the public.
  • Remove MSI Comment, Variant type?
  • Does 1 in Necrosis mean Yes?
  • Can we add cm to tumor size attr
  • Progression Type, Metastatic Site has "
  • Has Germline data. Is this supposed to be released?
    *Germline data/patients removed and added in a separate private cohort.

https://www.cbioportal.org/study/summary?id=pancan_mappyacts_2022
*Tried to extract as much data as possible from the team. Details are added to the README.MD

  • Can we update the name to Pediatric European MAPPYACTS Trial (Gustave Roussy, Cancer Discov 2022).
  • Per Fig1, 628 samples from 624 patients had successful WES. Should the case list and profile counts be per the flowchart in Fig1? Case lists show different numbers.
    Screenshot 2024-07-31 at 10 20 46 AM
  • And should the cohort size also be 624p/632s that had successful WES/RNA-seq/Panel sequencing. The other samples had either screening failure or no sequencing done. From Fig1.
  • Can we get the RNA-seq data?
  • Are we not including the Panel sequencing data?

https://private.cbioportal.mskcc.org/study/summary?id=cscc_ranson_2022

  • Rename Tumor Site -> Metastatic Site and Primary location -> Primary Site
  • Missing mutational signatures SBS, ID data from Supplementary Table S2.
  • Mut% for a couple of genes is off when compared to Fig2G of paper. Can we double check?
    *Pending author response. Authors account for synonymous mutations.
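
If the figure counts synonymous mutations and the portal does not, the per-gene percentage shifts as sketched below. The mutation calls, sample count, and the name `percent_mutated` are made up for illustration; the only assumption about real data is that synonymous calls carry the MAF `Variant_Classification` value `Silent`.

```python
# Made-up mutation calls: (sample_id, gene, variant_classification).
MUTATIONS = [
    ("S1", "TP53", "Missense_Mutation"),
    ("S2", "TP53", "Silent"),
    ("S3", "TP53", "Nonsense_Mutation"),
    ("S4", "NOTCH1", "Silent"),
]
N_SAMPLES = 4  # assumed cohort size for this toy example


def percent_mutated(gene: str, include_synonymous: bool) -> float:
    """Percent of samples with at least one counted mutation in the gene."""
    hit = {
        sample
        for sample, g, classification in MUTATIONS
        if g == gene and (include_synonymous or classification != "Silent")
    }
    return 100.0 * len(hit) / N_SAMPLES
```

Comparing the two calls for the same gene shows how much of a gap between figure and portal could be explained by synonymous calls alone.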

https://private.cbioportal.mskcc.org/study/summary?id=mbn_msk_2024
*Pending author response for both points below (followed up):

  • Can we double check with the authors on the clinical data? Don't see any attributes used/mentioned in the paper. We can remove many if not needed.
  • Can we get the subtype info as mentioned in the paper?

https://private.cbioportal.mskcc.org/study/summary?id=hcc_clca_2024

  • Update the description to The Chinese Liver Cancer Atlas (CLCA) project. Deep whole-genome sequencing of 494 hepatocellular carcinomas and their matched normals. Data from [CLC Atlas](http://lifeome.net:8080/clca/#/)
  • Can AFP ( Alpha-Fetoprotein) and CA199U/ml be updated to NUMBERs? And their names be updated to AFP (Alpha-Fetoprotein) and CA199 (U/ml)? Can we add a description to CA199? What does this mean?
  • The values in Histopathological Type are not in camel case.
  • Supp Table 1a has more clinical elements like Vital Status, RFS, Tumor size etc, can we add the missing elements to the cohort?
  • Supp Table 3 has mutational signatures data.
  • Paper mentions "After stringent quality control, a total of 9,287,828 somatic mutations was identified." The data file has 283,226 total variants. Are we missing data?
    *Emailed authors for confirmation. Seems like only 283,226 mutations with potential targets are supplied in supp files
  • The CLCA site also has CNA and SV data for the samples, can we add that?
    *CNA data is given as summary; reached out to author. Added SV

https://private.cbioportal.mskcc.org/study/summary?id=ucec_msk_2024

  • Update journal to Nature Medicine in study name
  • Paper has PFS, treatment, response info - is it possible to add that to the cohort?
    *Pending author response

@ritikakundra
Collaborator

ritikakundra commented Aug 30, 2024

https://private.cbioportal.mskcc.org/study/summary?id=plmeso_msk_2024:

  • There is survival data in the supp files
  • Can we not get data for non-GNH cases?
  • Supp files have more clinical data
  • The SETDB1 issue that @rmadupuri brought up is still an issue I feel. The paper and portal are not in sync.

*Author requested to keep only 14 GNH cases and the following attributes: “Patient ID”, “Sample ID”, “Sample Display Name”, and “Genomic near haploidization” in that order. Just four metadata.
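
The author's request (four attributes, in a fixed order) amounts to a column selection. A minimal sketch, assuming the clinical rows are already loaded as dictionaries keyed by display name; `KEEP`, `keep_attributes`, and the example record are hypothetical.

```python
# Attribute order requested by the author, using the display names quoted above.
KEEP = ["Patient ID", "Sample ID", "Sample Display Name", "Genomic near haploidization"]


def keep_attributes(records, keep=KEEP):
    """Return new records containing only the kept attributes, in the given order."""
    return [{name: record[name] for name in keep} for record in records]


# Made-up record with one extra attribute that should be dropped.
example = [{
    "Sample ID": "S1",
    "Patient ID": "P1",
    "Genomic near haploidization": "Yes",
    "Sample Display Name": "Sample 1",
    "Extra Attribute": "drop me",
}]
trimmed = keep_attributes(example)
```

A missing attribute raises `KeyError` here, which is deliberate in this sketch so that an incomplete record is noticed rather than silently written out.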

@ritikakundra
Collaborator

ritikakundra commented Aug 30, 2024

cscc_ranson_2022:

  • Can we normalize the genomic profiles tables? Sometimes it is the proper case and some are lower
  • @rmadupuri @Rima-Waleed for the % being off, I think it's because the figure accounts for the synonymous mutations.
    *Still pending on author response.
    Screenshot 2024-08-30 at 4 35 44 PM

@ritikakundra
Collaborator

ritikakundra commented Sep 4, 2024

https://www.cbioportal.org/study/summary?id=lms_msk_2024:

  • The survival data needs more public friendly description. They are from IMPACT.
  • All three need a general description with what is the anchor date and freeze date.
    *Added the proper general description for OS, PFS, DSS.

@ritikakundra
Collaborator

ritikakundra commented Sep 4, 2024

https://www.cbioportal.org/study/summary?id=pancan_mappyacts_2022:

@ritikakundra
Collaborator

ritikakundra commented Sep 4, 2024

@Rima-Waleed for https://private.cbioportal.mskcc.org/study/summary?id=mixed_msk_tcga_2021:
You said "The samples are not DMP samples so they would not be part of the main cohort." But in the paper, under methods, it says:
*The samples are IMPACT samples but since the study is old, the IDs are de-identified hence they're different.
Screenshot 2024-09-04 at 5 37 55 PM

@ritikakundra
Collaborator

ritikakundra commented Sep 5, 2024

https://private.cbioportal.mskcc.org/study/summary?id=mbn_msk_2024:

  • I still see "". Why are we still seeing them?

@ritikakundra
Collaborator

ritikakundra commented Sep 5, 2024

https://private.cbioportal.mskcc.org/study/summary?id=ucec_msk_2024

  • These are MSK-IMPACT cases so why is the ID different?
    *The paper was already accepted when we got the request to create a cohort. We had to create the cohort with the IDs that were included in the paper.

@ritikakundra
Collaborator

ritikakundra commented Sep 5, 2024

https://private.cbioportal.mskcc.org/study/summary?id=hcc_clca_2024

  • What is _60500 for AFP?
  • What is _0.6 for CA19-9
    *< character imported as _
  • Can we confirm the PIVKA2 range?
    *Confirmed from supplementary files & reached out to author (pending).
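
Values such as `_60500` and `_0.6` could be repaired with a pattern match. The sketch below assumes, per the reply above, that a leading `_` stands for a `<` lost at import; if `>` was replaced the same way, the original direction cannot be recovered from the value alone and the source file would need to be checked. The names `CENSORED` and `restore_less_than` are hypothetical.

```python
import re

# Assumption: a leading "_" directly before a number stands for a "<" lost at import.
CENSORED = re.compile(r"^_(\d+(?:\.\d+)?)$")


def restore_less_than(value: str) -> str:
    """Return '<N' for values of the form '_N'; leave every other value unchanged."""
    match = CENSORED.match(value.strip())
    return f"<{match.group(1)}" if match else value
```

Restored values like `<0.6` are strings rather than numbers, so an attribute holding them would need separate handling if it is meant to be numeric.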

@ritikakundra
Collaborator

ritikakundra commented Sep 5, 2024

https://private.cbioportal.mskcc.org/study/summary?id=brca_fuscc_2020:

  • Can we change the NGS ID to String and make it a 0 priority?

rmadupuri changed the title from "July- Public Release Studies" to "September 2024: Public Release Studies" on Sep 9, 2024

4 participants