Add BreastInvasiveCarcinoma dataset #7905

Merged Sep 2, 2023 (19 commits)

Favourj-bit
Contributor

This is a dataset that was generated by integrating the breast cancer (BRCA TCGA) dataset from the cBioPortal (cbioportal.org) and a biological network for node connections from Pathway Commons (www.pathwaycommons.org). The dataset contains the gene features of each patient and the overall survival time (in months) of each patient, which are the labels.

@Favourj-bit Favourj-bit requested a review from wsad1 as a code owner August 19, 2023 10:26
@codecov

codecov bot commented Aug 19, 2023

Codecov Report

Merging #7905 (1e557bb) into master (6847849) will decrease coverage by 0.73%.
The diff coverage is n/a.

❗ Current head 1e557bb differs from pull request most recent head 41ec8c5. Consider uploading reports for the commit 41ec8c5 to get more accurate results

@@            Coverage Diff             @@
##           master    #7905      +/-   ##
==========================================
- Coverage   90.25%   89.52%   -0.73%     
==========================================
  Files         459      459              
  Lines       26954    26951       -3     
==========================================
- Hits        24328    24129     -199     
- Misses       2626     2822     +196     

see 31 files with indirect coverage changes
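The figures in the coverage table can be cross-checked with a few lines of arithmetic. The sketch below assumes coverage is simply hits divided by total lines (hits plus misses); the helper name `coverage_percent` is made up for illustration, and the exact rounding Codecov applies to the displayed percentages is not stated in this thread.

```python
def coverage_percent(hits: int, misses: int) -> float:
    """Return line coverage as a percentage, assuming lines = hits + misses."""
    lines = hits + misses
    return 100.0 * hits / lines


# Numbers copied from the Codecov table above.
base = coverage_percent(hits=24328, misses=2626)  # master
head = coverage_percent(hits=24129, misses=2822)  # this pull request
delta = head - base

# Print unrounded values so they can be compared with the reported
# 90.25%, 89.52% and -0.73% (display rounding may differ slightly).
print(f"master: {base:.4f}%  PR: {head:.4f}%  change: {delta:+.4f}%")
```

Because the diff coverage is reported as n/a and the drop comes from indirect changes, a decrease like this usually reflects which tests ran for the report rather than the files touched by the pull request.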

@Favourj-bit Favourj-bit changed the title Added breast_cancer_graph_dataset Added breast_invasive_carcinoma_brca.py Aug 21, 2023
@rusty1s rusty1s changed the title Added breast_invasive_carcinoma_brca.py Add BreastInvasiveCarcinoma dataset Aug 21, 2023
Member

@akihironitta akihironitta left a comment

Thank you for adding this dataset! I've just directly made some changes to your PR, but please have a look and feel free to revert them if necessary :)

Comment on lines 17 to 22
r"""The breast cancer (BRCA TCGA) dataset from `cBioPortal
<https://www.cbioportal.org>`_ and the biological network for node
connections from `Pathway Commons <https://www.pathwaycommons.org>`_.
The dataset contains the gene features of each patient in graph_features
and the overall survival time (in months) of each patient,
which are the labels.
Member

Could you check the docstring change, as I've just removed some redundant sentences? Also, it'd be nice if you could describe what nodes and edges represent to help new users understand this dataset. (example: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.KarateClub.html#torch_geometric.datasets.KarateClub)
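To make the question about nodes and edges concrete, here is a minimal plain-Python sketch of one plausible reading of the docstring. It assumes that nodes are genes, that edges are gene-gene connections taken from the Pathway Commons network and shared by all patients, and that each patient contributes one feature value per gene plus an overall survival time in months as the label. None of this is confirmed in the thread; the gene names, values, and the `PatientGraph` and `neighbors` names are hypothetical, and the sketch does not use the PyG API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PatientGraph:
    """One patient: per-gene features plus a survival label (assumed layout)."""
    x: List[float]  # one feature value per gene node (hypothetical values)
    y: float        # overall survival time in months (the label)


# Hypothetical gene nodes; the index in this list is the node id.
genes: List[str] = ["GENE_A", "GENE_B", "GENE_C"]

# Hypothetical undirected gene-gene connections, shared by every patient.
edge_index: List[Tuple[int, int]] = [(0, 1), (1, 2)]

# Two made-up patients on the same gene network.
patients: List[PatientGraph] = [
    PatientGraph(x=[0.2, 1.5, -0.3], y=34.0),
    PatientGraph(x=[1.1, -0.4, 0.8], y=61.5),
]


def neighbors(node: int) -> List[int]:
    """Return the sorted node ids connected to `node` in the shared network."""
    found = set()
    for src, dst in edge_index:
        if src == node:
            found.add(dst)
        elif dst == node:
            found.add(src)
    return sorted(found)


for patient in patients:
    # Every patient has exactly one feature value per gene node.
    assert len(patient.x) == len(genes)

print(genes[1], "is connected to", [genes[i] for i in neighbors(1)])
```

If the dataset follows this layout, a docstring sentence along the lines of "nodes are genes, edges are pathway connections, and each graph is one patient" would answer the question; the authors would need to confirm whether that matches the actual data.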

@Favourj-bit
Contributor Author

Hi @akihironitta
Thank you for your review. I noticed that you removed the link to the repository that shows how the data was created and how to model with it. I think this is important information that should be included, and during my discussion with @rusty1s, he mentioned I could include it in the docstring.
I'll be adding this back, but apart from that, I'm okay with the other changes.

@rusty1s rusty1s enabled auto-merge (squash) September 2, 2023 11:43
@rusty1s rusty1s merged commit 9f33615 into pyg-team:master Sep 2, 2023
JakubPietrakIntel pushed a commit that referenced this pull request Sep 27, 2023
This is a dataset that was generated by integrating the breast cancer
(BRCA TCGA) dataset from the cBioPortal (cbioportal.org) and a
biological network for node connections from Pathway Commons
(www.pathwaycommons.org). The dataset contains the gene features of each
patient and the overall survival time (in months) of each patient, which
are the labels.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
3 participants