Chaoyang

Dataset Information

The Chaoyang dataset is a pathological image dataset used for classification, focusing on the diagnosis of colorectal cancer. The dataset comprises colon slice images from Beijing Chaoyang Hospital affiliated with Capital Medical University, scanned under a 20x objective. It includes a variety of lesion categories, such as normal, serrated lesions, adenocarcinomas, and adenomas. The dataset contains a total of 6,160 images, with 4,021 in the training set and 2,139 in the test set. The dataset includes annotation noise common in real-world scenarios, enhancing the robustness of models developed for handling real-world data. Additionally, the test set is annotated consistently by three pathologists, ensuring the reliability of the test set labels. The dataset includes various types of colorectal lesions, providing a multi-class classification challenge that aids in improving the performance of pathology models in fine-grained classification tasks.

Pathological research on colorectal cancer involves a detailed analysis of the carcinogenic process of glandular epithelial cells in the colon wall and their histological characteristics, to facilitate early diagnosis and precise treatment. Lesion types include normal colon mucosa, serrated lesions (with potential for malignancy), adenomas (precancerous lesions), and adenocarcinomas (invasive cancer). Pathological diagnosis allows for the development of personalized treatment plans, improving patient survival rates and providing prognostic assessments.

Dataset Meta Information

Dimensions	Modality	Task Type	Anatomical Structures	Anatomical Area	Number of Categories	Data Volume	File Format
2D	Pathology	Classification	Colon	Colon	4	6160	JPG

Resolution Details

Dataset Statistics	size
min	(512, 512)
median	(512, 512)
max	(512, 512)

Label Information Statistics

Category	Normal	Serrated	Adenocarcinoma	Adenoma
Number of Cases	1,816	1,163	2,244	937
Percentage	29.48%	18.88%	36.43%	15.21%

Visualization

Examples of normal and serrated categories in the article.

A dataset of examples of articles with adenocarcinoma and adenoma categories in articles.

File Structure

chaoyang-dataset
│
├── train
│   ├── image1
│   ├── image2
│   └── ...
├── test
│   ├── image1
│   ├── image2
│   └── ...
├── train.json
├── test.json

Authors and Institutions

Chuang Zhu (School of Artificial Intelligence, Beijing University of Posts and Telecommunications)

Wenkai Chen (School of Artificial Intelligence, Beijing University of Posts and Telecommunications)

Ting Peng (School of Artificial Intelligence, Beijing University of Posts and Telecommunications)

Ying Wang (Beijing Chaoyang Hospital affiliated with Capital Medical University)

Mulan Jin (Beijing Chaoyang Hospital affiliated with Capital Medical University)

Source Information

Official Website: https://bupt-ai-cz.github.io/HSA-NRL/

Download Link: https://bupt-ai-cz.github.io/HSA-NRL/

Article Address: https://ieeexplore.ieee.org/abstract/document/9600806

Publication Date: 2021-11

Citation

@article{zhu2021hard,
  title={Hard sample aware noise robust learning for histopathology image classification},
  author={Zhu, Chuang and Chen, Wenkai and Peng, Ting and Wang, Ying and Jin, Mulan},
  journal={IEEE transactions on medical imaging},
  volume={41},
  number={4},
  pages={881--894},
  year={2021},
  publisher={IEEE}
}

Original introduction article is here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chaoyang.md

Chaoyang.md

Chaoyang

Dataset Information

Dataset Meta Information

Resolution Details

Label Information Statistics

Visualization

File Structure

Authors and Institutions

Source Information

Citation

Files

Chaoyang.md

Latest commit

History

Chaoyang.md

File metadata and controls

Chaoyang

Dataset Information

Dataset Meta Information

Resolution Details

Label Information Statistics

Visualization

File Structure

Authors and Institutions

Source Information

Citation