AI to identify cancerous cells on histological images
CatHacks VII 3rd place winner
Before the event, we learned that cancer is the second most common cause of death in the United States, according to the CDC. In particular, the WHO reported that 570,000 women were diagnosed with cervical cancer and 311,000 died from the disease in 2018. However, cervical cancer is completely preventable if caught at a pre-cancerous or early stage. The Pap smear is a gynecological test that can reveal signs of HPV infection (the leading cause of cervical cancer) as well as pre-cancerous and cancerous lesions of the cervix. This test is considered the “gold standard” for cervical cancer detection and is used worldwide for routine screening. According to recent data, Pap smears have single-handedly reduced the incidence of cervical cancer in the USA by 75% over the past 60 years. Even so, there is still room for improvement in this field: because the disease is completely preventable, better screening technologies could eliminate cervical cancer entirely. In particular, automated Pap smear analysis would make it possible to interpret more smears in less time and reduce the workload of doctors and lab personnel.
Inspired by these statistics and trends in cervical cancer rates, we decided to develop a web application linked to a machine learning model that analyzes histological images of Pap smears. Specifically, we constructed a database and trained a machine learning model to identify and count the cells in a Pap smear image and to classify each cell as one of the five most common cell types. The model can also identify clumps of cells in the image, which can be indicative of cervical cancer. We then linked the model to a web application with an intuitive, user-friendly, and visually appealing interface. Together, the web application and the model allow healthcare professionals to create an account in the service, create patient profiles, upload Pap smear images, and obtain analysis results, which include cell identification, classification by cell type, cell count, and cell clump identification.
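To make the shape of the analysis results more concrete, here is a minimal Python sketch of how per-cell detections could be summarized into a cell count, a count by cell type, and a clump count. It is an illustration under stated assumptions, not our actual code: the `Detection` class, the `summarize` function, the `min_score` threshold, and the "clump" label are hypothetical names, and the five cell types are assumed to match the five categories of the SipakMed dataset.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: the five cell types correspond to the SipakMed categories.
CELL_TYPES = (
    "dyskeratotic",
    "koilocytotic",
    "metaplastic",
    "parabasal",
    "superficial_intermediate",
)
# Assumption: clumps are reported as detections with their own label.
CLUMP_LABEL = "clump"


@dataclass(frozen=True)
class Detection:
    """One detected object in a Pap smear image (hypothetical structure)."""
    label: str    # one of CELL_TYPES or CLUMP_LABEL
    score: float  # model confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def summarize(detections, min_score=0.5):
    """Summarize detections into the report fields shown to the user."""
    # Drop low-confidence detections before counting.
    kept = [d for d in detections if d.score >= min_score]
    per_type = Counter(d.label for d in kept if d.label in CELL_TYPES)
    return {
        "cell_count": sum(per_type.values()),
        "cells_by_type": {t: per_type.get(t, 0) for t in CELL_TYPES},
        "clump_count": sum(1 for d in kept if d.label == CLUMP_LABEL),
    }


if __name__ == "__main__":
    example = [
        Detection("parabasal", 0.92, (10, 10, 60, 60)),
        Detection("koilocytotic", 0.81, (80, 20, 140, 90)),
        Detection("parabasal", 0.35, (200, 40, 250, 95)),  # below threshold
        Detection("clump", 0.77, (300, 300, 480, 460)),
    ]
    print(summarize(example))
```

In this sketch, detections below the confidence threshold are discarded before counting, so the reported totals only reflect objects the model is reasonably sure about; the threshold value of 0.5 is an arbitrary choice for illustration.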
We built the web part of our project on the MERN stack: MongoDB as the database management system, React for the front end, Express.js for the back end, and Node.js as the server runtime. We used TailwindCSS and Headless UI to develop the front end. The ML part of the project was developed using PyTorch, OpenCV, and Detectron2, and trained on the "Cervical Cancer largest dataset (SipakMed)" dataset from Kaggle. We also used Figma to design the web pages and Canva to design the logo. Finally, we used Docker to simplify the development process.