


Ayush-Sharma410/Pharmaceutical_Product_Clustering


Project Name

Pharmaceutical Product Clustering

Project Overview

This project clusters pharmaceutical products from various sources into coherent groups based on their names, dosages, and forms. The goal is a structured, categorized dataset for further analysis or applications in the pharmaceutical domain.

Dataset

The dataset used for this project consists of pharmaceutical product information, including:

  • Medicine names
  • Dosages
  • Forms
  • Sources

The data comes from various pharmaceutical suppliers and is organized into clusters based on the similarity of the medicine names.

Data Preprocessing

The data preprocessing steps include:

  • Cleaning and standardizing the medicine names
  • Handling missing data
  • Creating a consistent naming format
  • Assigning cluster labels to each product
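
The cleaning and standardizing steps above might look roughly like the following. This is a minimal sketch, not the project's actual code: the `normalize_name` function, the `FORM_ALIASES` table, and the specific rules (lowercasing, joining dosage numbers to units, expanding form abbreviations) are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; the project's real rules are not documented.
FORM_ALIASES = {
    "tab": "tablet", "tabs": "tablet", "tablets": "tablet",
    "cap": "capsule", "caps": "capsule", "capsules": "capsule",
    "syp": "syrup",
}


def normalize_name(raw):
    """Return a cleaned product name, or None when the value is missing."""
    if raw is None or not raw.strip():
        return None  # missing data is flagged rather than guessed
    text = raw.lower()
    # Replace punctuation with spaces, keeping characters used in dosages.
    text = re.sub(r"[^a-z0-9.%/ ]+", " ", text)
    # Join a number and its unit: "500 mg" becomes "500mg".
    text = re.sub(r"(\d)\s+(mg|mcg|g|ml|iu)\b", r"\1\2", text)
    # Drop stray leading/trailing dots ("cap." becomes "cap"), then expand forms.
    tokens = [tok.strip(".") for tok in text.split()]
    return " ".join(FORM_ALIASES.get(tok, tok) for tok in tokens if tok)
```

With these assumed rules, `normalize_name("  Paracetamol 500 MG Tabs ")` and `normalize_name("Paracetamol 500mg tablet")` produce the same string, which is what makes the later name matching more reliable.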

Clustering

The clustering process involves:

  • Using RapidFuzz and PolyFuzz for matching similar product names
  • Creating clusters and subclusters for each product
  • Organizing the products into coherent groups
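
As an illustration of name-based matching, here is a small sketch that groups similar names with RapidFuzz's `fuzz.token_sort_ratio`. The greedy strategy, the `cluster_names` helper, and the threshold of 85 are assumptions for this example and may differ from what the notebooks do; PolyFuzz and subclustering are not shown. A standard-library fallback is included so the sketch still runs when RapidFuzz is not installed.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz  # third-party: pip install rapidfuzz

    def similarity(a, b):
        """Similarity on a 0-100 scale, ignoring word order."""
        return fuzz.token_sort_ratio(a, b)
except ImportError:
    def similarity(a, b):
        """Standard-library approximation on the same 0-100 scale."""
        a_sorted = " ".join(sorted(a.split()))
        b_sorted = " ".join(sorted(b.split()))
        return 100.0 * SequenceMatcher(None, a_sorted, b_sorted).ratio()


def cluster_names(names, threshold=85.0):
    """Greedy clustering (an assumed strategy, not the project's).

    Each name joins the first cluster whose representative (its first
    member) scores at or above `threshold`; otherwise it starts a new one.
    """
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:  # no existing cluster was close enough
            clusters.append([name])
    return clusters


products = [
    "paracetamol 500mg tablet",
    "paracetamol 500mg tablets",
    "amoxicillin 250mg capsule",
]
clusters = cluster_names(products)
```

Greedy assignment depends on input order and on the chosen threshold, so it is best treated as a starting point for experimentation rather than a final grouping.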

File Structure

The project's file structure includes:

  • The dataset in CSV format
  • Jupyter notebooks for data preprocessing and clustering
  • The final clustered dataset in CSV format
  • This README file

Usage

To use the project, follow these steps:

  1. Clone the GitHub repository to your local machine.
  2. Run the Jupyter notebooks for data preprocessing and clustering.
  3. Access the final clustered dataset for your analysis or applications.
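
For step 3, a sketch along these lines could load the clustered CSV and group product names by cluster label. The column names `name` and `cluster` and the file name in the comment are hypothetical, since the actual schema is not documented here; adjust them to match the real file.

```python
import csv
import io
from collections import defaultdict


def load_clusters(csv_file, name_col="name", cluster_col="cluster"):
    """Map each cluster label to the list of product names it contains.

    `name_col` and `cluster_col` are assumed column names.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[cluster_col]].append(row[name_col])
    return dict(groups)


# In-memory sample standing in for the real file. With a file on disk
# (hypothetical name), one would instead write:
#   with open("clustered_products.csv", newline="", encoding="utf-8") as f:
#       clusters_by_label = load_clusters(f)
sample = io.StringIO(
    "name,cluster\n"
    "paracetamol 500mg tablet,0\n"
    "paracetamol 500mg tablets,0\n"
    "amoxicillin 250mg capsule,1\n"
)
clusters_by_label = load_clusters(sample)
```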

Contributing

If you would like to contribute to this project, please follow these steps:

  1. Fork the project.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Submit a pull request to the main project repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.
