Project Name

Pharmaceutical Product Clustering

Project Overview

This project clusters pharmaceutical products from multiple sources into coherent groups based on their names, dosages, and forms. The goal is a structured, categorized dataset for further analysis or applications in the pharmaceutical domain.

Dataset

The dataset used for this project consists of pharmaceutical product information, including:

Medicine names
Dosages
Forms
Sources

The data comes from various pharmaceutical suppliers and is organized into clusters based on the similarity of the medicine names.

Data Preprocessing

The data preprocessing steps include:

Cleaning and standardizing the medicine names
Handling missing data
Creating a consistent naming format
Assigning cluster labels to each product
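
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the project's notebooks: the field names (name, dosage, form, source), the unit alias table, and the "unknown" placeholder for missing values are all assumptions made for the example.

```python
import re

# Hypothetical unit spellings mapped to one canonical form (illustrative only).
_UNIT_ALIASES = {"mgs": "mg", "milligram": "mg", "milligrams": "mg", "mls": "ml"}


def clean_name(raw):
    """Lowercase a name, drop punctuation, collapse spaces, join '500 mg' into '500mg'."""
    if raw is None:
        return ""
    text = raw.lower().strip()
    # Replace everything except letters, digits, dots, and whitespace with a space.
    text = re.sub(r"[^a-z0-9.\s]", " ", text)
    # split() also collapses runs of whitespace.
    tokens = [_UNIT_ALIASES.get(tok, tok) for tok in text.split()]
    text = " ".join(tokens)
    # Attach a unit to the number in front of it so spacing variants compare equal.
    return re.sub(r"(\d)\s+(mg|ml|g|mcg)\b", r"\1\2", text)


def clean_record(record):
    """Return a copy of a product record with a cleaned name and no missing fields."""
    cleaned = {"name": clean_name(record.get("name"))}
    for field in ("dosage", "form", "source"):
        value = (record.get(field) or "").strip().lower()
        # Assumed convention: missing or empty values become the string "unknown".
        cleaned[field] = value or "unknown"
    return cleaned
```

For example, clean_name("  Paracetamol-500 MG  Tablet ") returns "paracetamol 500mg tablet", so spelling and spacing variants of the same product are more likely to match in the clustering step.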

Clustering

The clustering process involves:

Using RapidFuzz and PolyFuzz for matching similar product names
Creating clusters and subclusters for each product
Organizing the products into coherent groups
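
One simple way to group similar names is a greedy pass that compares each name with one representative per existing cluster. The sketch below is an assumption about how such a pass could look, not the project's actual method: it uses RapidFuzz's fuzz.ratio scorer when the library is installed, falls back to the standard library's difflib otherwise, and the function name cluster_names and the threshold of 80 are hypothetical choices. PolyFuzz and subclustering are not shown.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz

    def similarity(a, b):
        """Similarity score between 0 and 100 using RapidFuzz."""
        return fuzz.ratio(a, b)

except ImportError:

    def similarity(a, b):
        """Fallback similarity score between 0 and 100 using difflib."""
        return 100.0 * SequenceMatcher(None, a, b).ratio()


def cluster_names(names, threshold=80.0):
    """Greedily assign each name to the first cluster whose representative is similar enough.

    Returns a dict mapping each name to an integer cluster label.
    The threshold is an illustrative value, not one taken from the project.
    """
    representatives = []  # first name seen in each cluster
    labels = {}
    for name in names:
        for cluster_id, rep in enumerate(representatives):
            if similarity(name, rep) >= threshold:
                labels[name] = cluster_id
                break
        else:
            # No existing cluster is close enough, so start a new one.
            representatives.append(name)
            labels[name] = len(representatives) - 1
    return labels
```

A greedy pass like this depends on input order and on the chosen threshold, so the resulting clusters are worth inspecting by hand before they are relied on.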

File Structure

The project's file structure includes:

The dataset in CSV format
Jupyter notebooks for data preprocessing and clustering
The final clustered dataset in CSV format
This README file

Usage

To use the project, follow these steps:

Clone the GitHub repository to your local machine.
Run the Jupyter notebooks for data preprocessing and clustering.
Access the final clustered dataset for your analysis or applications.
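
Once the final clustered dataset exists, it can be inspected with a few lines of Python. The sketch below assumes the CSV has columns called name and cluster; these column names, and the sample rows, are hypothetical and may differ from the real file.

```python
import csv
import io
from collections import defaultdict


def group_by_cluster(csv_file, cluster_column="cluster", name_column="name"):
    """Read rows from an open CSV file and collect product names per cluster label."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[cluster_column]].append(row[name_column])
    return dict(groups)


# In-memory sample standing in for the clustered CSV (made-up rows).
sample = io.StringIO(
    "name,dosage,form,cluster\n"
    "paracetamol 500mg tablet,500mg,tablet,0\n"
    "paracetamol 500mg tablets,500mg,tablet,0\n"
    "ibuprofen 200mg capsule,200mg,capsule,1\n"
)
groups = group_by_cluster(sample)
for label, members in groups.items():
    print(label, members)
```

To read a file on disk instead, pass a handle opened with open(path, newline="") in place of the in-memory sample.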

Contributing

If you would like to contribute to this project, please follow these steps:

Fork the project.
Create a new branch for your feature or bug fix.
Make your changes and commit them.
Submit a pull request to the main project repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.

