🚀 End-to-End Machine Learning MLOps Project

Building, Deploying, and Managing a Machine Learning Pipeline with CI/CD, Docker, and Cloud

🎯 Project Overview

This project implements a complete Machine Learning Operations (MLOps) pipeline: it builds and deploys a classifier that predicts whether water is drinkable from its measured properties (e.g., pH, turbidity, chloramines).

During the model selection phase, Logistic Regression and K-Nearest Neighbors (KNN) models were trained and evaluated alongside a Decision Tree Classifier, with the Decision Tree ultimately selected for deployment based on its performance and interpretability.
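The comparison described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic (generated with `make_classification`), and the hyperparameter values, the `candidates` dictionary, and the `compare_models` helper are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the water-quality data: 9 numeric features, binary target.
X, y = make_classification(n_samples=500, n_features=9, n_informative=5, random_state=0)

# The three model families named in the overview (hyperparameters are illustrative).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy of each candidate model."""
    scores = {}
    for name, estimator in candidates.items():
        # Same preprocessing for every candidate so the comparison is fair.
        model = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), estimator)
        scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    return scores


scores = compare_models(X, y)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Accuracy alone would not explain the final choice here; the overview also cites interpretability, which a shallow decision tree provides more directly than KNN.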

This project integrates Exploratory Data Analysis, model training, model selection and hyperparameter tuning, deployment, containerization, CI/CD automation, and cloud deployment, simulating a real-world production environment.

🛠️ Tools and Technologies Used

Category	Tools
Programming	Python (numpy, pandas, scikit-learn)
Model Training	LogisticRegression, KNeighborsClassifier, DecisionTreeClassifier (scikit-learn)
Data Preprocessing	Pipelines (Imputation, Scaling)
API Development	FastAPI
Containerization	Docker, DockerHub
CI/CD	GitHub Actions
Cloud Deployment	Render
Version Control	Git, GitHub
Security	GitHub Secrets, DockerHub Access Tokens

📝 Project Workflow

🔹 1️⃣ Data Preprocessing

Imported the dataset and handled missing values with mean imputation.
Built a data preprocessing pipeline to automate feature scaling and transformation.
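A preprocessing pipeline of this kind might look like the sketch below. It assumes scikit-learn's `SimpleImputer` and `StandardScaler` as the imputation and scaling steps; the step names and the toy array are made up for illustration and do not come from the project's dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Mean imputation followed by standardization, chained so both steps
# are fitted on training data and replayed identically at prediction time.
preprocessor = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Toy data with missing values (three made-up numeric columns).
X = np.array([
    [7.0, 200.0, np.nan],
    [np.nan, 180.0, 3.5],
    [6.5, np.nan, 4.1],
    [8.1, 210.0, 3.9],
])

X_clean = preprocessor.fit_transform(X)
print(np.isnan(X_clean).any())  # checks whether any missing values remain
print(X_clean.mean(axis=0))     # column means after scaling
```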

🔹 2️⃣ Model Training

Trained a Decision Tree Classifier for binary classification.
Fine-tuned hyperparameters (max_depth, min_samples_split) using grid search.
Saved the trained model as a serialized file (.pkl) using joblib.
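The tuning and serialization steps above can be sketched as follows. The grid values, the synthetic data, and the file name `example_model.pkl` are assumptions chosen for the example; only the tuned parameters (`max_depth`, `min_samples_split`) and the use of grid search and joblib come from this README.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the water-quality data.
X, y = make_classification(n_samples=400, n_features=9, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative grid over the two hyperparameters named above.
param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_split": [2, 10, 20],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("held-out accuracy:", search.score(X_test, y_test))

# Serialize the best estimator with joblib and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "example_model.pkl")
joblib.dump(search.best_estimator_, model_path)
restored = joblib.load(model_path)
```

Evaluating on a held-out split after the search gives a less optimistic estimate than the cross-validation score used to pick the parameters.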

🔹 3️⃣ API Development

Built a FastAPI application to serve the trained model.
Exposed a /predict endpoint for predictions.
The API accepts JSON input, preprocesses the data, and returns predictions in real time.

🔹 4️⃣ Dockerization

Packaged the application into a Docker container for easy deployment.
Created a Dockerfile to ensure the app runs consistently across environments.
Pushed the container image to DockerHub for reuse and deployment.

🔹 5️⃣ CI/CD Pipeline

Configured GitHub Actions to automate:
- Building the Docker image.
- Pushing the image to DockerHub.
- Triggering redeployment on Render.

🔹 6️⃣ Cloud Deployment

Deployed the containerized application to Render, a cloud hosting platform.
Stored sensitive values (e.g., DockerHub credentials) in environment variables rather than in the codebase.
Monitored application logs for debugging and performance insights.

📊 Key Features

End-to-End MLOps Workflow: Covers every step from data preprocessing to model deployment.
Cloud Deployment: Real-world deployment using Render.
Reproducibility: Automated CI/CD ensures consistent builds and deployments.
Scalability: Dockerized application enables horizontal scaling and portability.
Security: Managed sensitive credentials with GitHub Secrets and DockerHub tokens.

🛠️ How to Run the Project Locally

🔹 1️⃣ Clone the Repository

git clone https://github.com/camm93/WaterQualitySystem.git

cd WaterQualitySystem

🔹 2️⃣ Build and Run the Docker Container

docker build -t waterqualitysystem .

docker run -d -p 8000:8000 waterqualitysystem

🔹 3️⃣ Test the API

Use Postman or curl to test the /predict endpoint.

Example curl command:

curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
  "pH": 7.13,
  "Dureza": 173.69,
  "Sólidos": 19309.57,
  "Cloraminas": 6.53,
  "Sulfatos": 372.54,
  "Conductividad": 295.39,
  "Carbono_orgánico": 7.27,
  "Trihalometanos": 88.79,
  "Turbidez": 3.40
}'

Expected Response:

{
    "prediction": "NO",
    "probability": [
        0.9414033798677441,
        0.058596620132255695
    ]
}
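The same request can also be sent from Python using only the standard library. The sketch below mirrors the curl command above; the helper names `build_predict_request` and `call_predict` are hypothetical, and the call itself only succeeds while the container is running on port 8000.

```python
import json
import urllib.request


def build_predict_request(sample, base_url="http://localhost:8000"):
    """Build a POST request for /predict with a JSON body."""
    body = json.dumps(sample).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(sample, base_url="http://localhost:8000"):
    """Send the sample to the running API and return the decoded JSON response."""
    request = build_predict_request(sample, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same payload as the curl example.
sample = {
    "pH": 7.13,
    "Dureza": 173.69,
    "Sólidos": 19309.57,
    "Cloraminas": 6.53,
    "Sulfatos": 372.54,
    "Conductividad": 295.39,
    "Carbono_orgánico": 7.27,
    "Trihalometanos": 88.79,
    "Turbidez": 3.40,
}
request = build_predict_request(sample)
```

With the container running, `call_predict(sample)` returns a dictionary with the `prediction` and `probability` keys shown in the expected response.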

🛠️ Folder Structure

.
├── app.py                         # FastAPI application
├── decision_tree_model.pkl        # Serialized Decision Tree model
├── WaterQualityClassifier.ipynb   # Jupyter notebook
├── data/                          # Data files
├── Dockerfile                     # Docker container configuration
├── requirements.txt               # Python dependencies
├── Readme.md                      # Project documentation
└── .github/workflows              # CI/CD configuration
    └── deploy.yml                 # GitHub Actions workflow

📈 To Recap

Model Deployment: Transitioning from Jupyter notebooks to a production-ready API.
Docker: Containerizing applications for consistent deployments.
CI/CD: Automating image builds and deployments using GitHub Actions.
Cloud Hosting: Deploying machine learning APIs to cloud platforms.
Security: Managing secrets and sensitive credentials for DockerHub and Render.

🚀 Potential Improvements

Add model monitoring (e.g., API latency, prediction drift) using Prometheus + Grafana.
Incorporate MLflow for experiment tracking and model versioning.
Deploy the app to AWS (EC2, Lambda) or GCP for real-world scalability.
Add unit tests for API endpoints using pytest.
Enhance the CI/CD pipeline with rollback mechanisms and more extensive test coverage.

💬 Contact

For questions or collaboration opportunities, feel free to reach out:

Email: crismur_93@hotmail.com
LinkedIn: Cristian Murillo

