Predict productivity loss by analyzing social media usage patterns! This project explores how social media usage impacts productivity and provides a predictive model to assess productivity loss. It combines data exploration, preprocessing, model training, and an interactive web app interface for a smooth user experience.
🔗 Live Demo: Productivity Loss Predictor
🔗 GitHub Repo: Productivity Loss Predictor
📊 Data Source: Time Wasted on Social Media - Kaggle
- Predict Productivity Loss: Model the impact of social media use on productivity using machine learning.
- Extensive Data Processing: Clean, preprocess, and transform data for accurate prediction.
- Multiple Algorithms Tested: Both regression and classification algorithms were evaluated to find the best model for predicting productivity loss.
- Voting Ensemble for Accuracy: Top 3 models are combined using a hard voting ensemble to maximize prediction accuracy.
- Streamlit Web Application: Provides an intuitive UI for end-users to make predictions and visualize results.
- Dockerized Deployment: Easily deployable Docker setup for streamlined application hosting.
- Requirements Installation: Install necessary packages from dev.requirements.txt and prod.requirements.txt.
- Data Exploration: Analyze dataset features and distributions.
- Data Cleaning: Handle missing data, remove duplicates, and filter outliers.
- Data Preprocessing: Normalize numerical data, encode categorical data, and perform feature scaling.
- Model Preparation: Train and evaluate multiple models to identify the top performers.
- Model Selection and Pipeline Creation: Assemble the top models into a voting classifier.
- Pipeline Testing: Ensure accuracy and reliability of the entire model pipeline.
- Model Exporting: Save the final model and unique values using pickle for easy deployment.
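The preprocessing and export steps above can be sketched roughly as follows. This is a minimal illustration and not the code from training.ipynb: the column names (`daily_hours`, `platform`, `productivity_loss`), the toy rows, and the single Random Forest stand-in model are all assumptions, and the files are written to a temporary directory rather than to output/.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real columns come from the Kaggle CSV.
raw = pd.DataFrame({
    "daily_hours": [1.0, 4.5, 2.0, 6.0, 0.5, 3.5, 3.5, None],
    "platform": ["Instagram", "TikTok", "Instagram", "YouTube",
                 "TikTok", "YouTube", "YouTube", "Instagram"],
    "productivity_loss": [1, 3, 2, 3, 1, 2, 2, 1],
})

# Data cleaning: remove duplicate rows, then rows with missing values.
clean = raw.drop_duplicates().dropna()

numeric_cols = ["daily_hours"]
categorical_cols = ["platform"]
features = clean[numeric_cols + categorical_cols]
target = clean["productivity_loss"]

# Preprocessing: scale numeric columns, one-hot encode categorical ones.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Bundle preprocessing and model so one object handles raw input rows.
pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(features, target)

# Unique categorical values, e.g. to populate dropdowns in a UI.
unique_values = {
    col: sorted(clean[col].unique().tolist()) for col in categorical_cols
}

# Export with pickle, then reload to confirm the round trip works.
with tempfile.TemporaryDirectory() as tmp_dir:
    pipeline_path = Path(tmp_dir) / "model_pipeline.pkl"
    values_path = Path(tmp_dir) / "model_unique_values.pkl"
    with open(pipeline_path, "wb") as f:
        pickle.dump(pipeline, f)
    with open(values_path, "wb") as f:
        pickle.dump(unique_values, f)
    with open(pipeline_path, "rb") as f:
        restored_pipeline = pickle.load(f)
    with open(values_path, "rb") as f:
        restored_values = pickle.load(f)

restored_predictions = restored_pipeline.predict(features)
```

Pickling the whole pipeline, rather than the model alone, means the app can pass raw user input straight to `predict` without re-implementing the preprocessing. Pickle files should only be loaded from trusted sources.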
To create a robust prediction model, several machine learning algorithms were tested, both in regression and classification modes. The final model is a voting classifier, which leverages the strengths of the top 3 models identified during testing.
Regression Algorithms:
- Linear Regression
- Lasso CV (L1 regularization to reduce overfitting)
- Ridge CV (L2 regularization)
- Support Vector Machine Regression
Classification Algorithms:
- Decision Tree Classifier
- Random Forest Classifier
- Support Vector Machine Classifier
- K-Neighbors Classifier
- Gradient Boosting Classifier
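One way such a comparison could be run for the classification candidates is sketched below. It is an illustration under assumptions, not the project's notebook code: the data is synthetic (from `make_classification`), the hyperparameters are scikit-learn defaults, and 5-fold cross-validated accuracy is used as the ranking metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
    "k_neighbors": KNeighborsClassifier(),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Keep the three best-scoring candidates for a later ensemble.
top_three = sorted(scores, key=scores.get, reverse=True)[:3]

for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

On synthetic data the ranking will not necessarily match the one reported for the real dataset.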
Based on evaluation metrics (especially the R² score), the top 3 models chosen for the ensemble were:
- Decision Tree Classifier
- Gradient Boosting Classifier
- Random Forest Classifier
These models were combined using a hard voting classifier, enabling the final model to leverage their combined strengths.
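A hard voting ensemble of these three model types can be built with scikit-learn's `VotingClassifier`. The sketch below assumes synthetic data and default hyperparameters, so it shows the mechanism and not the project's tuned model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hard voting: each model predicts a class label and the majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("decision_tree", DecisionTreeClassifier(random_state=0)),
        ("gradient_boosting", GradientBoostingClassifier(random_state=0)),
        ("random_forest", RandomForestClassifier(random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)

accuracy = ensemble.score(X_test, y_test)
print(f"hold-out accuracy: {accuracy:.3f}")
```

With `voting="hard"` the ensemble counts predicted labels only; `voting="soft"` would instead average predicted class probabilities.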
data/
├── time-wasted-on-social-media.csv # Dataset used for training
output/
├── model_pipeline.pkl # Exported model pipeline including preprocessor and model
├── model_unique_values.pkl # Saved unique values for consistent data processing
webapp/
├── __init__.py
├── app.py # Streamlit app for UI and user interaction
.dockerignore
.gitignore
Dockerfile
docker-compose.yml
requirements/
├── dev.requirements.txt # Development dependencies
requirements.txt # Optimized production dependencies
training.ipynb # Jupyter Notebook with training pipeline and model exploration
- Dockerfile: Configured for streamlined deployment of the Streamlit app with all dependencies.
- Exposes Port: 8501 (access the app on localhost:8501)
- Python 3.8+
- Docker
- Streamlit
- Clone the repository:
    git clone https://github.com/ketanip/productivity-loss-predictor.git
    cd productivity-loss-predictor
- Install dependencies:
  For development:
    pip install -r requirements/dev.requirements.txt
  For production:
    pip install -r requirements/prod.requirements.txt
- Run the app locally:
    streamlit run webapp/app.py
- Docker Deployment: Build and run the Docker container:
    docker build -t productivity-loss-predictor .
    docker run -p 8501:8501 productivity-loss-predictor
  Or, using Docker Compose:
    docker compose up --build
Contributions are welcome! To contribute:
- Fork the repository.
- Create a feature branch (git checkout -b feature-branch).
- Commit your changes (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Open a pull request.
This project is licensed under the MIT License.
For questions or issues, please open an issue in the repository or contact the project maintainer through GitHub.