- Medium Article - "Data Science Projects: What Comes After Jupyter Notebooks?"
- Project Live Presentation
- Problem Context
- Project Organization
- How to Run it Locally
- Deployment with Heroku
- To Do
- References
- Contacts
As prices rise for basic necessities, it helps to estimate beforehand what we will spend our money on. For medical insurance, we can look at several features of a customer to arrive at a price. This is what we'll do in this project, from prediction to deployment.
Our data was obtained from this Kaggle problem on Medical Cost - Insurance Forecast, which asks the question: "Can you accurately predict insurance costs?"
For ease of access, the data was uploaded to GitHub here.
From this we see that we have the following information (adapted from the Kaggle problem description):
Column | Description |
---|---|
age | Age of primary beneficiary |
sex | Insurance contractor gender: female, male |
bmi | Body mass index, an objective index of body weight that shows whether weight is relatively high or low relative to height |
children | Number of children covered by health insurance / Number of dependents |
smoker | Whether the person smokes or not |
region | The beneficiary's residential area in the US: northeast, southeast, southwest, northwest |
charges | Individual medical costs billed by health insurance |
The variable we want to predict is the charges column.
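To make the column layout concrete, here is a minimal sketch of reading rows in this format and separating the features from the charges target. It uses only the standard library, and the two inline rows are illustrative placeholders; the helper names `parse_row` and `split_features_target` are hypothetical and not taken from the project code.

```python
import csv
import io

# Two illustrative rows in the column layout described above.
# They are placeholders for demonstration, not a substitute for the real data.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.92
18,male,33.77,1,no,southeast,1725.55
"""

INT_COLUMNS = ("age", "children")
FLOAT_COLUMNS = ("bmi", "charges")
TARGET = "charges"


def parse_row(row):
    """Convert the string fields produced by csv.DictReader to numeric types."""
    parsed = dict(row)
    for name in INT_COLUMNS:
        parsed[name] = int(row[name])
    for name in FLOAT_COLUMNS:
        parsed[name] = float(row[name])
    return parsed


def split_features_target(rows, target=TARGET):
    """Return (features, targets): feature dicts without the target, plus target values."""
    features = [{k: v for k, v in r.items() if k != target} for r in rows]
    targets = [r[target] for r in rows]
    return features, targets


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
features, targets = split_features_target(rows)
print(features[0])
print(targets)
```

Keeping the features as plain dictionaries matches the JSON shape used later when sending a request to the API.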
This type of price prediction is very useful both for the company that sells insurance and for the person buying it, so every party involved in the process can have a baseline for how much they would pay or receive in the transaction. This is where data science shines: in solving different business problems.
Our final Dockerfile
Generally, the commands here should be run within a terminal. To begin, you need to clone this repo to your local computer and go into the project-insurance-forecast directory.
To clone this repository:
git clone https://github.com/diascarolina/project-insurance-forecast.git
or for cloning via SSH use:
git clone git@github.com:diascarolina/project-insurance-forecast.git
If you are unsure which method to use for cloning, the first one is enough.
If you are still in the directory where you ran the clone command, type the following in your terminal:
cd project-insurance-forecast
This will bring you into our project-insurance-forecast directory.
The environment and dependency manager used in this project is Pipenv. If you don't have it installed yet, you can install it using (assuming Python is already installed on the system):
pip3 install pipenv
If it doesn't work, you can try:
pip install pipenv
Now, at the project directory, we can install the necessary libraries and dependencies from the Pipfile using:
pipenv install
If you want to run the notebook, use the following command to install the extra dependencies:
pipenv install --dev
Now activate the environment:
pipenv shell
Our project already has the .bin file for the model, but if you want to retrain the model and save it again, you can do so by running:
python train.py
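As a rough illustration of what saving and reloading a model file involves, here is a minimal sketch. It assumes the .bin file is written with Python's `pickle` module, which this README does not confirm; the `artifact` dictionary is a stand-in with placeholder values rather than a trained model, and the file name `example-model.bin` is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: plain built-in types with placeholder values.
# A real training script would store its fitted model object here instead.
artifact = {
    "feature_names": ["age", "bmi", "children"],
    "weights": [0.0, 0.0, 0.0],
    "bias": 0.0,
}


def save_model(obj, path):
    """Serialize an object to a binary file."""
    with open(path, "wb") as f_out:
        pickle.dump(obj, f_out)


def load_model(path):
    """Load an object back from a binary file (only load files you trust)."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


# Write to a temporary directory so the sketch does not touch the project files.
model_path = os.path.join(tempfile.mkdtemp(), "example-model.bin")
save_model(artifact, model_path)
restored = load_model(model_path)
print(restored == artifact)
```

The prediction service would then only need the load step at startup, which is why retraining is optional when the file already exists.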
To deploy the Flask app locally, we can run it directly or through gunicorn. To run it directly:
python predict.py
or using gunicorn (recommended):
gunicorn --bind 0.0.0.0:9696 predict:app
The app should then be running locally at http://localhost:9696.
To test the app using a POST request we have several options: run the make_requests.py script, use curl, or use Postman. Let's see the first two.
To run the script to make the request:
python make_request.py
Using curl (you can change the values of the parameters):
curl -X POST http://localhost:9696/predict \
-H 'Content-Type: application/json' \
-d '{"age": 19, "sex": "female", "bmi": 25, "children": 1, "smoker": "no", "region": "northwest"}'
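For readers who prefer Python over curl, the same POST request can be sketched with the standard library alone. This is not the project's request script; the function names `build_predict_request` and `send` are hypothetical, and the sketch assumes the endpoint replies with JSON, which is not stated here.

```python
import json
import urllib.request

LOCAL_URL = "http://localhost:9696/predict"

# Same example customer as in the curl command above.
customer = {
    "age": 19,
    "sex": "female",
    "bmi": 25,
    "children": 1,
    "smoker": "no",
    "region": "northwest",
}


def build_predict_request(payload, url=LOCAL_URL):
    """Build a POST request carrying the payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the reply, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request(customer)
print(request.get_method(), request.full_url)

# With the app running locally, uncomment the next line to send the request:
# print(send(request))
```

Building the request separately from sending it makes it easy to inspect the payload before the service is up.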
That's it! If you want, you can explore further and run the Streamlit app locally using:
streamlit run streamlit_app.py
Finally, we can build and run the Docker image locally with the Dockerfile provided (next we'll do it with Docker Hub).
To build a Docker image called "insurance-forecast":
docker build -t insurance-forecast .
To run it:
docker run -it --rm -p 9696:9696 insurance-forecast
Or you can pull the image directly from Docker Hub (without having to build it first):
docker run -it --rm -p 9696:9696 diascaro/insurance-forecast
You can test it as above with the make_requests.py script, choosing the first option to test it locally.
The API was deployed to the cloud using Heroku. Heroku was chosen because it is free.
Click here to access the main page of the app
To make a POST request to the URL https://insurance-forecast.herokuapp.com/predict we can also use our make_requests.py script, curl, or Postman. Again, let's see the first two methods.
Run the following script and choose the second option (2):
python make_request.py
Or using curl (you can change the values of the parameters):
curl -X POST https://insurance-forecast.herokuapp.com/predict \
-H 'Content-Type: application/json' \
-d '{"age": 19, "sex": "female", "bmi": 25, "children": 1, "smoker": "no", "region": "northwest"}'
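The request script is described as offering two options, one for the local service and one for the Heroku deployment. How the script implements that choice is not shown in this README, so the following is only a sketch of one way to map an option to a URL; `ENDPOINTS` and `pick_endpoint` are hypothetical names.

```python
# Option "1" targets the local service, option "2" the Heroku deployment,
# mirroring the two options described in the text.
ENDPOINTS = {
    "1": "http://localhost:9696/predict",
    "2": "https://insurance-forecast.herokuapp.com/predict",
}


def pick_endpoint(choice):
    """Return the prediction URL for a menu choice, rejecting unknown options."""
    key = choice.strip()
    if key not in ENDPOINTS:
        valid = ", ".join(sorted(ENDPOINTS))
        raise ValueError(f"Unknown option {choice!r}; expected one of: {valid}")
    return ENDPOINTS[key]


print(pick_endpoint("2"))
```

In an interactive script, the argument would typically come from `input()`, and the returned URL would be used as the target of the POST request.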
We also have an app using Streamlit, an open-source Python library used to facilitate the deployment of apps.
Click here to access the Streamlit app
You don't need to make a request; you can fill in the details directly in the app :D
- Split the dataset into train, test and validation
- Separate the files into folders for better organization
- Try to deploy the Flask app and the Streamlit app into the same URL
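For the first item, a three-way split could look roughly like the sketch below. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are all assumptions made for illustration; nothing here reflects a decision already taken in the project.

```python
import random


def split_dataset(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the rows reproducibly and split them into train, validation and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Ten integers stand in for ten data rows.
train, val, test = split_dataset(range(10))
print(len(train), len(val), len(test))
```

Using a seeded `random.Random` instance keeps the split reproducible without changing the global random state.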
- Image by Olya Kobruseva on Pexels
- Machine Learning Zoomcamp
- ML Zoomcamp: Midterm Project info
- Kaggle Original Problem
- Deploy Churn Service on Heroku
- Markdown to HTML
- Markdown CSS
Any tips or suggestions? Feel free to contact me!