Health Insurance Price Forecast

1 Problem Context

As prices for basic necessities rise, it helps to know in advance how much we are likely to spend. For medical insurance, we can use several features of a customer to estimate the price they will be charged. That is what this project does, from prediction to deployment.

Our data was obtained from this Kaggle problem on Medical Cost - Insurance Forecast, which poses the question: "Can you accurately predict insurance costs?"

For ease of access, the data was uploaded to GitHub here.

From this we see that we have the following information (adapted from the Kaggle problem description):

Column	Description
age	Age of primary beneficiary
sex	Insurance contractor gender: female, male
bmi	Body mass index, a measure of whether body weight is high or low relative to height: weight divided by height squared $(kg/m^2)$; the ideal range is 18.5 to 24.9
children	Number of children covered by health insurance / Number of dependents
smoker	Whether the beneficiary smokes: yes, no
region	The beneficiary's residential area in the US: northeast, southeast, southwest, northwest
charges	Individual medical costs billed by health insurance
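
As a small illustration of the bmi column, the sketch below computes the index from weight and height and checks it against the 18.5 to 24.9 range quoted in the table. The function names and the example values are assumptions made for this illustration; they are not taken from the project code or the dataset.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float, low: float = 18.5, high: float = 24.9) -> bool:
    """True when a BMI value falls inside the range quoted in the table."""
    return low <= value <= high


# Example values are made up for illustration; they are not dataset records.
example = bmi(weight_kg=70.0, height_m=1.75)
print(round(example, 2), in_ideal_range(example))
```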

The variable we want to predict is from the column charges.
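
To make the prediction setup concrete, here is a minimal sketch that separates the charges column from the remaining columns. It uses only the standard library and two made-up rows in the layout of the table above; the helper name split_features_target is hypothetical, and the notebook may well do this differently (for example with pandas).

```python
import csv
import io

TARGET = "charges"

# Two made-up rows in the column layout described above (not real records).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,24.0,1,no,northwest,4000.0
45,male,31.5,2,yes,southeast,30000.0
"""


def split_features_target(rows, target=TARGET):
    """Separate each record into its input features and the value to predict."""
    features, targets = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's record is not modified
        targets.append(float(row.pop(target)))
        features.append(row)
    return features, targets


X, y = split_features_target(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(list(X[0].keys()))
print(y)
```

Note that csv.DictReader yields every value as a string, so numeric features such as age and bmi would still need converting before training.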

This type of price prediction is useful both for the company that sells insurance and for the person buying it: every party involved gets a baseline for how much they would pay or receive in the transaction. This is where data science shines, in solving different business problems.

2 Project Organization

🟢 Main notebook containing the EDA and the training and tuning of the Machine Learning models

🟢 The script to train and save the chosen model

🟢 The code for the Flask app capable of making predictions

🟢 You can use this script to test the requests for the predictions

🟢 Our final Dockerfile

🟢 The code for the Streamlit application

3 How to Run it Locally

The commands here should be run in a terminal. To begin, clone this repo to your local computer and go into the project-insurance-forecast directory.

To clone this repository:

git clone https://github.com/diascarolina/project-insurance-forecast.git

or for cloning via SSH use:

git clone git@github.com:diascarolina/project-insurance-forecast.git

If you are unsure which method to use for cloning, the first one is enough.

If you are still in the directory where you ran the clone command, type the following in your terminal:

cd project-insurance-forecast

This will bring you into our project-insurance-forecast directory.

The environment and dependency manager used in this project is Pipenv. If you don't have it installed yet, you can install it with the following command (assuming Python is already installed on the system):

pip3 install pipenv

If it doesn't work, you can try:

pip install pipenv

Now, from the project directory, install the necessary libraries and dependencies from the Pipfile using:

pipenv install

If you want to run the notebook, use the following command to install the extra dependencies:

pipenv install --dev

Now activate the environment:

pipenv shell

Our project already includes the .bin file for the model, but if you want to retrain and save the model again, you can do so by running:

python train.py
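
For readers curious about what saving a model to a .bin file can involve, the sketch below round-trips an object through a binary file with Python's pickle module. This is an assumption about the approach, not a description of train.py: the helper names, the file name model_example.bin, and the placeholder dictionary standing in for a fitted model are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a model object to a binary file."""
    with open(path, "wb") as f_out:
        pickle.dump(model, f_out)


def load_model(path):
    """Read a model object back from a binary file."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


# Stand-in for a fitted estimator; a real script would pass the trained model.
placeholder_model = {"name": "placeholder", "n_features": 6}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model_example.bin"  # hypothetical file name
    save_model(placeholder_model, model_path)
    restored = load_model(model_path)

print(restored == placeholder_model)
```

Pickle files can execute code when loaded, so only load model files that come from a source you trust.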

The Flask app can be served locally either directly or through gunicorn. To run it directly:

python predict.py

or using gunicorn (recommended)

gunicorn --bind 0.0.0.0:9696 predict:app

The app should then be running locally at http://localhost:9696.

To test the app with a POST request we have several options: run the make_requests.py script, use curl, or use Postman. Let's see the first two.

To run the script to make the request:

python make_requests.py

Using curl (you can change the values of the parameters):

curl -X POST http://localhost:9696/predict \
-H 'Content-Type: application/json' \
-d '{"age": 19, "sex": "female", "bmi": 25, "children": 1, "smoker": "no", "region": "northwest"}'
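
The same request can also be built from Python. The sketch below mirrors the curl command using only the standard library; the helper names build_request and send are hypothetical, the assumption that the service answers with a JSON body is not confirmed by this README, and make_requests.py may be written differently.

```python
import json
import urllib.request

LOCAL_URL = "http://localhost:9696/predict"

customer = {
    "age": 19,
    "sex": "female",
    "bmi": 25,
    "children": 1,
    "smoker": "no",
    "region": "northwest",
}


def build_request(url, payload):
    """Build a JSON POST request equivalent to the curl command above."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(req, timeout=10):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request(LOCAL_URL, customer)
print(req.get_method(), req.full_url)
# With the service running locally, uncomment the next line to send it:
# print(send(req))
```

Building and sending are kept separate so the request can be inspected without a running server.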

That's it! If you want to explore further, you can also run the Streamlit app locally using:

streamlit run streamlit_app.py

Finally, we can build and run the Docker image locally with the Dockerfile provided (next we'll do it with Docker Hub).

To build a Docker image called "insurance-forecast":

docker build -t insurance-forecast .

To run it:

docker run -it --rm -p 9696:9696 insurance-forecast

Or you can pull the image directly from Docker Hub (without having to build it first):

docker run -it --rm -p 9696:9696 diascaro/insurance-forecast

You can test it as above, by running the make_requests.py script and choosing the first option to test locally.

4 Deployment with Heroku

The API was deployed to the cloud using Heroku, which was chosen because it offered a free tier.

🟢 Click here to access the main page of the app

To make a POST request to the URL https://insurance-forecast.herokuapp.com/predict we can again use our make_requests.py script, curl, or Postman. Again, let's see the first two methods.

Run the following script and choose the second option (2):

python make_requests.py

Or using curl (you can change the values of the parameters):

curl -X POST https://insurance-forecast.herokuapp.com/predict \
-H 'Content-Type: application/json' \
-d '{"age": 19, "sex": "female", "bmi": 25, "children": 1, "smoker": "no", "region": "northwest"}'

Bonus: Streamlit App

We also have an app using Streamlit, an open-source Python library used to facilitate the deployment of apps.

🟢 Click here to access the Streamlit app

You don't need to make a request: you can fill in the details directly in the app :D
