Login/Sign-up to IBM Cloud: http://ibm.biz/jtcloud
Survey link: http://ibm.biz/jtcloud-survey
This application leverages machine learning models to predict your insurance charges, and helps you understand how smoking or lowering your BMI affects your insurance premiums.
As we see the value of gross insurance premiums worldwide continue to skyrocket past 5 trillion dollars, we know that most of these costs are preventable. For example, eliminating smoking and lowering your BMI by a few points could shave thousands of dollars off your premium charges. In this application, we study the effects of age, smoking, BMI, gender, and region to determine how much of a difference these factors can make to your insurance premium. By using our application, customers see the radical difference their lifestyle choices make to their insurance charges. By leveraging AI and machine learning, we help customers understand just how much smoking increases their premium by predicting, within seconds, how much they will have to pay.
Using IBM AutoAI, you automate all the tasks involved in building predictive models for different requirements. You see how AutoAI quickly generates great models, which saves time and effort and aids faster decision-making. You create a model from a data set that includes the age, sex, BMI, number of children, smoking preferences, region, and charges to predict the health insurance premium cost that an individual pays.
When you have completed this code pattern, you understand how to:
- Quickly set up the services on IBM Cloud for building the model.
- Ingest the data and initiate the AutoAI process.
- Build different models using AutoAI and evaluate the performance.
- Choose the best model and complete the deployment.
- Generate predictions using the deployed model by making REST calls.
- Compare the process of using AutoAI and building the model manually.
- Visualize the deployed model using a front-end application.
- The user creates an IBM Watson Studio Service on IBM Cloud.
- The user creates an IBM Cloud Object Storage Service and adds that to Watson Studio.
- The user uploads the insurance premium data file into Watson Studio.
- The user creates an AutoAI Experiment to predict insurance premium on Watson Studio.
- AutoAI uses Watson Machine Learning to create several models, and the user deploys the best performing model.
- The user uses the Flask web-application to connect to the deployed model and predict an insurance charge.
- IBM Watson Studio - IBM Watson® Studio helps data scientists and analysts prepare data and build models at scale across any cloud.
- IBM Watson Machine Learning - IBM Watson® Machine Learning helps data scientists and developers accelerate AI and machine-learning deployment.
- IBM Cloud Object Storage - IBM Cloud™ Object Storage makes it possible to store practically limitless amounts of data, simply and cost effectively.
- artificial-intelligence - Build and train models, and create apps, with a trusted AI-infused platform.
- Python - Python is an interpreted, high-level, general-purpose programming language.
This Cloud pattern assumes you have an IBM Cloud account. Go to the link below to sign up for a no-charge trial account - no credit card required.
- Download the data set
- Clone the repo
- Explore the data (optional)
- Create IBM Cloud services
- Create and Run AutoAI experiment
- Create a deployment and test your model
- Create a notebook from your model (optional)
- Run the application
We will use an insurance data set from Kaggle. You can find it here.
Click on the Download button, and you should see that you will download a file named insurance-premium-prediction.zip. Once you unzip the file, you should see insurance.csv.
This is the data set we will use for the remainder of the example. Remember that this example is purely educational, and you could use any data set you want - we just happened to choose this one.
Clone this repo onto your computer in the destination of your choice:
git clone https://github.com/Anam-Mahmood/predict-insurance-charges-with-autoai.git
This gives you access to the notebooks in the notebooks directory. To explore the data before creating a model, you can look at the Claim Amount Exploratory notebook. To run it, create an IBM Cloud Object Storage service and paste your credentials into the notebook. This step is purely optional.
If you want to run the notebook that is explored below, go to notebooks/Claim Amount Exploratory.ipynb.
- Within Watson Studio, you explore the data before you create any machine learning models. You want to understand the data, and find any trends between what you are trying to predict (insurance premium charges) and the data's features.
- Once you import the data into a data frame and call the df_claim.head() function, you see the first 5 rows of the data set. The features are age, sex, bmi, children, smoker, and region.
- To check if there is a strong relationship between bmi and charges, you create a scatter plot using the seaborn and matplotlib libraries. You see that there is no strong correlation between bmi and charges.
- To check if there is a strong relationship between sex and charges, you create a box plot. You see that the average claims for males and females are similar, whereas males have a bigger proportion of the higher claims.
- To check if there is a strong relationship between being a smoker and charges, you create a box plot. You see that if you are a smoker, your claims are much higher on average.
- Let's see if the smoker group is well represented. It is: there are around 300 smokers, and around 1000 non-smokers.
- To check if there is a strong relationship between age and charges, you create a scatter plot. You see that claim amounts increase with age, and tend to form groups around 12,000, 30,000, and 40,000.
If you want to see all of the code, and run the notebook yourself, check the data folder above.
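The notebook itself relies on pandas, seaborn, and matplotlib. As a dependency-free illustration of the smoker comparison described above, the following minimal sketch groups charges by a column and averages them. The rows in SAMPLE_CSV are hypothetical values made up for this example (they are not copied from insurance.csv), and mean_charges_by is an illustrative helper name, not something taken from the notebook.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only; they are NOT taken from insurance.csv.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,24.0,0,no,northwest,3000.0
40,male,31.5,2,yes,southeast,38000.0
52,female,28.1,1,no,southwest,11000.0
33,male,35.2,0,yes,northeast,40000.0
"""


def mean_charges_by(rows, column):
    """Return the average of the 'charges' column for each value of `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(float(row["charges"]))
    return {key: mean(values) for key, values in groups.items()}


# csv.DictReader yields one dict per data row, keyed by the header names.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_smoker = mean_charges_by(rows, "smoker")
print(by_smoker)
```

To try it on the real file, replace io.StringIO(SAMPLE_CSV) with an open handle to insurance.csv, and adjust the target column name if your copy of the data set names it differently.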
-
Login to your IBM Cloud account: http://ibm.biz/insuranceChargesUsingAutoAI
-
Within your IBM Cloud account, click on the top search bar to search for cloud services and offerings. Type in “Watson Studio” and then click on Watson Studio under “Catalog Results”.
-
This takes you to the Watson Studio service page. Select a region, “Lite” plan (Free) and give your service a unique name. Click on “Create” and this creates a Watson Studio instance for you.
-
Once the service instance is ready, you will be redirected to the Watson Studio page. Click on the “Get Started” button to launch Watson Studio in a new tab. It might take a few minutes to set up the service.
-
Under the heading “work with data” you will find a link that says, “Create a project”. Click on “Create a project”. Next, click on “Create an empty project”.
-
On the new project page, give your project a name. You will also need to associate an IBM Cloud Object Storage instance to store the data set.
-
Under “Select Storage Service”, click on the “Add” button. This takes you to the IBM Cloud Object Storage service page. Leave the service on the “Lite” tier and then click the “Create” button at the bottom of the page. You are prompted to name the service and choose the resource group. Once you give a name, click “Create”.
-
Once the instance is created, you’re taken back to the project page. Click on “refresh” and you should see your newly created Cloud Object Storage instance under Storage.
-
Click the “Create” button at the bottom right of the page to create your project.
-
Click on the “Add to project” button on the top right corner. From the pop-up window select “Data”.
-
In the column on the right, click on “browse”. Navigate to the folder where you downloaded the data set and select “insurance.csv”.
-
Watson Studio takes a couple of seconds to load the data, and then you should see the import has completed. To make sure it has worked properly, you can click on “Assets” on the top of the page, and you should see your insurance file under “Data Assets”.
-
Once you have added the data set, click on the “Add to project” button on the top right corner. This time select “AutoAI experiment”.
-
On the New AutoAI Experiment page, give a name to your experiment.
-
Next, you need to add a Watson Machine Learning instance. On the right side of the screen, click on “Associate a Machine Learning service instance”.
-
On the Associate service page, click on the “New Service” button on the right side of the screen. From the pop-up screen, select “Machine Learning”.
-
Select the appropriate region. It is recommended to build your machine learning service instance in the same region that you created your Watson Studio service. Select the “Lite” (free) plan. Give your instance a unique name. Click on “Create”.
-
Once you create your machine learning service, on the next page, check the box with your machine learning service instance. Next, click on “associate service” on the right corner.
-
Once the service is successfully associated, you will be redirected to the new AutoAI experiment page. Click on “Reload” on the right side of the screen. You should see your newly created machine learning instance. Click on “Create” on the bottom right part of your screen to create your first AutoAI experiment!
-
After you create your experiment, you are taken to a page to add a data source to your project. Click on “Select from project” and then add the insurance.csv file. Click on Select asset to confirm your data source.
-
Next, you see that AutoAI processes your data, and you see a What do you want to predict section. Select the expenses as the Prediction column.
-
Next, let's explore the AutoAI settings to see what you can customize when running your experiment. Click on Experiment settings. First, you see the data source tab, which lets you omit certain columns from your experiment. You choose to leave all columns. You can also select the training data split. It defaults to 85% training data. The data source tab also shows which metric you optimize for. For regression, it is RMSE (Root Mean Squared Error), and for other types of experiments, such as Binary Classification, AutoAI defaults to Accuracy. Either way, you can change the metric from this tab depending on your use case.
-
Click on the Prediction tab from within the Experiment settings. There you can select from Binary Classification, Regression, and Multiclass Classification.
-
Lastly, you can see the Runtime tab from the Experiment settings. This shows you other experiment details you may want to change depending on your use case.
-
Once you are happy with your settings, ensure you are predicting for the expenses column, and click on the Run Experiment button on the bottom-right corner of the screen.
-
Next, your AutoAI experiment runs on its own. You see a progress map on the right side of the screen which shows which stage of the experiment is running. This may be Hyper Parameter Optimization, feature engineering, or some other stage.
-
Different pipelines are created, and you see the rankings of each model. Each model is ranked based on the metric that you selected. In this case, that is the RMSE (root mean squared error). Given that you want that number to be as small as possible, the model with the smallest RMSE is at the top of the leaderboard.
- Once the experiment is done, you see Experiment completed under the Progress map on the right-hand side of the screen.
- Now that AutoAI has successfully generated eight different models, you can rank the models by different metrics, such as explained variance, root mean squared error, R-Squared, and mean absolute error, by clicking on the drop-down next to “Rank by:” on the top right corner of the screen. Each time you select a different metric, the models are re-ranked by that metric.
-
In our case, we have RMSE as the experiment's metric. You see the smallest RMSE value is 4444.108, from Pipeline 4. Click on “Pipeline 4”.
-
On the left-hand side, you can see different “Model Evaluation Measures”. For this particular model, you can view the metrics, such as explained variance, RMSE, and other metrics.
-
On the left-hand side, you can also see “Feature Transformations”, and “Feature Importance”.
-
On the left-hand side, click on “Feature Importance”. You can see here that the most important predictor of the insurance premium is whether you are a “smoker” or a “non-smoker”. This is by far the most important feature, with “bmi” coming in as the second most important. This makes sense, given that many companies offer discounts for employees who do not smoke.
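The leaderboard ranking by RMSE described in this section can be illustrated with a small, self-contained sketch. The charges and predictions below are hypothetical numbers chosen for easy arithmetic, and the names pipeline_a and pipeline_b are made up; they are not output from AutoAI.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical holdout charges and predictions from two made-up pipelines.
actual = [1000.0, 2000.0, 3000.0]
predictions = {
    "pipeline_a": [1100.0, 1900.0, 3100.0],  # off by 100 on every row
    "pipeline_b": [1000.0, 2000.0, 3300.0],  # exact twice, off by 300 once
}

scores = {name: rmse(actual, preds) for name, preds in predictions.items()}

# Lower RMSE is better, so sort ascending to mimic the leaderboard order.
leaderboard = sorted(scores.items(), key=lambda item: item[1])
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: RMSE = {score:.1f}")
```

Both made-up pipelines have the same mean absolute error (100), yet RMSE ranks pipeline_a first because squaring penalizes the single large miss more heavily. This is one reason that re-ranking by a different metric can reorder the leaderboard.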
-
Once you are ready to deploy one of the models, click on “Save As” at the top-right corner of the model you want to deploy. Save it as a “Model” and name your model as you want. Click on “Create”. Note: We show you how to save it as a notebook in step 6.
-
Once the model is successfully saved, click on the “View in project” in the green notification on the right side of the screen. Alternatively, you can also find your model saved in the “Assets” tab under “Models”.
-
Next, you are taken to a screen that has the overview of the model you just saved. Click on “Promote to deployment space” on the top right corner of your screen. Alternatively, if you’re doing it from the Assets tab, then under the “Models” section, click on the 3 dots on the right side of your screen and click “promote”.
-
On the Promote to space page, you need a target space to promote your model. Click on “New space +” on the right side of your screen.
-
Next, on the Create a deployment space screen, give your space a name, make sure the right cloud object storage is selected, and select your machine learning service instance. For this experiment, selecting the machine learning service is mandatory as we need to build a prediction model. Then click on “Create”.
-
Once the space is ready, click on “Close” in the pop-up and you will be redirected to the promote to space page. You see your newly created space under the “Target space”. Once you’re happy with your selections, click on “Promote”.
-
Once the model is successfully promoted, you will see a green notification box. Click on “deployment space” in the notification. Alternatively, you can also find your deployment spaces when you click on the hamburger icon at the top-left corner of your screen.
-
You will be redirected to the deployments page, where you will find your promoted model. Hover over the row to see a rocket-shaped icon, then click on the icon to deploy your model.
-
In the dialog box, select “Online” as your deployment type, give your deployment a name and click “Create”.
-
Click on the “Deployments” tab to see the status of your deployment.
-
Once the deployment is completed, click on the name of your deployment.
-
On this page you find the API references, endpoint and code snippets to help you integrate your model with your applications.
-
To test your model, click on the “Test” tab. You can select a row from the data set and enter the data in the fields. Enter the age, sex, bmi, children, smoker and region and then click on the “Predict” button at the bottom.
-
To validate the prediction, you check the data file that you used to train the model. As you can see, the model predicted a premium of 18524.04 when you enter age: 19, bmi: 27.9, children: 0, smoker: yes, region: southwest. This is relatively close to the actual value recorded in the data file for that row, so we know the model is working properly.
If you want to run the notebook that you explore below, go to https://github.com/IBM/predict-insurance-charges-with-autoai/blob/master/notebooks/Insurance%20Premium%20Predictor%20-%20P8%20notebook.ipynb.
With AutoAI's latest features, the code that is run to create these models is no longer a black box. One or more of these models can be saved as a Jupyter notebook and the Python code can be run and enhanced from within.
- Click on Save As at the top-right corner of the model, and click Notebook.
- This opens a new tab (be sure to enable pop-ups for this website) titled New Notebook, where you can edit the default name if you choose to and then click on Create. This might take a few minutes to load for the first time.
- Alternatively, you can also create the notebook from the Pipeline leaderboard view (shown above) by clicking on the Save as option against the model you want to save, followed by selecting Notebook. The steps are very similar to the first method discussed above.
- Once the notebook has been created, it is listed under the Notebooks section within the Assets tab.
- Clicking on the notebook from the list opens the Jupyter notebook where the code in Python is available.
- If the notebook is locked, click on the pencil icon on the right tab to be able to run/edit the notebook.
- Select the Cell option from the menu list and click Run All. This begins executing all steps in a sequence. Unless an error is encountered, the entire notebook content is executed.
While understanding the content within the notebook requires prior knowledge of machine learning using Python, we encourage you to browse through this tutorial to learn the basics of how regression models are built in Python.
In this step, you do a high-level analysis of the notebook that is generated.
-
AutoAI uses scikit-learn for creating machine learning models and for executing the steps in pipelines.
-
autoai-lib is used to transform data while being processed in the pipeline.
-
The following snippet highlights sample code showing how autoai-lib is used in transforming numerical data and how scikit-learn is used in setting these transformations in a pipeline.
-
Here you see the Python code that went into setting up Random Forest as the algorithm of choice for regression.
-
Calling the fit method on the pipeline returns an estimator which is then used to predict a value. The code below shows each of these steps.
-
Finally, the Python code that was generated to validate the results and analyse the model performance is seen below. KFold-cross validation techniques have been applied to evaluate the model. The notebook can also be edited to apply other validation techniques and can be re-evaluated.
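The generated notebook's own code is not reproduced here. As a rough sketch of the same ideas (a scikit-learn pipeline, a Random Forest regressor, fit and predict, and K-fold cross-validation), the following example uses only scikit-learn and NumPy. It is an assumption-laden stand-in: it does not use autoai-lib transformers, the preprocessing steps are generic choices, and the data is synthetic with a made-up relationship between the features and the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the insurance data set): age, bmi, smoker flag.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 65, size=n)
bmi = rng.uniform(18, 40, size=n)
smoker = rng.integers(0, 2, size=n)
X = np.column_stack([age, bmi, smoker]).astype(float)
# Made-up target in which smoking dominates, plus some noise.
y = 250.0 * age + 300.0 * bmi + 20000.0 * smoker + rng.normal(0, 1000, size=n)

# Preprocessing steps followed by the regressor, chained in one pipeline.
pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# K-fold cross-validation; scikit-learn reports RMSE negated, so flip the sign.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv,
                         scoring="neg_root_mean_squared_error")
rmse_per_fold = -scores
print("RMSE per fold:", np.round(rmse_per_fold, 1))

# Fit on all rows, then predict for one hypothetical 19-year-old smoker.
pipeline.fit(X, y)
prediction = pipeline.predict(np.array([[19.0, 27.9, 1.0]]))
print("Predicted value:", float(prediction[0]))
```

Swapping KFold for another splitter from sklearn.model_selection is one way to apply a different validation technique, in the spirit of the re-evaluation mentioned above.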
More information on the implementation considerations of AutoAI can be found here
The driver code to run the application can be found under the web-app folder within the git repository that was cloned from Step 1. To run and test your deployed model through this Python-based user-interface, you need to replace the following information within web-app/app.py:
- Your Watson Machine Learning (which is associated with this deployed model) Instance ID and apikey.
- Your deployed model's deployment URL, so you can make a POST request.
- Your IBM Cloud IAM token, to authorize yourself.
Now, you go into detail on how to gather these credentials. If you already know how to do this, you can skip the steps below, and go straight to running the application.
- Generate an IBM Cloud apikey by going to cloud.ibm.com and then from the top-right part of the screen click on Manage -> IAM.
- Next, click on API keys from the left side-bar. Next click on Create an IBM Cloud API key.
- Name the key as you wish, and then click Create.
- Once the key is created, click on the Download button.
- From inside Watson Studio (or Cloud Pak for Data), click on Deployment Spaces.
- From there, click on the name of the deployment space to which you deployed your model.
- Next, click on the name of the model.
- Next, click on the deployment of the model.
- From there, you will be taken to the deployment API reference page - on the right hand side you can see the Deployment ID. Go ahead and copy that and keep it handy - you will need to paste that into your app.py page.
- From the command line, type curl -V to verify if cURL is installed in your system. If cURL is not installed, refer to these instructions to get it installed.
- Execute the following cURL command to generate your access token, but replace the apikey with the apikey you got from step 7.1 above.
curl -X POST 'https://iam.cloud.ibm.com/oidc/token' -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<api-key-goes-here>'
As shown in the image below, the apikey can be copied and pasted from the file downloaded at the end of step 7.1. The curl request would look something like this after the apikey is pasted in:
curl -X POST 'https://iam.cloud.ibm.com/oidc/token' -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=aSULp7nFTJl-jGx*******aQXfA6dxMlpuQ9QsOW'
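For readers who prefer Python to cURL, the same token request can be sketched with the standard library. This is a minimal illustration, assuming the IAM reply is JSON with an access_token field; build_token_request and fetch_access_token are hypothetical helper names and are not part of web-app/app.py.

```python
import json
import urllib.parse
import urllib.request

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/oidc/token"


def build_token_request(api_key):
    """Build the same POST request as the cURL command, without sending it."""
    form = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode("utf-8")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(api_key):
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(build_token_request(api_key)) as response:
        return json.loads(response.read().decode("utf-8"))["access_token"]


# Building the request needs no network access; the key below is a placeholder.
request = build_token_request("<api-key-goes-here>")
print(request.get_method(), request.full_url)
```

Calling fetch_access_token with a real apikey performs the network call; the returned string is what goes into the Authorization header later on.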
-
Install the python.org Windows distribution 3.8.3 from http://python.org. Make sure to add the /python38/scripts folder path to the $PATH environment variable. If you do not, you will get errors trying to run flask (flask.exe is installed to the scripts folder).
-
Remove the PowerShell alias for curl and install curl from python3.8
PS C:/> remove-item alias:curl
PS C:/> pip3 install curl
- Execute curl to get a secure token from IBM IAM. Please note that the token expires after 60 minutes. If you get an internal server error from the main query page (The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application), it may be due to the token expiring. Also note that in PowerShell the continuation character is the backtick (`).
curl -X POST 'https://iam.cloud.ibm.com/oidc/token' -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<apikey>'
- Copy and paste the access token into the header in the web-app/app.py file. Replace the line " TODO: ADD YOUR IAM ACCESS TOKEN FROM IBM CLOUD HERE" with your token.
- Modify the app.py file within the web-app directory to change the POST request with your deployment ID. The finished line should look like the following:
response_scoring = requests.post("https://us-south.ml.cloud.ibm.com/ml/v4/deployments/18c7f626-04d2-4d1e-9b9b-bf2e6/predictions?version=2020-09-01", json=payload_scoring, headers=header)
- Once you've updated the token and the deployment id, your code should look similar to this. If it does, save it!
- Great job! You are ready to run the application!
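To make the pieces above concrete, here is a hedged sketch of how the token, the deployment URL, and the input row fit together in one scoring request. It uses the standard library's urllib instead of the requests call shown earlier so that it runs without extra packages. The field order in FIELDS, the sample row (including the sex value), the helper names, and the assumed shape of the reply are all illustrative assumptions rather than code from app.py.

```python
import json
import urllib.request

# Assumed column order; it should match the columns the model was trained on.
FIELDS = ["age", "sex", "bmi", "children", "smoker", "region"]


def build_scoring_request(scoring_url, iam_token, row):
    """Build a POST request carrying one input row and the bearer token."""
    payload = {
        "input_data": [
            {"fields": FIELDS, "values": [[row[name] for name in FIELDS]]}
        ]
    }
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + iam_token,
        },
        method="POST",
    )


def extract_prediction(response_json):
    """Assumes a reply shaped like {"predictions": [{"values": [[number]]}]}."""
    return response_json["predictions"][0]["values"][0][0]


# Placeholders only: substitute your own deployment ID and IAM token.
url = ("https://us-south.ml.cloud.ibm.com/ml/v4/deployments/"
       "YOUR-DEPLOYMENT-ID/predictions?version=2020-09-01")
row = {"age": 19, "sex": "female", "bmi": 27.9,
       "children": 0, "smoker": "yes", "region": "southwest"}
request = build_scoring_request(url, "YOUR-IAM-TOKEN", row)
print(request.data.decode("utf-8"))
```

Sending the request with urllib.request.urlopen(request) and passing the decoded JSON reply to extract_prediction would give the predicted charge, provided the reply has the assumed shape.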
Note: this app is tested on Python 3.8.2.
Within the web-app directory, run the following command:
pip3 install flask flask-wtf urllib3 requests
Next, run the following command to start the flask application.
flask run
- Install flask and dependencies
PS C:/> pip3 install flask flask-wtf urllib3 requests
Verify modules have been installed in the 'python38/scripts' folder
- Run 'web-app/app.py' from the local directory using flask
PS C:/> $env:FLASK_APP = "app.py"
PS C:/> flask run
- Go to 127.0.0.1:5000 in your browser to view the application. Go ahead and fill in the form, and click on the Predict button to see your predicted charges based on your data.
- As is expected, if you are a smoker, this drastically increases the insurance charges.
-
You can add a Dashboard, which is a lean version of the Cognos Dashboard available on IBM Cloud, from the "Add to Project" option in your Watson Studio project.
-
You can start finding patterns in your data by easily visualizing various data points. This can get your exploration started within a few minutes and with no coding involved.
-
From visualizing this data you can see the relation in the data points, how Gender, BMI, # of children and smoking might influence the insurance premium.
-
Dashboards are very interactive and make it easy to play with data.
-
You can also pivot and summarize your measures to quickly look at all your measures.
-
Stop working in silos and share your findings with your team in two clicks.
- Fraud Prediction Using AutoAI
- Use AutoAI to predict Customer Churn tutorial
- Predict Loan Default with AutoAI tutorial
This code pattern is licensed under the Apache Software License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.