Airline Satisfaction Prediction App

Badge Source

Authors

Table of Contents

Business Problem

Our airline has been receiving a fair amount of negative sentiment about its flight services. We want to identify the root causes of this sentiment and raise overall passenger satisfaction with our brand. Over a certain period we surveyed our passengers about their experience, asking specific questions that may hint at what we can improve. To achieve this, we first need a machine learning model that accurately predicts a passenger's satisfaction level from their ratings of specific customer service categories. Our goal is to increase customer retention: we believe that satisfied customers are more likely to use our services again.

Data Source

Methods

  • Exploratory Data Analysis
  • Multivariate Analysis
  • Visualizations
  • Modeling
  • Reporting
  • App Deployment

Tech Stack

  • R (Data Cleansing and Exploratory Analysis)
  • Python (Machine Learning Modeling and App preparation)
  • GitHub Pages (R Markdown Deployment onto Web)
  • Microsoft Office (Reporting & Presentation)
  • Streamlit (Interface for model)

Quick Glance at the Results

Correlation Matrix between numeric features.

Confusion Matrix of Random Forest Classifier.

Random Forest Feature Importance Plot.

Top 3 models on the testing set (with default parameters)

| Model | Accuracy | Sensitivity (Recall) | Specificity |
| --- | --- | --- | --- |
| Logistic Regression | 87.5% | 90.4% | 83.7% |
| Random Forest | 96.5% | 98.2% | 94.2% |
| Gradient Boosting | 95.4% | 97.1% | 93% |
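For readers unfamiliar with these columns, here is a minimal sketch of how accuracy, sensitivity, specificity, and precision are derived from the four cells of a binary confusion matrix. The helper name `classification_metrics` and the counts are hypothetical illustrations, not the project's code or its actual confusion matrix, and which class counts as "positive" is a convention you choose.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fn count the positive class, tn/fp count the negative class.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),      # recall of the positive class
        "specificity": tn / (tn + fp),      # recall of the negative class
        "precision": tp / (tp + fp),        # share of positive predictions that are correct
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Swapping which class is treated as positive swaps sensitivity and specificity, so it is worth stating the convention next to any reported score.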

  • Final Model used: Random Forest Classifier
  • Why choose Random Forest Classifier compared to the other models: Random Forest gave the best results not only in accuracy but also in sensitivity, specificity, and precision. Its precision was about 97.47%, and precision is generally a more informative metric than accuracy when the target classes are imbalanced. Our target variable, satisfaction level, had more unsatisfactory/neutral passengers than satisfactory passengers in our surveys. Random Forest also provides feature importance, computed across its trees from how much each split (node) decreases the Gini index. This is what gives further insight into our passengers' views and what impacts satisfaction level the most. That said, Logistic Regression or Gradient Boosting would also have been sufficient for the analysis, since the differences in our metric scores were not large.
  • Metric used: Specificity
  • Why choose Specificity as a metric: Our response variable, satisfaction level, had imbalanced classes. This is a problem because a machine learning algorithm cannot learn each class equally well, so our model may learn unsatisfactory/neutral passengers better because we had more observations of them. Since we want to identify satisfactory passengers accurately, we want the specificity score to be as high as possible. In the confusion matrix above, 1 represents satisfactory and 0 represents unsatisfactory/neutral passengers. We therefore want to increase the count in the lower right corner of the confusion matrix, the satisfactory passengers correctly predicted as satisfactory. Reading class 0 as the positive class, the share of satisfactory passengers classified correctly is the specificity score, which is the same as the recall score for class 1.
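The "decrease in gini index" mentioned above can be made concrete with a small sketch. It computes the Gini impurity of a set of labels and the impurity decrease produced by one candidate split; impurity-based feature importance in a Random Forest aggregates decreases of this kind over all splits that use a feature. The function names and the toy labels are hypothetical and are not taken from the project code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy node with three unsatisfied (0) and three satisfied (1) passengers.
parent = [0, 0, 0, 1, 1, 1]

# A split that separates the classes perfectly removes all impurity.
perfect = gini_decrease(parent, [0, 0, 0], [1, 1, 1])

# A split that leaves both children mixed removes much less.
weak = gini_decrease(parent, [0, 0, 1], [0, 1, 1])

print(f"perfect split decrease: {perfect:.3f}")
print(f"weak split decrease: {weak:.3f}")
```

A feature whose splits tend to look like the first case ends up with a high importance score; one whose splits look like the second case ends up with a low one.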

Lessons Learned and Recommendation

  • In this project I learned how to use feature importance from our Random Forest and Gradient Boosting models to determine what influences our response the most. It is important to note that a feature can influence the response variable either positively or negatively. Since the goal of this project is to increase satisfaction, we want to identify not only the most important features but also the ones with a positive influence. For example, Cleanliness is one of the features, and logic suggests that as cleanliness decreases, so does satisfaction. Conversely, if we increase cleanliness we would expect customers to be more satisfied with their experience, so this feature has a positive influence. A feature has a negative influence if increasing it decreases satisfaction, or vice versa.
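Impurity-based feature importance scores are non-negative, so on their own they do not say whether a feature pushes satisfaction up or down. One rough way to check the direction, sketched below, is the sign of the correlation between a feature and the 0/1 satisfaction label. The survey values and the `departure_delay` column are hypothetical illustrations, and a correlation sign is only a first look, not a causal conclusion.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical survey rows: one rating feature, one delay feature, and the label.
features = {
    "cleanliness": [1, 2, 2, 4, 5, 5],            # 1-5 rating
    "departure_delay": [60, 45, 50, 10, 0, 5],    # minutes (assumed column)
}
satisfied = [0, 0, 0, 1, 1, 1]                    # 1 = satisfactory

directions = {}
for name, values in features.items():
    r = pearson(values, satisfied)
    directions[name] = "positive" if r > 0 else "negative"
    print(f"{name}: r = {r:+.2f} ({directions[name]} influence)")
```

Combining this sign with the importance ranking gives a short list of features that are both influential and move satisfaction in the desired direction.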

Limitation and what can be Improved

  • One limitation of this project is that external factors were not considered. For example, the data set did not include the ticket fare each passenger paid. A passenger who paid a higher fare is likely to receive a higher level of service throughout their travel, which would increase their satisfaction with the airline.
  • Another piece of information missing from our data was passenger companionship: did the passenger travel alone or with friends/family? Knowing this would give further insight into why the passenger was traveling, whether for vacation, business, or an emergency. These scenarios may affect the overall airline experience, since external factors may already have contributed to the passenger's mood.
  • A final question is whether a passenger's income level influences their satisfaction with the airline. Given a passenger's income, we could further classify passengers into brackets such as lower, middle, and upper income groups. Since money generally gives access to better features and services, their satisfaction might be more easily met.

Run Locally

First, open your command line or terminal and go to the directory where you want to save the project.

Check that git is installed (git clone creates the repository itself, so git init is not needed)

          
          git --version
          
        

Clone the Project

          
          git clone https://github.com/luisosorio3214/Airline-Satisfaction-Prediction-App.git
          
        

Head to project directory

          
          cd Airline-Satisfaction-Prediction-App
          
        

Create a virtual environment using venv

          
          python -m venv "env_name"
          
        

Activate virtual environment

          For Windows Users
          
            env_name\Scripts\activate
          
          For Mac and Linux Users
          
            source env_name/bin/activate
          
        

Install required dependencies from requirements.txt file

          
          pip install -r requirements.txt
          
        

Start the streamlit server locally

          
          streamlit run app.py
          
        

If you are having issues with Streamlit, please follow this tutorial on how to set up Streamlit.

Explore the R Markdown Notebook

To explore the R notebook file click here.

Report and Presentation

The Report and Presentation were done collaboratively with other students at Long Beach State University. I am grateful to them and thank them for the work they contributed.

To read the Full Report of the Analysis click here.

To see the Full Presentation given click here.

Deployment on streamlit

To deploy this project on Streamlit share, follow these steps:

  1. Make sure you have a GitHub repository with the full project files, including the requirements.txt file
  2. Go to Streamlit share
  3. Log in with GitHub, Google, etc.
  4. Click the new button
  5. Select the GitHub repo, the branch, and the Python file containing the Streamlit code
  6. Click Save and Deploy

App deployed on Streamlit

Video to gif tool

Contribution

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change or contribute.

License

MIT License

Copyright (c) 2022 Stern Semasuka

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Learn more about MIT license