
The objective of the project is to predict whether a guest's hotel booking will be canceled, based on various features such as ADR (Average Daily Rate), booking changes, lead time, the type of hotel booked, and more. Exploratory Data Analysis and Statistical Analysis are performed for insights and feature engineering. Four Machine…


SumitM0432/Hotel-Booking-Cancellation-Prediction-and-SA


EDA-SA-and-Hotel-Booking-Cancellation-Prediction

The objective of the project is to predict whether a guest's hotel booking will be cancelled, based on various features such as ADR (Average Daily Rate), booking changes, lead time, the type of hotel booked, and more. The dataset covers two types of hotel: Resort Hotels and City Hotels.

Files

- config.py - Configuration file
- cross_val.py - Cross-validation script that creates stratified k-folds (5 folds)
- preproc.py - Data preprocessing
- model_dispatcher.py - Machine learning (ML) and deep learning (DL) model definitions
- train.py - Main training script
- predict.py - Runs a trained model for prediction

DATASET

The dataset used in the project is taken from Kaggle - Dataset

An article describing the dataset - Article

EDA and SA

Exploratory Data Analysis and Statistical Analysis are performed to gain insight into the data and to answer questions such as "Which is the busiest month of the year?" and "What is the average price of a room per person per night?". A correlation heatmap is also plotted to identify the most important features, and features are selected using a correlation threshold of 0.04.
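The two example questions and the threshold-based feature selection can be sketched in pandas as follows. This is a minimal illustration, not the notebook's code. It assumes the column names of the Kaggle hotel bookings file (is_canceled, arrival_date_month, adr, adults, children), treats only non-cancelled bookings as arrivals when finding the busiest month, and assumes the 0.04 threshold is applied to the absolute Pearson correlation between each numeric feature and is_canceled.

```python
import pandas as pd


def busiest_month(df: pd.DataFrame) -> str:
    # Assumption: "busiest" means the most non-cancelled bookings per arrival month.
    arrived = df[df["is_canceled"] == 0]
    return arrived["arrival_date_month"].value_counts().idxmax()


def avg_price_per_person(df: pd.DataFrame) -> float:
    # ADR is the daily rate of the room, so divide it by the number of guests.
    guests = df["adults"] + df["children"].fillna(0)
    valid = guests > 0  # skip bookings with no recorded guests
    return float((df.loc[valid, "adr"] / guests[valid]).mean())


def select_features_by_correlation(df: pd.DataFrame, target: str = "is_canceled",
                                   threshold: float = 0.04) -> list:
    # Absolute Pearson correlation of every numeric column with the target;
    # columns at or above the (assumed) threshold are kept.
    corr = df.select_dtypes(include="number").corr()[target].drop(target).abs()
    return sorted(corr[corr >= threshold].index)
```

The threshold is deliberately low: a feature only needs a weak linear relationship with cancellation to be kept, so the step mainly drops columns that carry almost no signal.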

The EDA and SA notebook is in the notebooks folder of the repository. Use Jupyter nbviewer to view the interactive graphs.

MODELLING

Four models are trained for prediction: Logistic Regression, Random Forest, XGBoost, and a Deep Neural Network. Each model is trained and validated with 5-fold stratified cross-validation (Stratified K-Fold). The hyperparameters and model architectures are defined in model_dispatcher.py.
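The folds themselves come from cross_val.py. A minimal sketch of how such folds can be built with scikit-learn's StratifiedKFold is shown below; the target name is_canceled, the seed, and the kfold column that records each row's validation fold are assumptions for illustration, not taken from the repository.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def create_folds(df: pd.DataFrame, target: str = "is_canceled",
                 n_splits: int = 5, seed: int = 42) -> pd.DataFrame:
    # Shuffle once, then reset the index so row labels equal row positions.
    df = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    df["kfold"] = -1  # hypothetical column holding each row's validation fold
    skf = StratifiedKFold(n_splits=n_splits)
    # Each split's validation indices become one fold; stratifying on the
    # target keeps the cancellation rate similar in every fold.
    for fold, (_, valid_idx) in enumerate(skf.split(X=df, y=df[target])):
        df.loc[valid_idx, "kfold"] = fold
    return df
```

Stratification matters here because cancelled and non-cancelled bookings are not evenly balanced; without it, one fold could end up with a noticeably different cancellation rate and give a misleading validation score.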

Some trained models are saved in the models folder. You can either train your own model or use a saved one for prediction with predict.py.

How to run

[TRAIN]

  1. Download the repository.
  2. Open a terminal and cd into the repository directory.
  3. Run:
    python train.py --folds 0 --model logistic_regression
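What the training command does can be pictured with the sketch below. It is not the repository's train.py: the file path, the kfold column, the two-model dictionary, ROC AUC as the validation metric and joblib for saving are all assumptions, and the data is assumed to be already preprocessed into numeric features. Only the --folds and --model options and the <model>_fold_<n>.bin naming come from this README.

```python
import argparse
import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins for values kept in config.py and model_dispatcher.py.
TRAINING_FILE = "input/train_folds.csv"
MODEL_OUTPUT = "models"
TARGET = "is_canceled"
MODELS = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(n_estimators=200, random_state=42),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train one model on one CV fold.")
    parser.add_argument("--folds", type=int, choices=range(5), required=True)
    parser.add_argument("--model", choices=sorted(MODELS), required=True)
    return parser


def model_path(model_name: str, fold: int) -> str:
    # Mirrors the naming seen in the models folder, e.g. xgboost_fold_0.bin.
    return os.path.join(MODEL_OUTPUT, f"{model_name}_fold_{fold}.bin")


def run(df: pd.DataFrame, fold: int, model_name: str):
    # The chosen fold is held out for validation; the other four folds train.
    train = df[df["kfold"] != fold]
    valid = df[df["kfold"] == fold]
    features = [c for c in df.columns if c not in (TARGET, "kfold")]
    model = MODELS[model_name]()
    model.fit(train[features], train[TARGET])
    # ROC AUC on the held-out fold (the metric choice is an assumption).
    auc = roc_auc_score(valid[TARGET], model.predict_proba(valid[features])[:, 1])
    return model, auc


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    df = pd.read_csv(TRAINING_FILE)
    model, auc = run(df, args.folds, args.model)
    print(f"fold={args.folds} model={args.model} AUC={auc:.4f}")
    os.makedirs(MODEL_OUTPUT, exist_ok=True)
    joblib.dump(model, model_path(args.model, args.folds))
```

Calling main() from an if __name__ == "__main__": guard gives a command line of the same shape as step 3; main(["--folds", "0", "--model", "logistic_regression"]) does the same from Python.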

[PREDICTION]

  1. Download the repository.
  2. Open a terminal and cd into the repository directory.
  3. Check the names of the trained models in the models folder.
  4. Run:
    python predict.py --folds 0 --saved_model xgboost_fold_0.bin

Note: the --folds value can be any fold number from 0 to 4.
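A hedged sketch of the prediction step follows. It assumes the saved .bin files are joblib dumps of scikit-learn-style models and that predict.py scores the chosen fold as held-out validation data; the paths, the kfold column and the AUC metric are hypothetical, and only the --folds and --saved_model options come from this README.

```python
import argparse
import os

import joblib
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins for values kept in config.py.
MODEL_DIR = "models"
TRAINING_FILE = "input/train_folds.csv"
TARGET = "is_canceled"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Score a saved model on one CV fold.")
    parser.add_argument("--folds", type=int, choices=range(5), required=True)
    parser.add_argument("--saved_model", required=True,
                        help="file name inside the models folder")
    return parser


def predict_fold(df: pd.DataFrame, fold: int, saved_model: str, model_dir: str = MODEL_DIR):
    # Load the saved model and return cancellation probabilities for the fold.
    model = joblib.load(os.path.join(model_dir, saved_model))
    valid = df[df["kfold"] == fold]
    features = [c for c in df.columns if c not in (TARGET, "kfold")]
    probs = model.predict_proba(valid[features])[:, 1]
    return probs, roc_auc_score(valid[TARGET], probs)


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    df = pd.read_csv(TRAINING_FILE)
    _, auc = predict_fold(df, args.folds, args.saved_model)
    print(f"fold={args.folds} model={args.saved_model} AUC={auc:.4f}")
```

Under these assumptions, pairing a model with the fold number in its file name (xgboost_fold_0.bin with --folds 0) scores it only on rows it did not see during training.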

For training, the model names are the keys of the models dictionary in model_dispatcher.py. They are:

  • logistic_regression
  • xgboost
  • random_forest
  • dnn
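A models dictionary keyed by these names can be sketched as follows. The hyperparameters are placeholders rather than the repository's values, xgboost is imported only when that model is requested, and scikit-learn's MLPClassifier is used purely as a stand-in for the repository's own deep neural network. The get_model helper is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier


def _xgboost():
    # Imported lazily so the other models still work if xgboost is not installed.
    from xgboost import XGBClassifier
    return XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)


# Each value is a factory, so every call returns a fresh, unfitted model.
models = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "xgboost": _xgboost,
    "random_forest": lambda: RandomForestClassifier(n_estimators=200, random_state=42),
    # Stand-in only: not the repository's DNN architecture.
    "dnn": lambda: MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42),
}


def get_model(name: str):
    if name not in models:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(models)}")
    return models[name]()
```

Storing factories instead of fitted instances means training fold 0 and then fold 1 in the same process never reuses a model that has already seen data.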

For prediction, check the saved model's file name in the models folder and pass it to --saved_model as shown above.
