EDA-SA-and-Hotel-Booking-Cancellation-Prediction

The objective of this project is to predict whether a guest's hotel booking will be cancelled, based on features such as ADR (Average Daily Rate), booking changes, lead time, the type of hotel booked, and more. The dataset covers two types of hotel: Resort Hotels and City Hotels.

Kaggle Notebook

Files

- config.py - Configuration File
- cross_val.py - Cross-validation file that creates the stratified k-folds (5 folds)
- preproc.py - Preprocessing the Data
- model_dispatcher.py - ML and DL Models
- train.py - Main run file
- predict.py - Runs a trained model for prediction
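As a rough illustration of what the 5-fold stratified split in cross_val.py achieves, here is a minimal standard-library sketch. It is not the repository's code: the function name assign_stratified_folds, the toy target list, and the seed are assumptions made for this example, and a real project would more likely rely on a library implementation of stratified k-fold splitting.

```python
import random
from collections import defaultdict


def assign_stratified_folds(labels, n_splits=5, seed=42):
    """Give every row a fold index so each fold keeps roughly the same class ratio."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [None] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled rows of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_splits
    return folds


# Toy target (made-up data): 1 = cancelled, 0 = not cancelled.
is_canceled = [1] * 20 + [0] * 30
kfold = assign_stratified_folds(is_canceled)

for k in range(5):
    rows = [y for f, y in zip(kfold, is_canceled) if f == k]
    print(f"fold {k}: {len(rows)} rows, {sum(rows)} cancelled")
```

Each fold ends up with the same share of cancelled bookings as the full data, which is the point of stratifying before cross-validation.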

DATASET

The dataset used in the project is taken from Kaggle - Dataset

The article on the dataset - Article

EDA and SA

Exploratory Data Analysis (EDA) and Statistical Analysis (SA) are used to gain insight into the data and to answer questions such as "Which is the busiest month of the year?" and "What is the average price of a room per person per night?". A correlation heatmap is also plotted to identify the most important features, and features are selected using a correlation threshold of 0.04.
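The threshold-based feature selection can be sketched as follows. This is an illustrative standard-library example, not the notebook's code: it assumes the threshold is applied to the absolute Pearson correlation with the target, and the column names (including the target name is_canceled and the noise column) and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.04):
    """Keep features whose absolute correlation with the target meets the threshold."""
    target_values = columns[target]
    return [
        name
        for name, values in columns.items()
        if name != target and abs(pearson(values, target_values)) >= threshold
    ]


# Made-up toy data; "noise" is built to be uncorrelated with the target.
columns = {
    "lead_time": [10, 200, 30, 250, 5, 300],
    "noise": [1, 1, 2, 2, 3, 3],
    "is_canceled": [0, 1, 0, 1, 0, 1],
}
selected = select_features(columns, target="is_canceled")
print(selected)
```

A threshold as low as 0.04 only drops features with almost no linear relationship to the target, so most columns are likely to pass.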

The EDA and SA notebook is in the notebooks folder of the repository. [Use Jupyter nbviewer to see the interactive graphs.]

MODELLING

Four models are trained for the prediction: Logistic Regression, Random Forest, XGBoost, and a Deep Neural Network. The models are trained and validated with 5-fold cross-validation [stratified k-folds]. The hyperparameters and structure of the models can be found in the model dispatcher file (model_dispatcher.py).

Some trained models are saved in the models folder. You can train your own model or use one of the trained models to predict with predict.py.

How to run

[TRAIN]

Download the repository.
Open the terminal and cd to the repository.
Type:
python train.py --folds 0 --model logistic_regression

[PREDICTION]

Download the repository.
Open the terminal and cd to the repository.
Check the names of the trained models in the models folder.
Type:
python predict.py --folds 0 --saved_model xgboost_fold_0.bin

Note: the --folds value can be any fold from 0 to 4.

For training, the model names are listed in the models dictionary in model_dispatcher.py; they are also given here:

logistic_regression
xgboost
random_forest
dnn
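To show how the --folds and --model flags and the names above fit together, here is a hypothetical sketch of the command-line parsing. It is not taken from train.py: build_parser and saved_model_name are made-up helpers, and the `<model>_fold_<fold>.bin` naming is only inferred from the example file xgboost_fold_0.bin, so the actual files in the models folder may be named differently.

```python
import argparse

# Model names as listed in this README.
MODEL_NAMES = ["logistic_regression", "xgboost", "random_forest", "dnn"]


def build_parser():
    """Build a parser that accepts the two training flags shown in this README."""
    parser = argparse.ArgumentParser(description="Train one model on one fold.")
    parser.add_argument("--folds", type=int, choices=range(5), required=True,
                        help="validation fold index, 0 to 4")
    parser.add_argument("--model", choices=MODEL_NAMES, required=True,
                        help="name of the model to train")
    return parser


def saved_model_name(model, fold):
    """Assumed naming pattern, inferred from 'xgboost_fold_0.bin'."""
    return f"{model}_fold_{fold}.bin"


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--folds", "0", "--model", "logistic_regression"])
print(args.folds, args.model, saved_model_name(args.model, args.folds))
```

Restricting both flags with choices makes a typo in the model name or an out-of-range fold fail immediately with a usage message.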

For prediction, check the saved model name in the models folder and use it as shown above.

SumitM0432/Hotel-Booking-Cancellation-Prediction-and-SA
