The notebook shows two use cases. The data is a text file containing a set of customer reviews of the airline Germanwings (airline code '4U').
The goal of the first use case is to show how to predict the target variable 'Recommended' from the customer review data.
The goal of the second use case is to find topics in the reviews.
The notebook covers the following topics:
- loading the customer review data from a text file into a DataFrame
- exploratory data analysis
  - plotting distributions of the variables
  - plotting histograms
  - showing the relationship between the target variable and the other variables
- text analysis
  - plotting word frequencies
  - topic modeling with LDA
- feature extraction
  - TF-IDF
  - count features
  - dimensionality reduction using truncated SVD
  - word embeddings
    - GloVe
- evaluation metrics
  - log loss
  - ROC curve
  - precision-recall curve
- models
  - logistic regression
  - naive Bayes
  - gradient boosting machine
  - XGBoost
  - deep learning
    - LSTM
    - GRU
    - bidirectional LSTM
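
To make three of the listed building blocks concrete (count features, TF-IDF, and log loss), here is a minimal from-scratch sketch using only the Python standard library. It is not the notebook's own code, which may well rely on a library instead. The toy reviews and the function names `count_features`, `tfidf_features`, and `log_loss` are assumptions made for illustration, and the smoothed IDF formula is just one common variant.

```python
import math
from collections import Counter

# Hypothetical toy reviews; the real data comes from the review text file.
reviews = [
    "great flight friendly crew",
    "delayed flight rude crew",
    "friendly crew great service",
]


def count_features(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = []
    for toks in tokenized:
        term_counts = Counter(toks)
        counts.append([term_counts[term] for term in vocab])
    return vocab, counts


def tfidf_features(counts):
    """Weight raw counts by a smoothed inverse document frequency."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF: ln((1 + n_docs) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


vocab, counts = count_features(reviews)
tfidf = tfidf_features(counts)
```

In this sketch a term that appears in every review (such as "crew") keeps a weight equal to its raw count, while rarer terms are weighted up. The log loss penalizes confident wrong probabilities more heavily than uncertain ones, which is why it is a useful metric for a probabilistic prediction of 'Recommended'.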