The code in this repository was written by YingKi, QiRui and myself over a span of two weeks.
Hypothesis #1: Hotel location affects customer satisfaction
Hypothesis #2: 5-star hotels are rated more positively on 'rooms' than 3-star hotels
Hypothesis #3: 3-star hotels are rated more negatively on 'value' than 5-star hotels
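Hypotheses #2 and #3 are directional comparisons between two hotel groups, so a one-sided two-sample test is a natural fit. The sketch below uses Welch's t-test from `scipy.stats` and is one plausible approach, not necessarily the test the team ran. The helper name `one_sided_welch` and the score arrays are hypothetical, illustrative values rather than data from the 2013 dataset.

```python
import numpy as np
from scipy import stats


def one_sided_welch(a, b, alternative="greater"):
    """Welch's t-test of mean(a) vs mean(b), returning (t, one-sided p).

    alternative="greater" tests mean(a) > mean(b); "less" tests mean(a) < mean(b).
    The one-sided p-value is derived from SciPy's two-sided result, so this
    sketch does not rely on the newer `alternative=` keyword of ttest_ind.
    """
    t, p_two = stats.ttest_ind(a, b, equal_var=False)
    if alternative == "greater":
        p = p_two / 2 if t > 0 else 1 - p_two / 2
    elif alternative == "less":
        p = p_two / 2 if t < 0 else 1 - p_two / 2
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return t, p


# Hypothetical 'rooms' scores per hotel (illustrative numbers, not from the dataset)
rooms_5 = np.array([82.0, 85.5, 88.1, 79.4, 90.2])
rooms_3 = np.array([70.3, 74.8, 68.9, 77.1, 72.5])

# Hypothesis #2: 5-star 'rooms' scores are higher than 3-star 'rooms' scores
t, p = one_sided_welch(rooms_5, rooms_3, alternative="greater")
print(f"t = {t:.2f}, one-sided p = {p:.4f}")
```

For Hypothesis #3 the same helper would be called with the 3-star 'value' scores first and `alternative="less"` against the 5-star 'value' scores. Welch's variant is used here because the two star groups need not have equal variances or sizes.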
Python Version: 3.8
Packages: numpy, pandas, matplotlib, seaborn, wordcloud, scipy, statsmodels
Dataset: provided by the school (Institute of Data)
The dataset was collected in 2013 and contains three tabs: (i) Sentiment Data, (ii) Sentiment Mentions and (iii) STB Rating.
(i) Sentiment Data: shows the GRI (Global Review Index) score, which is computed by a special algorithm from the ratings (positive, neutral, negative or no review) given for each concept. A total of 9 concepts contribute to the GRI score.
(ii) Sentiment Mentions: shows the total number of mentions (positive and negative) for each concept (such as room, hotel, location, etc.)
(iii) STB Rating: Hotel star category
In this repository, we combined the data from tabs (i) and (iii) so that we could decide whether to accept or reject the three hypotheses above.
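Combining tabs (i) and (iii) amounts to a join on a shared hotel identifier. The sketch below assumes both tabs carry a hotel-name column called `Hotel` and that the star category lives in a column called `Star`; both names, and the workbook file name in the comment, are hypothetical and should be replaced with the real headers. The small DataFrames stand in for the actual tabs.

```python
import pandas as pd


def combine_tabs(sentiment: pd.DataFrame, stb: pd.DataFrame, key: str = "Hotel") -> pd.DataFrame:
    """Inner-join the Sentiment Data tab with the STB Rating tab on a shared hotel key.

    `key` is a hypothetical column name. validate="many_to_one" makes pandas
    raise if a hotel appears more than once in the STB Rating tab.
    """
    return sentiment.merge(stb, on=key, how="inner", validate="many_to_one")


# With the real workbook, the two tabs could be loaded like this (file name is hypothetical):
# tabs = pd.read_excel("hotel_reviews_2013.xlsx", sheet_name=["Sentiment Data", "STB Rating"])
# combined = combine_tabs(tabs["Sentiment Data"], tabs["STB Rating"])

# Stand-in data for illustration only
sentiment = pd.DataFrame({"Hotel": ["A", "B", "C"], "GRI": [85.1, 78.4, 90.3]})
stb = pd.DataFrame({"Hotel": ["A", "B", "C"], "Star": [5, 3, 5]})
combined = combine_tabs(sentiment, stb)
print(combined)
```

An inner join drops hotels that appear in only one tab; switching to `how="left"` would keep every Sentiment Data row and leave a missing star rating where there is no match.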
- Customer Satisfaction Score = % of positive reviews - % of negative reviews
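As a worked example of the formula: a hotel with 60 positive, 25 neutral and 15 negative reviews scores 60% - 15% = 45. A minimal Python version is below. It assumes the percentages are taken over positive, neutral and negative reviews only (excluding "no review"); that denominator is an assumption, and the function name is hypothetical.

```python
def customer_satisfaction_score(positive: int, neutral: int, negative: int) -> float:
    """% of positive reviews minus % of negative reviews, on a -100..100 scale.

    Assumption: the denominator counts positive, neutral and negative reviews,
    and excludes concepts marked "no review".
    """
    total = positive + neutral + negative
    if total == 0:
        raise ValueError("no reviews to score")
    return 100.0 * (positive - negative) / total


print(customer_satisfaction_score(60, 25, 15))  # → 45.0
```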
Our findings show that:
- H0 is rejected. Location has a correlation of 0.4 with customer satisfaction, but it is not as impactful as Service, Room or Cleanliness.
- 5-star hotel customers care about both room satisfaction and value.
- With more data from each hotel group, it might be easier to advise each hotel group on the areas they can improve (depending on the current trend and analysis done).
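Comparing how strongly each concept tracks customer satisfaction, as in the first finding, can be sketched as a ranking of Pearson correlations. This is one plausible way to produce such a comparison, not necessarily the team's exact procedure. The column names (`CSS`, `Service`, `Room`, `Location`), the helper `rank_concepts` and all numbers are hypothetical, illustrative data, not results from the dataset.

```python
import pandas as pd
from scipy import stats


def rank_concepts(df: pd.DataFrame, target: str, concepts: list) -> pd.DataFrame:
    """Pearson r and p-value of each concept column against `target`, sorted by |r| (descending)."""
    rows = []
    for concept in concepts:
        r, p = stats.pearsonr(df[concept], df[target])
        rows.append({"concept": concept, "r": r, "p_value": p})
    out = pd.DataFrame(rows)
    # Reorder rows by absolute correlation, strongest first
    order = out["r"].abs().sort_values(ascending=False).index
    return out.reindex(order).reset_index(drop=True)


# Illustrative per-hotel scores (hypothetical, not from the dataset)
df = pd.DataFrame({
    "CSS":      [10, 20, 30, 40, 50, 60],
    "Service":  [11, 19, 31, 42, 49, 61],
    "Room":     [15, 25, 20, 45, 40, 55],
    "Location": [30, 10, 40, 20, 60, 50],
})
ranked = rank_concepts(df, "CSS", ["Service", "Room", "Location"])
print(ranked)
```

Sorting by the absolute value of r keeps strong negative relationships near the top as well. A correlation ranking only shows association; it does not by itself show that one concept causes higher satisfaction.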