This project was one of the requirements of my postgraduate module, Web Social Media Analytics and Visualization. It is made up of two parts: Part A, a statistical analysis of popular trends on Twitter, and Part B, text mining on a current event or campaign.
Bear in mind that this project uses APIs to get its data. Anyone who wants to run the code with "NewsAPI" or the official Twitter API needs to create an account on each of them to obtain the API keys. (Don't worry, they're free!)
NewsAPI: https://newsapi.org/
Twitter API: https://developer.twitter.com/en/docs/twitter-api
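Once you have the keys, one option is to read them from environment variables instead of hard-coding them in the scripts. The sketch below is not taken from the project: the variable name `NEWSAPI_KEY` and the helper names are hypothetical, and it only builds a request URL for NewsAPI's `/v2/everything` endpoint (using its `q`, `language`, `pageSize` and `apiKey` query parameters) without sending it.

```python
import os
from urllib.parse import urlencode

NEWSAPI_ENDPOINT = "https://newsapi.org/v2/everything"


def get_key(name):
    """Read an API key from an environment variable (hypothetical name)."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


def newsapi_url(query, api_key, page_size=20):
    """Build (but do not send) a NewsAPI 'everything' search URL."""
    params = {"q": query, "language": "en", "pageSize": page_size, "apiKey": api_key}
    return f"{NEWSAPI_ENDPOINT}?{urlencode(params)}"
```

The same `get_key` helper can supply the four Twitter credentials (consumer key, consumer secret, access token and access token secret) from similarly named variables, which keeps the keys out of the code you share.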
A full explanation of each part is given below:
In the statistical analysis part of the project (Part A), the popular trends on Twitter were identified from data extracted with the official Twitter API. A specific trend, the "2022 Spring Statement Tax Plan" issued by the UK Government, was then chosen for an in-depth statistical analysis that answers questions such as "When do the tweets become popular?", "Which devices are used to tweet?", and "Which sources can be trusted?".
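Two of the questions above reduce to simple counting if "when" is read as tweet volume per hour. The following is a minimal sketch rather than the project's own code: it assumes tweets are dicts in the Twitter API v1.1 JSON format, where `created_at` looks like `Wed Mar 23 13:05:10 +0000 2022` and `source` is an HTML anchor naming the client app. The function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# In v1.1 tweet JSON, "source" is an HTML anchor such as
# '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
SOURCE_PATTERN = re.compile(r">([^<]+)</a>")

# v1.1 "created_at" format, e.g. "Wed Mar 23 13:05:10 +0000 2022"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def device_counts(tweets):
    """Count tweets per client app (the visible text of the source anchor)."""
    counts = Counter()
    for tweet in tweets:
        match = SOURCE_PATTERN.search(tweet.get("source", ""))
        counts[match.group(1) if match else "Unknown"] += 1
    return counts


def tweets_per_hour(tweets):
    """Bucket tweets by the hour in which they were posted (timestamps are UTC)."""
    counts = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        counts[created.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Sorting the result of `tweets_per_hour` by key gives a time series whose peaks show when the discussion was most active; the v1.1 `retweet_count` and `favorite_count` fields could be aggregated per hour in the same way.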
A graph analysis of the Facebook dataset from Stanford University's SNAP website is also performed, using the "GraphX" library in Python, to evaluate centrality measures and carry out community analysis.
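As an illustration of what the centrality measures compute, here is a plain-Python sketch. It is hypothetical and not the project's code: it reads SNAP-style edge-list lines (two node ids per line) into an undirected adjacency map, then computes degree centrality (degree divided by n - 1) and closeness centrality via breadth-first search. Graph libraries such as NetworkX provide `degree_centrality`, `closeness_centrality` and community-detection routines ready-made; community detection is not shown here.

```python
from collections import defaultdict, deque


def load_edge_list(lines):
    """Build an undirected adjacency map from 'u v' lines, skipping blanks and '#' comments."""
    adj = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path distances to them.

    On a connected graph this is (n - 1) / sum of distances; on a disconnected
    graph it measures closeness within the node's own component only.
    """
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop distances
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result
```

For example, on a star graph with edges `0 1`, `0 2` and `0 3`, the hub scores 1.0 on both measures, while each leaf has degree centrality 1/3 and closeness 0.6.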
In the text mining part of the project (Part B), the "TwythonStreamer" class from the Twython library is used to fetch real-time tweets on a chosen topic. The topic chosen was "Elon Musk", since at the time Elon Musk had just bought the social media platform Twitter. Sentiment analysis was performed to evaluate public opinion on the matter, along with a word cloud and a word-frequency analysis. Additionally, "NewsAPI" is used to extract articles about "Elon Musk", which are pre-processed and then topic-modelled with Latent Semantic Indexing (LSI).
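The pre-processing, word-frequency and LSI steps can be sketched with the standard library and NumPy. This is a minimal sketch, not the project's code: the small stop-word list, the function names and the raw term-count weighting are assumptions, and LSI is computed directly as a truncated singular value decomposition with `numpy.linalg.svd` (the project may instead use a library such as gensim's `LsiModel`). Streaming and sentiment analysis are not covered.

```python
import re
from collections import Counter

import numpy as np

# Illustrative stop-word list (an assumption; real pipelines use a fuller list)
STOP_WORDS = {"a", "an", "and", "as", "for", "has", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def word_frequencies(docs, top_n=10):
    """Return the top_n most frequent tokens across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts.most_common(top_n)


def lsi_topics(docs, num_topics=2, terms_per_topic=3):
    """LSI as a truncated SVD of a term-document count matrix.

    Returns, for each of the first num_topics latent dimensions, the terms
    with the largest absolute loadings.
    """
    tokenized = [preprocess(doc) for doc in docs]
    vocab = sorted({token for doc in tokenized for token in doc})
    row_of = {term: i for i, term in enumerate(vocab)}

    # Rows are terms, columns are documents, cells are raw counts
    matrix = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(tokenized):
        for term in doc:
            matrix[row_of[term], col] += 1

    # Singular values come back in descending order; each column of u is a topic
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    topics = []
    for k in range(min(num_topics, len(s))):
        strongest = np.argsort(-np.abs(u[:, k]))[:terms_per_topic]
        topics.append([vocab[i] for i in strongest])
    return topics
```

Weighting the matrix with TF-IDF before the SVD is a common refinement; the same `word_frequencies` counts can also feed a word cloud.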
The Python files are labelled in sequence, so for easier reading please go through them in that order.
The datasets for the statistical analysis and text mining are provided in this GitHub repository.
The data for the graph analysis, however, comes from the SNAP dataset: https://snap.stanford.edu/data/ego-Facebook.html
This project is coded in Python using the PyCharm IDE.
If you use any part of the code, please reference it. Thanks.