Text Mining and Statistical Analysis on Web Social Media Platform (Twitter) using Python

This project was completed as one of the requirements of my postgraduate module, Web Social Media Analytics and Visualization. It has two parts: Part A, a statistical analysis of a popular trend on Twitter, and Part B, text mining on an ongoing event or campaign.

IMPORTANT INFO!!!

Bear in mind that this project uses APIs to get its data. Anyone who wants to use "NewsAPI" or the official Twitter API therefore needs to create an account with each service to obtain the API keys. (Don't worry, it's free!)

NewsAPI: https://newsapi.org/

Twitter API: https://developer.twitter.com/en/docs/twitter-api
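
Once the keys are obtained, they are typically kept in a local JSON file rather than hard-coded in the scripts (the repository includes a file named twitter_credentials.json). The sketch below shows one possible way to load and check such a file. The key names in `REQUIRED_KEYS` and the helper `load_credentials` are assumptions made for illustration, not the layout the project necessarily uses.

```python
import json
from pathlib import Path

# Hypothetical key names; rename them to match your own credentials file.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_credentials(path, required=REQUIRED_KEYS):
    """Read API keys from a JSON file and fail early if any are missing."""
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in required if not creds.get(key)]
    if missing:
        raise ValueError("Missing credential(s): " + ", ".join(missing))
    # Return only the keys we asked for, ignoring anything else in the file.
    return {key: creds[key] for key in required}
```

Calling `load_credentials("twitter_credentials.json")` would then return a dictionary of keys, or raise an error naming whichever entries are empty or absent.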

A full explanation of each part is given below:

Part A: Statistical Analysis

In the statistical analysis part of the project, the popular trends on Twitter were identified from data extracted using the official Twitter API. A specific trend, the "2022 Spring Statement Tax Plan" issued by the UK Government, was then chosen for an in-depth statistical analysis that answers questions such as "When does the tweet get popular?", "What devices are used to tweet?", and "Which sources can be trusted?".
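
As a rough illustration of the first two questions, the sketch below counts tweets by hour of day and by posting device using only the standard library. It assumes each tweet is a dictionary with the `created_at` and `source` fields as returned by the Twitter API v1.1; the sample records are made up, and the function names are hypothetical rather than taken from the project files.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp layout of the `created_at` field in Twitter API v1.1 responses.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_hour(tweets):
    """Count tweets by the hour of day (0-23) at which they were posted."""
    hours = Counter()
    for tweet in tweets:
        posted = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        hours[posted.hour] += 1
    return hours


def tweets_per_source(tweets):
    """Count tweets by client; `source` arrives as an HTML anchor tag."""
    sources = Counter()
    for tweet in tweets:
        label = re.sub(r"<[^>]+>", "", tweet["source"]).strip()
        sources[label] += 1
    return sources


# Made-up sample records, for illustration only.
sample = [
    {"created_at": "Wed Mar 23 12:05:10 +0000 2022",
     "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
    {"created_at": "Wed Mar 23 12:40:02 +0000 2022",
     "source": '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'},
    {"created_at": "Wed Mar 23 18:15:45 +0000 2022",
     "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
]

print(tweets_per_hour(sample).most_common())
print(tweets_per_source(sample).most_common())
```

The hour counts point to when a trend peaks, and the source counts show which clients dominate. Note that the hours here follow the UTC offset in the timestamp, so a local-time analysis would need a conversion first.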

A graph analysis of the Facebook dataset from Stanford University's SNAP website is also performed, using the "GraphX" library in Python, to evaluate centrality measures and carry out community analysis.
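
To make one of the centrality measures concrete, here is a minimal plain-Python sketch of degree centrality: a node's number of neighbours divided by the number of other nodes in the graph. It deliberately avoids any graph library, so it is not the project's own implementation. It also assumes the edge list holds one whitespace-separated pair of node IDs per line; `read_edges` and `degree_centrality` are hypothetical helper names.

```python
from collections import defaultdict


def read_edges(lines):
    """Parse 'u v' pairs, one undirected edge per line; skip other lines."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            edges.append((parts[0], parts[1]))
    return edges


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected graph."""
    neighbours = defaultdict(set)
    for u, v in edges:
        # Touch both keys so that every node is registered, even for a self-loop.
        neighbours[u]
        neighbours[v]
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Made-up toy graph: node "0" is connected to every other node.
toy_lines = ["0 1", "0 2", "0 3", "1 2"]
centrality = degree_centrality(read_edges(toy_lines))
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

For the real dataset, the lines could come from opening the downloaded edge-list file and passing the file object to `read_edges`. Other measures such as betweenness or closeness, and community detection, are better left to a dedicated graph library.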

Part B: Text Mining

In the text mining part of the project, the "TwythonStreamer" class from the Twython library is used to fetch real-time tweets about a chosen topic. The topic chosen in this project was "Elon Musk" because, at that time, Elon Musk had just bought the social media platform "Twitter". A sentiment analysis was performed to evaluate public opinion on this matter, together with a word cloud and a word frequency analysis. Additionally, the "NewsAPI" is used to extract articles about "Elon Musk", which are pre-processed and then passed through Latent Semantic Indexing (LSI) to perform topic modelling.
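
The word frequency step can be sketched with the standard library alone, as below. This is an illustration under stated assumptions rather than the project's code: the stop-word list is a tiny placeholder, the cleaning rules (drop links and @mentions, keep hashtag words) are one reasonable choice among many, and the sample tweets are made up.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "is", "are", "to", "of", "in",
    "on", "for", "it", "this", "that", "rt",
}


def tokenize(text):
    """Lower-case a tweet, drop links and @mentions, return alphabetic tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    return re.findall(r"[a-z]+", text)         # "#twitter" becomes "twitter"


def word_frequencies(tweets):
    """Count the remaining words across all tweets, skipping stop words."""
    counts = Counter()
    for text in tweets:
        counts.update(
            word for word in tokenize(text)
            if word not in STOP_WORDS and len(word) > 2
        )
    return counts


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "RT @user: Elon Musk buys Twitter https://example.com/a",
    "Elon Musk and the future of Twitter #Twitter",
    "Is this good for Twitter?",
]

print(word_frequencies(sample_tweets).most_common(5))
```

The resulting counts are also the usual input for drawing a word cloud, where each word is sized by its frequency.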

Proceeding with the files

The Python files are numbered in order, so for easier readability please go through them in that order.

Dataset

The datasets for the statistical analysis and text mining are provided within the GitHub project.

The data for the graph analysis, however, comes from the SNAP dataset: https://snap.stanford.edu/data/ego-Facebook.html

Misc

This project is coded in Python using the PyCharm IDE.

If anyone wants to use part of the code, please reference it. Thanks.

