Snigda0402/Education-trends-on-Twitter


Education trends on Twitter

Objective:

By analyzing a massive collection of education-related tweets, the project explores whether higher tweet volumes correspond to significant trends in the education sector.

Skills/Tools Used:

  1. Python Programming Language
  2. PySpark
  3. Google Cloud Platform
  4. Big Data Analysis

Project Overview:

  • The project covers author (twitterer) identification, geographical (location) analysis, timeline analysis, and message-uniqueness analysis.

1. Data Collection and Preprocessing:

  • The dataset was provided by the university I attend, the University of Chicago. It consists of ~100 million tweets (~500 GB). The tweets were collected on topics such as education, schools, universities, learning, and knowledge sharing, but only a fraction of them relate directly to primary, secondary, or higher education.
  • Combine individual JSON files and process them for analysis.
  • Discard irrelevant tweets to focus on education-related content.
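As a minimal sketch of the combine-and-filter step, the code below assumes the tweets are stored as JSON lines (one tweet object per line), merges several such files into one stream, skips corrupt records, and keeps tweets whose text matches an education keyword list. The keyword list, the `text`/`full_text` field names, and all function names are assumptions for illustration; the project's actual relevance criteria are not described here. It runs in plain Python on a small sample so the logic is easy to check.

```python
from __future__ import annotations

import json
import re
from typing import Iterable, Iterator

# Hypothetical keyword list; the project's real relevance rules are not documented.
EDU_PATTERN = re.compile(
    r"\b(schools?|teachers?|students?|universit(?:y|ies)|college|classroom|curriculum)\b",
    re.IGNORECASE,
)


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one tweet dict per non-empty line, skipping corrupt records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a record truncated mid-write


def read_tweets(paths: Iterable[str]) -> Iterator[dict]:
    """Combine several JSON-lines files into a single stream of tweets."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield from parse_json_lines(fh)


def is_education_related(tweet: dict) -> bool:
    """Keep a tweet if its text mentions at least one education keyword."""
    text = tweet.get("full_text") or tweet.get("text") or ""
    return EDU_PATTERN.search(text) is not None
```

On the full ~500 GB dataset the same steps would more naturally run in PySpark, where `spark.read.json` reads a directory of JSON-lines files into a DataFrame and a regex filter can be applied with `Column.rlike`.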

2. Exploratory Data Analysis (EDA):

  • Conduct a comprehensive EDA to identify key variables suitable for profiling Twitter users.
  • Identify fields that provide insights into message volume, retweets, and more.
  • Discard poorly populated variables to streamline analysis.
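One way to make "poorly populated" concrete is to compute, for each field, the fraction of tweets in which it is present and non-empty, then keep only fields above a cutoff. The sketch below flattens nested objects into dotted names such as `user.location`; the 0.5 threshold and the helper names are hypothetical choices, not values taken from the project.

```python
from __future__ import annotations

from collections import Counter
from typing import Any

# Values treated as "not populated"; 0 and False still count as real values.
EMPTY_VALUES = (None, "", [], {})


def flatten(record: dict, prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dotted names, e.g. user.location."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def fill_rates(records: list[dict]) -> dict[str, float]:
    """Fraction of records in which each field is present and non-empty."""
    if not records:
        return {}
    filled = Counter()
    for record in records:
        for name, value in flatten(record).items():
            if value not in EMPTY_VALUES:
                filled[name] += 1
    return {name: count / len(records) for name, count in filled.items()}


def well_populated(records: list[dict], threshold: float = 0.5) -> list[str]:
    """Field names whose fill rate meets the (hypothetical) threshold."""
    return sorted(name for name, rate in fill_rates(records).items() if rate >= threshold)
```

In PySpark, `F.count(column)` ignores nulls, so dividing it by the total row count gives a comparable per-column non-null rate (empty strings would need a separate check).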

3. Perform Analysis on the Following Topics:

  • Author Identification
  • Geographical Distribution Analysis
  • Timeline Analysis
  • Message Uniqueness Analysis
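The four analyses can be sketched as simple aggregations over the filtered tweets. This sketch assumes Twitter API v1.1-style fields (`user.screen_name`, `user.location`, `created_at`, `text`, and a `retweeted_status` key on retweets), which may not match the dataset exactly; treat it as an illustration rather than the project's code. Author identification counts tweets per screen name, geographical distribution counts the free-text profile location, timeline analysis counts tweets per calendar day, and uniqueness measures the share of retweets and of distinct texts after lowercasing and collapsing whitespace.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime

# Twitter API v1.1 timestamp format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def top_authors(tweets: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """Author identification: the n accounts that posted the most tweets."""
    return Counter(t["user"]["screen_name"] for t in tweets).most_common(n)


def location_counts(tweets: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """Geographical distribution: most common non-empty profile locations."""
    locations = ((t["user"].get("location") or "").strip() for t in tweets)
    return Counter(loc for loc in locations if loc).most_common(n)


def daily_volume(tweets: list[dict]) -> dict[str, int]:
    """Timeline analysis: tweets per calendar day (UTC for +0000 offsets)."""
    days = Counter(
        datetime.strptime(t["created_at"], CREATED_AT_FORMAT).date().isoformat()
        for t in tweets
    )
    return dict(sorted(days.items()))


def uniqueness(tweets: list[dict]) -> dict[str, float]:
    """Message uniqueness: share of retweets and share of distinct texts."""
    n = len(tweets)
    if n == 0:
        return {"retweet_share": 0.0, "distinct_text_share": 0.0}
    retweets = sum(1 for t in tweets if "retweeted_status" in t)
    # Lowercase and collapse whitespace so trivial variants count as duplicates.
    distinct = {" ".join(t.get("text", "").lower().split()) for t in tweets}
    return {"retweet_share": retweets / n, "distinct_text_share": len(distinct) / n}
```

Because the profile location is free text, values such as "Chicago, IL" and "chicago" are counted separately unless they are normalized further. At scale, each of these counts corresponds to a DataFrame `groupBy(...).count()` in PySpark.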