This project uses Apache Spark's map-reduce capabilities to analyze a large Twitter dataset. The analyses cover hashtag trends, tweet patterns, and user activity, and are implemented without the Spark DataFrame API or Spark SQL.
- Extract and analyze the top 5 trending hashtags for each month to identify popular topics and discussions over time.
- Analyze the number of tweets related to each IPL team over different months to understand engagement and interest patterns.
- Identify the top 5 tweets for each hashtag, ranked by the number of followers of the users who posted them, to highlight influential tweets and users.
- Create an inverted index of hashtags and their associated tweets. This allows retrieval of tweets containing specific hashtags, sorted by month.
- Identify the most active users in the dataset, providing insights into key contributors and influencers.
- Determine the platforms and sources most commonly used by users in specific demographics, revealing preferences and trends in platform usage.
- Identify the top 10 users who frequently tweet about MS Dhoni, showcasing engaged fans and their interactions.
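
As an illustration of the map-reduce style behind these analyses, here is a minimal sketch of the first one (top 5 hashtags per month) in plain Python. It is not the project's actual code. The record layout (a dict with a `date` string starting with `YYYY-MM` and a `text` field) and the function names `month_hashtag_pairs` and `top_hashtags_per_month` are assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def month_hashtag_pairs(tweet):
    """Map step: emit ((month, hashtag), 1) for every hashtag in one tweet.

    Assumes tweet["date"] starts with "YYYY-MM" and tweet["text"] is the body.
    """
    month = tweet["date"][:7]
    return [((month, tag.lower()), 1) for tag in HASHTAG_RE.findall(tweet["text"])]


def top_hashtags_per_month(tweets, n=5):
    """Reduce step: sum counts per (month, hashtag), then keep the top n per month."""
    counts = defaultdict(int)
    for tweet in tweets:
        for key, one in month_hashtag_pairs(tweet):
            counts[key] += one

    # Regroup by month so each month can be ranked independently.
    by_month = defaultdict(list)
    for (month, tag), count in counts.items():
        by_month[month].append((tag, count))

    # Highest count first; ties are broken alphabetically by hashtag.
    return {
        month: sorted(pairs, key=lambda p: (-p[1], p[0]))[:n]
        for month, pairs in by_month.items()
    }


if __name__ == "__main__":
    sample = [
        {"date": "2021-04-01", "text": "#IPL #Dhoni what a finish"},
        {"date": "2021-04-15", "text": "#ipl again"},
        {"date": "2021-05-02", "text": "#CSK"},
    ]
    print(top_hashtags_per_month(sample))
```

On a Spark RDD, the same map function could plausibly be passed to `flatMap`, followed by `reduceByKey` for the counts and a per-month grouping (for example `groupByKey` with `mapValues`) for the ranking. The exact pipeline depends on how the dataset is parsed.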
Contributions are always welcome! Please see contributing.md for ways to get started.
Distributed under the MIT License. See LICENSE for more information.
- Kizhakkeppattu Rahul Govindkumar
- Email: krahulgovind@gmail.com
- Github: https://github.com/rahulorihiki