


rahulorihiki/Twitter_Big_Data_Analysis_using_MapReduce


Twitter Big Data Analysis using Map-Reduce

This project uses Apache Spark's map-reduce operations to analyze a large Twitter dataset. The analyses cover hashtag trends, tweet patterns, and user activity, and all of them are implemented without the Spark DataFrame API or Spark SQL.

Table of Contents

  1. Introduction
  2. Installation
  3. Features
  4. Contributing
  5. License
  6. Contact Information

Installation

Features

  • Extract and analyze the top 5 trending hashtags for each month to identify popular topics and discussions over time.
  • Count the tweets related to each Indian Premier League (IPL) team in each month to understand engagement and interest patterns.
  • Identify the top 5 tweets for each hashtag, ranked by the number of followers of the users who posted them, to highlight influential tweets and users.
  • Create an inverted index of hashtags and their associated tweets. This allows retrieval of tweets containing specific hashtags, sorted by month.
  • Identify the most active users in the dataset, providing insights into key contributors and influencers.
  • Determine the platforms and sources most commonly used by users in specific demographics, revealing preferences and trends in platform usage.
  • Identify the top 10 users who frequently tweet about MS Dhoni, showcasing engaged fans and their interactions.
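The first and fourth features above (top hashtags per month and the inverted index) both reduce to a map step that emits key-value pairs and a reduce step that combines values sharing a key. The following is a minimal plain-Python sketch of that pattern, not the project's Spark code. The `(month, user, text)` record layout, the sample tweets, and every function name here are assumptions made for illustration; the real dataset's schema is not described in this README. In Spark, the same two steps would typically be expressed with `flatMap` and `reduceByKey` on an RDD.

```python
import re
from collections import defaultdict

# Hypothetical sample records in an assumed (month, user, text) layout.
TWEETS = [
    ("2021-04", "alice", "Great win today #IPL #CSK"),
    ("2021-04", "bob", "What a finish #IPL"),
    ("2021-05", "alice", "Missing the matches #IPL #Dhoni"),
    ("2021-05", "carol", "Legend #Dhoni"),
]

HASHTAG = re.compile(r"#(\w+)")


def map_hashtags(tweet):
    """Map step: emit ((month, hashtag), 1) for every hashtag in a tweet."""
    month, _user, text = tweet
    return [((month, tag.lower()), 1) for tag in HASHTAG.findall(text)]


def reduce_by_key(pairs, combine):
    """Reduce step: fold together all values that share a key."""
    totals = {}
    for key, value in pairs:
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals


def top_hashtags_per_month(tweets, n=5):
    """Count hashtags per month, then keep the n most frequent in each month."""
    pairs = [pair for tweet in tweets for pair in map_hashtags(tweet)]
    counts = reduce_by_key(pairs, lambda a, b: a + b)
    by_month = defaultdict(list)
    for (month, tag), count in counts.items():
        by_month[month].append((tag, count))
    # Sort by count (descending), then by tag so that ties are deterministic.
    return {
        month: sorted(tags, key=lambda tc: (-tc[1], tc[0]))[:n]
        for month, tags in by_month.items()
    }


def inverted_index(tweets):
    """Map each hashtag to its (month, text) postings, sorted by month."""
    pairs = [
        (tag.lower(), [(month, text)])
        for month, _user, text in tweets
        for tag in HASHTAG.findall(text)
    ]
    index = reduce_by_key(pairs, lambda a, b: a + b)
    return {tag: sorted(postings) for tag, postings in index.items()}


if __name__ == "__main__":
    print(top_hashtags_per_month(TWEETS))
    print(inverted_index(TWEETS)["dhoni"])
```

Hashtags are lowercased here so that `#IPL` and `#ipl` count as the same tag; whether the project normalizes case this way is not stated in the README.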

Contributing

Contributions are always welcome!

Please see contributing.md for ways to get started.

License

Distributed under the MIT License. See LICENSE for more information.

Contact Information
