This project automates the collection, processing, and analysis of data from the Mastodon social network using Hadoop MapReduce, HBase, and Apache Airflow. The pipeline extracts posts from Mastodon, processes them with MapReduce, stores the results in HBase, and orchestrates the entire workflow with Airflow.
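The extraction stage typically flattens each raw Mastodon status (a nested JSON object) into a flat record before the MapReduce step. A minimal sketch of that idea — the chosen output field names are illustrative, not the project's actual schema, though the input keys (`id`, `account`, `content`, `tags`, `favourites_count`) follow the Mastodon API's status format:

```python
import json

def flatten_status(status: dict) -> dict:
    """Flatten a raw Mastodon status into a flat record.

    Output field names are illustrative; input keys follow the
    Mastodon API's status entity.
    """
    account = status.get("account", {})
    return {
        "id": status.get("id"),
        "created_at": status.get("created_at"),
        "username": account.get("username"),
        "content": status.get("content", ""),
        "tags": [t.get("name") for t in status.get("tags", [])],
        "favourites": status.get("favourites_count", 0),
    }

raw = json.loads("""{
  "id": "42", "created_at": "2024-01-01T00:00:00Z",
  "account": {"username": "alice"},
  "content": "<p>hello #fediverse</p>",
  "tags": [{"name": "fediverse"}],
  "favourites_count": 3
}""")
record = flatten_status(raw)
```

Flat records like this are straightforward to serialize as tab-separated lines, the natural input format for a Hadoop Streaming job.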
To set up the project, follow these steps:
- Install Hadoop: https://learnubuntu.com/install-hadoop/
- Install HBase: https://www.linkedin.com/pulse/how-install-apache-hbase-ubuntu-dr-virendra-kumar-shrivastava
- Install Airflow: https://hevodata.com/learn/install-airflow/
```sh
# Install Python dependencies
pip install -r requirements.txt

# Run the pipeline stages in order
python3 data_extraction.py   # extract data from Mastodon
python3 mapreduce.py         # process the extracted data with MapReduce
python3 insert_hbase.py      # store the results in HBase
python3 mastodon_project.py  # Airflow workflow orchestrating the pipeline
```
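The `mapreduce.py` stage can follow the classic map/reduce pattern: the map phase emits a `(hashtag, 1)` pair per tag, and the reduce phase sums the counts. A minimal pure-Python sketch of that pattern — the tab-separated input format and hashtag-counting task are assumptions for illustration, not necessarily what the script computes:

```python
from collections import defaultdict

def mapper(lines):
    """Map phase: emit (hashtag, 1) for every tag on every input line.

    Assumed input format: "<toot_id>\\t<space-separated tags>".
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue
        for tag in parts[1].split():
            yield tag.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the emitted counts per hashtag."""
    counts = defaultdict(int)
    for tag, n in pairs:
        counts[tag] += n
    return dict(counts)

sample = ["1\tfediverse hadoop", "2\tFediverse"]
result = reducer(mapper(sample))  # {"fediverse": 2, "hadoop": 1}
```

On a real cluster the same mapper/reducer logic would read from stdin and write to stdout under Hadoop Streaming, with the framework handling the shuffle between the two phases.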
This project places a high priority on protecting the privacy of Mastodon users and on GDPR compliance. We collect and process personal data only with explicit consent or when legally required, and strictly for defined purposes. Robust security measures safeguard the data, and users retain the rights to access, correct, delete, or restrict processing. Our data retention practices adhere to legal standards, and our privacy policy provides full details on all of these points.
This project is licensed under the MIT License.