CoreSentiment is a stock market prediction tool developed for a data consulting organization. It performs a simple form of sentiment analysis, using Wikipedia page view counts as a proxy for public interest in companies. The project assumes that an increase in page views signals positive sentiment and correlates with a rise in a company's stock price, while a decrease suggests declining interest and a potential price drop.
The project involves downloading hourly page view data, extracting the relevant fields and page names, and loading the data into a database for analysis. The objective is to identify which company garnered the highest number of page views during the specified hour, providing insight into potential stock movements.
- Download and extract the zip file containing the pageviews data for just one hour
- Fetch only the needed fields and page names
- Load the fetched data into a database of your choice
- Perform a simple analysis to get the company with the highest pageviews for that hour
- Create the first version of a DAG that pulls the Wikipedia pageview counts by downloading, extracting, and reading the pageview data for any one-hour window on any date in October 2024 (e.g., the 4 p.m. data for 10 October 2024); a download sketch follows this list.
- Track just five companies (Amazon, Apple, Facebook, Google, and Microsoft) from the extracted data to initially validate the hypothesis.
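The Wikimedia pageview dumps are published as one gzipped file per hour at a predictable URL. Below is a minimal sketch of the download-and-extract step; the URL pattern follows the public dump layout, and the local paths are placeholders:

```python
import gzip
import shutil

import requests

# Build the dump URL for 4 p.m. (UTC) on 10 October 2024.
# Pattern assumed from the public Wikimedia dump layout:
# https://dumps.wikimedia.org/other/pageviews/<year>/<year>-<month>/
year, month, day, hour = 2024, 10, 10, 16
url = (
    f"https://dumps.wikimedia.org/other/pageviews/{year}/{year}-{month:02}/"
    f"pageviews-{year}{month:02}{day:02}-{hour:02}0000.gz"
)

archive = "/tmp/wiki_pageviews.gz"    # placeholder paths
extracted = "/tmp/wiki_pageviews.txt"

# Stream the archive to disk, then decompress it.
with requests.get(url, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    with open(archive, "wb") as f:
        shutil.copyfileobj(resp.raw, f)

with gzip.open(archive, "rb") as src, open(extracted, "wb") as dst:
    shutil.copyfileobj(src, dst)
```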
- The project is built using Apache Airflow, a platform to programmatically author, schedule, and monitor workflows.
- The project uses a PostgreSQL database to store the extracted data.
- The project uses Docker to run the Airflow and database environments.
- The project uses the `wiki_pageviews_dag` DAG to download, extract, and read the pageview data for any one-hour window on any date in October 2024 (a skeleton sketch follows).
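For illustration, a skeleton of such a DAG might look like the following; the task names, paths, and callable are hypothetical, and the actual `wiki_pageviews_dag` in this repo may differ:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def _fetch_pageviews():
    """Placeholder: filter the extracted file and load rows into Postgres."""
    ...


with DAG(
    dag_id="wiki_pageviews_dag",
    start_date=datetime(2024, 10, 10),
    schedule=None,  # triggered manually from the Airflow UI
    catchup=False,
) as dag:
    download = BashOperator(
        task_id="download_pageviews",
        bash_command=(
            "curl -o /tmp/wiki_pageviews.gz "
            "https://dumps.wikimedia.org/other/pageviews/2024/2024-10/"
            "pageviews-20241010-160000.gz"
        ),
    )
    extract = BashOperator(
        task_id="extract_pageviews",
        bash_command="gunzip --force /tmp/wiki_pageviews.gz",
    )
    fetch = PythonOperator(
        task_id="fetch_pageviews",
        python_callable=_fetch_pageviews,
    )

    download >> extract >> fetch
```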
The company with the highest pageviews in the selected hour (October 10, 2024, at 4 p.m.) is **Facebook**, with 532 page views.
- **Data Ingestion:**
  - Download and extract the Wikipedia pageviews data for a selected hour on a chosen date (e.g., 10 October 2024 at 4 p.m.).
- **Data Processing:**
  - Filter the extracted pageviews down to five companies: Amazon, Apple, Facebook, Google, and Microsoft (a filtering sketch follows this list).
  - Structure the filtered data as a CSV file.
- **Data Storage:**
  - Load the processed data into a Postgres database for further analysis.
- **Data Analysis:**
  - Identify which company had the highest pageviews in the selected hour.
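Each line of the extracted pageviews file has the form `domain_code page_title view_count response_size`. A minimal sketch of the processing step, assuming the five companies are matched by these exact English-Wikipedia page titles and hypothetical file paths:

```python
import csv

# Page titles assumed to represent the five tracked companies.
COMPANIES = {"Amazon", "Apple", "Facebook", "Google", "Microsoft"}

result = {}
with open("/tmp/wiki_pageviews.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:
            continue  # skip malformed lines
        domain_code, page_title, view_count, _size = parts
        # Keep only English-Wikipedia rows for the tracked companies.
        if domain_code == "en" and page_title in COMPANIES:
            result[page_title] = int(view_count)

# Write the filtered counts to the CSV that gets loaded into Postgres.
with open("/tmp/pageviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["page_name", "view_count"])
    writer.writerows(sorted(result.items()))
```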
- Install the required dependencies by running `pip install -r requirements.txt`.
- Set up the Airflow and database environments using Docker by running `docker-compose up -d`.
- Access the Airflow UI by visiting http://localhost:8080/ in your browser.
- Turn on the `wiki_pageviews_dag` DAG.
- Trigger the DAG by clicking the **Trigger DAG** button.
- Check the logs and the database to see the results (a query sketch follows this list).
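Once the DAG has run, the result can also be checked straight from the database. A minimal sketch using `psycopg2`; the connection values and the `pageviews` table and column names are assumptions to adapt to your setup:

```python
import psycopg2

# Connection values are assumptions; match them to your docker-compose setup.
conn = psycopg2.connect(
    host="localhost", dbname="airflow", user="airflow", password="airflow"
)
with conn, conn.cursor() as cur:
    # Fetch the company with the highest view count for the loaded hour.
    cur.execute(
        "SELECT page_name, view_count FROM pageviews "
        "ORDER BY view_count DESC LIMIT 1;"
    )
    print(cur.fetchone())  # e.g. ('Facebook', 532)
conn.close()
```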
- To view the results with Streamlit, change into the `dags` directory (`cd dags`) and run `streamlit run streamlit.py`, or, if you have `make` installed, run `make streamlit`.
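The dashboard itself lives in `dags/streamlit.py` in the repo; purely as an illustration, a minimal version under the same assumed table schema might look like this:

```python
import pandas as pd
import psycopg2
import streamlit as st

st.title("CoreSentiment: Wikipedia pageviews per company")

# Connection values are assumptions; match them to your environment.
conn = psycopg2.connect(
    host="localhost", dbname="airflow", user="airflow", password="airflow"
)
df = pd.read_sql("SELECT page_name, view_count FROM pageviews;", conn)
conn.close()

st.bar_chart(df, x="page_name", y="view_count")
st.dataframe(df.sort_values("view_count", ascending=False))
```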
- To stop the Airflow and database environments, run `docker-compose down`.
Note: The project has CI/CD set up using GitHub Actions. The pipeline runs the tests and checks code quality on every PR to the main branch.
- Add a time range to the DAG to fetch data for multiple hours (see the sketch below).
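One way to implement that is to lean on Airflow's scheduler: give the DAG an hourly schedule over the target window and enable catchup, so each run backfills one hour. A hedged sketch, with illustrative dates:

```python
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="wiki_pageviews_dag",
    start_date=datetime(2024, 10, 10, 0),
    end_date=datetime(2024, 10, 10, 23),
    schedule="@hourly",
    catchup=True,  # backfill one run per hour in the window
) as dag:
    # Inside the tasks, derive the target file from the run's logical date, e.g.
    # "pageviews-{{ data_interval_start.strftime('%Y%m%d-%H0000') }}.gz"
    ...
```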