This repository contains three main projects focused on data engineering, ETL pipelines, and data analysis, demonstrating ML pipelines, Airflow-based ETL processes, and pandas-based analysis.
- End-to-End Flask ML Application
- ETL Pipeline with Airflow
- Data Analysis with Pandas
A complete machine learning pipeline implemented with Flask, incorporating MLflow and DagsHub for experiment tracking.
- Data Ingestion
- Data Validation
- Data Transformation: feature engineering and data preprocessing
- Model Trainer
- Model Evaluation: tracked with MLflow and DagsHub
```mermaid
flowchart TB
    A[Data Ingestion] --> B[Data Validation]
    B --> C[Data Transformation]
    C --> D[Model Trainer]
    D --> E[Model Evaluation]
    subgraph ML Pipeline
        A
        B
        C
        D
        E
    end
    style A fill:#f9f,stroke:#333
    style B fill:#bbf,stroke:#333
    style C fill:#ddf,stroke:#333
    style D fill:#fdd,stroke:#333
    style E fill:#dfd,stroke:#333
```
- Configure settings in config.yaml
- Define data schema in schema.yaml
- Set model parameters in params.yaml
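As an illustrative sketch only (the actual keys and paths depend on the project, and these names are hypothetical), the split between the three files might look like:

```yaml
# config.yaml -- pipeline paths and artifact locations (illustrative keys)
data_ingestion:
  source_url: "https://example.com/data.csv"   # hypothetical source
  raw_data_dir: "artifacts/raw"

# params.yaml -- model hyperparameters (illustrative keys)
model_trainer:
  alpha: 0.5
  l1_ratio: 0.5

# schema.yaml -- expected columns and dtypes (illustrative keys)
columns:
  feature_1: float64
  target: int64
```

Keeping configuration, hyperparameters, and schema in separate YAML files lets each pipeline stage read only the section it needs.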
- Update entity definitions
- Modify configuration manager in src/config
- Enhance pipeline components
- Update the pipeline orchestration
- Refine main.py implementation
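The staged workflow above can be sketched as a simple sequential orchestrator. The stage names and runner below are illustrative, not the repository's actual classes:

```python
# Minimal sketch of a staged ML pipeline runner; the stage callables are
# placeholders for the real components (ingestion, validation, etc.).
from typing import Callable


def run_pipeline(stages: dict[str, Callable[[], None]]) -> list[str]:
    """Run each stage in insertion order; return the names that completed."""
    completed = []
    for name, stage in stages.items():
        stage()  # each real stage would read its own config section
        completed.append(name)
    return completed


if __name__ == "__main__":
    order = run_pipeline({
        "data_ingestion": lambda: None,
        "data_validation": lambda: None,
        "data_transformation": lambda: None,
        "model_trainer": lambda: None,
        "model_evaluation": lambda: None,
    })
    print(order)
```

Because each stage is just a callable keyed by name, adding or reordering stages only touches the dictionary passed to the runner.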
```
src/
├── datascience/
│   ├── components/
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── data_validation.py
│   │   ├── model_eval.py
│   │   └── model_trainer.py
│   ├── config/
│   ├── constants/
│   ├── entity/
│   ├── pipeline/
│   └── utils/
```
- Clone the repository
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the application:
  ```bash
  python app.py
  ```
An automated ETL pipeline that fetches weather data from an API and stores it in a PostgreSQL database.
```mermaid
flowchart LR
    A[Create Table] --> B[Extract Weather API Data]
    B --> C[Transform Data]
    C --> D[Load to PostgreSQL]
    subgraph Airflow DAG
        A
        B
        C
        D
    end
    style A fill:#f96,stroke:#333
    style B fill:#69f,stroke:#333
    style C fill:#9cf,stroke:#333
    style D fill:#6f9,stroke:#333
```
- Data Extraction: weather API integration
- Data Transformation: processing weather information
- Data Loading: PostgreSQL database storage
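The extract-transform-load flow can be sketched end to end in a few functions. To keep the example self-contained, `sqlite3` stands in for PostgreSQL and the API call is stubbed with a fake record; the real pipeline would use a Postgres connection inside Airflow tasks:

```python
# ETL sketch mirroring the DAG above; sqlite3 replaces PostgreSQL and the
# extract step returns a hard-coded record instead of calling a weather API.
import sqlite3


def create_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL)"
    )


def extract():
    # Placeholder for the weather-API call (temperature in kelvin).
    return {"city": "London", "temp_k": 290.15}


def transform(record):
    # Convert kelvin to Celsius and round; real transforms would be richer.
    return (record["city"], round(record["temp_k"] - 273.15, 1))


def load(conn, row):
    conn.execute("INSERT INTO weather VALUES (?, ?)", row)
    conn.commit()


def run_etl(conn):
    create_table(conn)
    load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    print(conn.execute("SELECT * FROM weather").fetchall())
```

Each function maps to one task in the DAG, which is what lets Airflow retry or monitor the steps independently.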
- Install Astro CLI
- Ensure Docker Desktop is running
- Start the Airflow instance:
  ```bash
  astro dev start
  ```
  If a timeout occurs, extend the wait:
  ```bash
  astro dev start --wait 15m
  ```
Jupyter notebook containing data analysis tasks using Pandas.
- CSV data loading and manipulation
- Statistical analysis
- Data filtering and grouping
- Categorical data analysis
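The kinds of operations listed above can be illustrated with a small DataFrame. The column names and values here are made up for illustration and are not from the notebook:

```python
# Sketch of pandas filtering, grouping, and categorical conversion;
# the DataFrame contents are invented for this example.
import pandas as pd

df = pd.DataFrame({
    "department": ["sales", "sales", "eng", "eng"],
    "salary": [50_000, 55_000, 70_000, 80_000],
})

# Filtering rows by condition
high_paid = df[df["salary"] > 60_000]

# Grouping with a statistical summary
mean_by_dept = df.groupby("department")["salary"].mean()

# Categorical dtype for memory-efficient analysis of repeated labels
df["department"] = df["department"].astype("category")

print(mean_by_dept.to_dict())
```

In a real task the DataFrame would come from `pd.read_csv(...)` rather than an inline literal.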
- Open PythonAssignment.ipynb in Jupyter Notebook/Lab
- Select the appropriate kernel
- Run cells sequentially
- Flask ML Application
- Add real-time prediction capabilities
- Implement A/B testing framework
- Enhance model monitoring
- ETL Pipeline
- Add more data sources
- Implement data quality checks
- Add alerting system
- Data Analysis
- Automated reporting
- Interactive visualizations
- Advanced statistical analysis
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
For questions or collaboration opportunities: