Project Apache Spark in Scala

The main purpose of the project is to practice Apache Spark in Scala.

Architecture

[Architecture diagram: scala_project.drawio]

Overview

This project leverages Scala to implement an Extract, Transform, and Load (ETL) pipeline. Data is extracted from various sources (CSV files and PostgreSQL databases), undergoes transformations and analysis, and is then loaded into three distinct sinks (CSV, Parquet, and PostgreSQL).
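At the top level, the job chains these three stages. The following is a minimal sketch, assuming Spark in local mode; the object name and the extract/transform/load helpers (sketched under the sections below) are illustrative and not taken from the repository:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical top-level wiring of the pipeline; names are illustrative.
object EtlJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Project-Spark-Scala-ETL")
      .master("local[*]") // local mode is enough for a practice project
      .getOrCreate()

    // val result = transform(extract(spark)) // see the sketches below
    // load(result)

    spark.stop()
  }
}
```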

Data Sources:

  - Multiple CSV files
  - PostgreSQL databases:
    - Transaction Poland
    - Transaction France
    - Transaction China
    - Transaction USA
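A sketch of the extract step, assuming Spark's built-in CSV reader and the PostgreSQL JDBC driver; the input path, table names, and credentials below are placeholder assumptions:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Extract: union the CSV inputs with the four per-country transaction
// tables in PostgreSQL. Paths, table names, and credentials are assumptions.
def extract(spark: SparkSession): DataFrame = {
  val csvDf = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("data/input/*.csv") // hypothetical input location

  val jdbcOptions = Map(
    "url"      -> "jdbc:postgresql://localhost:5432/postgres",
    "user"     -> "postgres",
    "password" -> "postgres",
    "driver"   -> "org.postgresql.Driver"
  )

  // hypothetical table names for the four country datasets
  val tables = Seq("transaction_poland", "transaction_france",
                   "transaction_china", "transaction_usa")

  val dbDf = tables
    .map(t => spark.read.format("jdbc").options(jdbcOptions + ("dbtable" -> t)).load())
    .reduce(_ unionByName _)

  csvDf.unionByName(dbDf, allowMissingColumns = true)
}
```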

Data Transformations and Analysis:

The specific transformation and analysis steps are not documented in the architecture diagram or the description. The pipeline most likely involves data cleaning, filtering, and aggregation, with more complex operations possible depending on the data's nature and intended use.
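Since the actual steps are undocumented, the following is only an illustrative Scala sketch of that kind of cleaning and aggregation; all column names are hypothetical:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Illustrative only: the real transformations are not documented.
// Column names (transaction_id, amount, country) are hypothetical.
def transform(transactions: DataFrame): DataFrame =
  transactions
    .na.drop(Seq("transaction_id", "amount")) // drop rows missing key fields
    .filter(col("amount") > 0)                // keep only valid amounts
    .groupBy(col("country"))                  // aggregate per country
    .agg(
      sum("amount").as("total_amount"),
      count(lit(1)).as("transaction_count")
    )
```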

Data Sinks:

  - CSV files
  - Parquet files
  - PostgreSQL databases
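A sketch of the load step writing one result DataFrame to all three sinks; the output paths, target table name, and credentials are placeholder assumptions:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Load: write the result to the three sinks named above. Output paths,
// the target table name, and the credentials are assumptions.
def load(result: DataFrame): Unit = {
  result.write.mode(SaveMode.Overwrite)
    .option("header", "true")
    .csv("data/output/csv")

  result.write.mode(SaveMode.Overwrite)
    .parquet("data/output/parquet")

  result.write.mode(SaveMode.Overwrite)
    .format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/postgres")
    .option("dbtable", "transaction_summary")
    .option("user", "postgres")
    .option("password", "postgres")
    .option("driver", "org.postgresql.Driver")
    .save()
}
```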

Instructions for building and running the Scala application

  1. Clone the project from GitHub
  2. Open the project in your IDE
  3. Build the "postgres" Docker image: `cd PostgresSQL && docker build -t postgres .`
  4. Start the Docker container, for example: `docker run -d --name postgres -p 5432:5432 postgres` (exact flags depend on the Dockerfile)
  5. Check the PostgreSQL connection: `docker exec -it postgres psql -U postgres -d postgres -c '\dt'` (the `\dt` meta-command must be passed via `-c` rather than as a shell argument)
  6. Run the Scala application, e.g. `sbt run` if the project uses sbt
