This project designs, implements, and executes an ETL pipeline in PySpark that migrates data from a PostgreSQL database to a SQL Server database. The pipeline is built to handle large volumes of data and to preserve data integrity throughout the migration. It covers three steps: extracting data from the PostgreSQL database, transforming the data to match the schema of the SQL Server database, and loading the data into SQL Server. The ultimate goal is for the SQL Server database to accurately reflect the data in PostgreSQL, with minimal data loss and minimal disruption to ongoing operations.
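As a rough illustration of the extract-transform-load flow described above (a minimal sketch, not the repository's exact code; table names, column renames, and connection settings below are placeholder assumptions), the three steps in PySpark might look like this:

```python
from pyspark.sql import SparkSession

# Build a Spark session; both JDBC driver jars must be on the classpath
# (e.g., passed via spark-submit --jars or --packages).
spark = SparkSession.builder.appName("pg-to-mssql-migration").getOrCreate()

# Extract: read a table from PostgreSQL over JDBC.
source_df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/sourcedb")   # placeholder
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "public.customers")                        # hypothetical table
    .option("user", "postgres")
    .option("password", "secret")
    .load())

# Transform: adjust the schema to match the SQL Server target,
# e.g., rename columns and cast types (hypothetical columns).
target_df = (source_df
    .withColumnRenamed("created_at", "CreatedAt")
    .withColumn("Amount", source_df["amount"].cast("decimal(18,2)")))

# Load: write into SQL Server over JDBC; mode("append") avoids dropping
# the target table, which limits disruption to ongoing operations.
(target_df.write.format("jdbc")
    .option("url", "jdbc:sqlserver://localhost:1433;databaseName=targetdb")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("dbtable", "dbo.Customers")                           # hypothetical table
    .option("user", "sa")
    .option("password", "secret")
    .mode("append")
    .save())
```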
- Clone the project

  ```bash
  git clone https://github.com/ArkanNibrastama/Data-Migration-PostgreSQL-to-SQLServer-use-PySpark.git
  ```
- Create a database in PostgreSQL and import the data from the dataset folder. For example:
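  (A hypothetical sketch; the actual file names in the dataset folder, the table schema, and the import method may differ.)

  ```bash
  # Create the source PostgreSQL database (name is a placeholder).
  createdb -U postgres migration_db

  # Import a CSV from the dataset folder into an existing table
  # (table name and file name are hypothetical; the table must exist first).
  psql -U postgres -d migration_db \
    -c "\copy customers FROM 'dataset/customers.csv' WITH (FORMAT csv, HEADER true)"
  ```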
- Install all the dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Fill in the blank variables with your own values. For example:

  ```python
  uid = '{YOUR USER ID ON POSTGRESQL}'
  pwd = '{YOUR PASSWORD}'
  host = 'localhost'
  port = '5432'  # this is the default port
  db = '{YOUR DB NAME}'
  driver = "org.postgresql.Driver"
  url = f"jdbc:postgresql://{host}:{port}/{db}?user={uid}&password={pwd}"
  ```
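  You will likely need analogous settings for the SQL Server target. A minimal sketch, assuming variable names and connection details that are not taken from the repository's code:

  ```python
  # SQL Server (target) connection settings; all values are placeholders.
  mssql_uid = '{YOUR SQL SERVER LOGIN}'
  mssql_pwd = '{YOUR PASSWORD}'
  mssql_host = 'localhost'
  mssql_port = '1433'  # default SQL Server port
  mssql_db = '{YOUR TARGET DB NAME}'
  mssql_driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  mssql_url = (f"jdbc:sqlserver://{mssql_host}:{mssql_port};"
               f"databaseName={mssql_db};user={mssql_uid};password={mssql_pwd}")
  ```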
- Finally, you can run the program on your local computer.
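  How you launch it depends on your setup. One common approach (the entry-point script name and the driver versions here are assumptions, not taken from the repository) is to let spark-submit pull both JDBC drivers from Maven:

  ```bash
  # Run the job with the PostgreSQL and SQL Server JDBC drivers on the classpath.
  spark-submit \
    --packages org.postgresql:postgresql:42.7.3,com.microsoft.sqlserver:mssql-jdbc:12.6.1.jre11 \
    main.py
  ```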
For a better understanding of this repository, check out my LinkedIn post about this project: Data Migration: PostgreSQL to SQL Server using PySpark.