MaertHaekkinen/dataeng

 
 

dataeng

Repository for the Data Engineering Course

Course Homepage

Syllabus

Introduction

Lecture

  • What is (Big) Data?
  • The Role of Data Engineer
  • From Data Warehouse to Data Lakes

Practice

  • Setup Docker
  • Introduction to Jupyter Notebooks

Part 1: Data Modelling and Query Languages

Lecture

  • Relational Data
  • NoSQL
    • Document
    • Graph
  • Data Warehousing
    • Star and Snowflake schemas
  • Data Vault

Practice

  • Modelling and Querying Relational data: MySQL
  • Modelling and Querying Document data: MongoDB
  • Modelling and Querying Key-Value data: Redis
  • Modelling and Querying Graph data: Cypher

Extras

  • Modelling and Querying RDF data: SPARQL
  • Domain Driven Design: a summary
  • Event Sourcing: a summary

Part 2: (Big) Data Pipelines

Lecture

  • Big Data Systems Architectures
  • ETL and Data Pipelines
    • Best Practices and Anti-Patterns
  • Batch vs Streaming Processing
  • Data Replication
  • Data Partitioning
  • Transactions

Practice

  • Data Ingestion with Apache Kafka
  • Data Pipelines with Apache Airflow
  • Data Processing with Kafka Streams/KSQL

Extras

  • Data Pipelines with Luigi
  • Data Pipelines with Apache NiFi
  • Data Processing with Apache Flink

Part 3: Data Wrangling

Lecture

  • Data Cleansing
  • Data Augmentation

Practice

  • Cleansing examples using OpenRefine
  • Augmentation examples using Pandas and TensorFlow

Contributing

Lecturers
