From 29652b1efdfd570dd8730e9a8b81165a80a5e2cf Mon Sep 17 00:00:00 2001
From: Mikhail Kumachev
Date: Wed, 6 Mar 2024 21:33:46 +0100
Subject: [PATCH] feat: Digest 35

---
 _includes/tags.md | 1 +
 ...024-03-06-35_Get_your_data_just-in-time.md | 53 +++++++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 _posts/2024-03-06-35_Get_your_data_just-in-time.md

diff --git a/_includes/tags.md b/_includes/tags.md
index b20e517..5360e51 100644
--- a/_includes/tags.md
+++ b/_includes/tags.md
@@ -53,6 +53,7 @@
 [topic:monitoring]: https://img.shields.io/badge/topic-monitoring-CC0A65 "topic: monitoring"
 [topic:pandas]: https://img.shields.io/badge/topic-pandas-F28328 "topic: Pandas"
 [topic:pipelines]: https://img.shields.io/badge/topic-pipelines-92A87F "topic: pipelines"
+[topic:postgresql]: https://img.shields.io/badge/topic-postgresql-99BADF "topic: PostgreSQL"
 [topic:practices]: https://img.shields.io/badge/topic-practices-7AC6AA "topic: practices"
 [topic:presto]: https://img.shields.io/badge/topic-presto-5AABB6 "Presto"
 [topic:pulsar]: https://img.shields.io/badge/topic-pulsar-7D340B "topic: Apache Pulsar"
diff --git a/_posts/2024-03-06-35_Get_your_data_just-in-time.md b/_posts/2024-03-06-35_Get_your_data_just-in-time.md
new file mode 100644
index 0000000..d7838aa
--- /dev/null
+++ b/_posts/2024-03-06-35_Get_your_data_just-in-time.md
@@ -0,0 +1,53 @@
+---
+layout: post
+title: "#35. Get your data just-in-time"
+tags: architecture data-mesh data-warehouse deltalake postgresql python
+---
+
+*Topics: Architecture, data mesh, data warehouse, Delta Lake, PostgreSQL, Python*
+
+
+
+---
+
+[Seamlessly Migrate Your Apache Parquet Data Lake to Delta Lake](https://www.databricks.com/blog/seamlessly-migrate-your-apache-parquet-data-lake-delta-lake) — Dipankar Kushari, Uday Satapathy @ Databricks Engineering Blog
+
+*Databricks is the company behind the Delta Lake format. The post explains some drawbacks of building a data lake on Apache Parquet, how Delta Lake addresses them, and how to migrate.*
+
+![level:beginner] ![topic:deltalake]
+
+---
+
+[How Meta built the infrastructure for Threads](https://engineering.fb.com/2023/12/19/core-infra/how-meta-built-the-infrastructure-for-threads/) — Laine Campbell, Chunqiang (CQ) Tang @ Engineering at Meta
+
+*It's always interesting to read about or watch a real system's design being explained, especially when you are trying to design such systems in your head.*
+
+![level:medium] ![topic:architecture]
+
+---
+
+[Python 3.13 gets a JIT](https://tonybaloney.github.io/posts/python-gets-a-jit.html) — Anthony Shaw
+
+*This goes beyond data engineering and will probably change the whole development landscape: Python gets a JIT, which opens the door to massive performance improvements in the future!*
+
+![level:medium] ![topic:python]
+
+---
+
+[How we built our customer data warehouse all on Postgres](https://tembo.io/blog/tembo-data-warehouse) — Adam Hendel @ Tembo
+
+*A very interesting account of using Postgres as a warehouse. I mean not only storage but the whole system, including orchestration 🤯. Of course, the authors had to write some code in Rust, but the concept looks very interesting and maybe even promising.*
+
+![level:medium] ![topic:data-warehouse] ![topic:postgresql]
+
+---
+
+[Data Domains — Where do I start?](https://towardsdatascience.com/data-domains-where-do-i-start-a6d52fef95d1) — Piethein Strengholt
+
+*A good, practical article about data domains, not only from a data team's perspective but also from a development perspective.*
+
+![level:medium] ![topic:architecture] ![topic:data-mesh]
+
+---
+
+{% include tags.md %}