Templates of blogs prepared

nacisimsek · Jun 8, 2024 · a6c7904
1 parent 1c04c1d

Showing 34 changed files with 52 additions and 55 deletions.
diff --git a/content/posts/20240603-kafka-topics/index.md b/content/posts/20240603-kafka-topics/index.md
@@ -1,16 +1,15 @@
 ---
-title: "Hive Setup and Operations"
-summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
-description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
-categories: ["Docker","Hadoop","Data Engineering"]
-tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+title: "Kafka Topics and Operations"
+summary: "This article is about how to operate on Kafka topics, their management, and configure important parameters"
+description: "This article is about how to operate on Kafka topics, their management, and configure important parameters"
+categories: ["Kafka","Data Engineering","Deployment"]
+tags: ["tutorial", "kafka", "topics", "configuration", "kubernetes", "docker"]
+date: 2024-06-03
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
-
-In this article, we will be deploying Hive services on Hadoop cluster
+# Kafka Topics and Operations
 
+This article is about how to operate on Kafka topics, their management, and configure important parameters
diff --git a/content/posts/20240604-kafka-python-operations/background.png b/content/posts/20240604-kafka-python-operations/background.png
diff --git a/content/posts/20240604-kafka-python-operations/featured.png b/content/posts/20240604-kafka-python-operations/featured.png
diff --git a/content/posts/20240604-kafka-python-operations/index.md b/content/posts/20240604-kafka-python-operations/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Kafka Python Operations"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-04
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Kafka Python Operations
 
 In this article, we will be deploying Hive services on Hadoop cluster
 
diff --git a/content/posts/20240605-spark-deploy/background.png b/content/posts/20240605-spark-deploy/background.png
diff --git a/content/posts/20240605-spark-deploy/featured.png b/content/posts/20240605-spark-deploy/featured.png
diff --git a/content/posts/20240605-spark-deploy/index.md b/content/posts/20240605-spark-deploy/index.md
@@ -1,16 +1,15 @@
 ---
-title: "Hive Setup and Operations"
+title: "Deploy Spark Cluster"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
-categories: ["Docker","Hadoop","Data Engineering"]
-tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+categories: ["Docker","Spark","Data Engineering"]
+tags: ["tutorial", "spark", "hive", "mapreduce", "postgres", "catalog"]
+date: 2024-06-05
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
-
-In this article, we will be deploying Hive services on Hadoop cluster
+# Deploy Spark Cluster
 
+In this article, we will be deploying Spark Cluster on local, docker env, and Kubernetes
diff --git a/content/posts/20240606-spark-cleanup-data/background.png b/content/posts/20240606-spark-cleanup-data/background.png
diff --git a/content/posts/20240606-spark-cleanup-data/featured.png b/content/posts/20240606-spark-cleanup-data/featured.png
diff --git a/content/posts/20240606-spark-cleanup-data/index.md b/content/posts/20240606-spark-cleanup-data/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Use PySpark for Data Clean up"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-06
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Use PySpark for Data Clean up
 
-In this article, we will be deploying Hive services on Hadoop cluster
+In this article, we will be cleaning up a dirty data by using PySpark
 
diff --git a/content/posts/20240607-spark-dataframe/background.png b/content/posts/20240607-spark-dataframe/background.png
diff --git a/content/posts/20240607-spark-dataframe/featured.png b/content/posts/20240607-spark-dataframe/featured.png
diff --git a/content/posts/20240607-spark-dataframe/index.md b/content/posts/20240607-spark-dataframe/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Spark DataFrame Operations"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-07
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Spark DataFrame Operations
 
-In this article, we will be deploying Hive services on Hadoop cluster
+In this article, we will be practicing Spark DataFrame operations
 
diff --git a/content/posts/20240608-spark-submit/background.png b/content/posts/20240608-spark-submit/background.png
diff --git a/content/posts/20240608-spark-submit/featured.png b/content/posts/20240608-spark-submit/featured.png
diff --git a/content/posts/20240608-spark-submit/index.md b/content/posts/20240608-spark-submit/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Submitting Spark Application"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-08
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Submitting Spark Application
 
-In this article, we will be deploying Hive services on Hadoop cluster
+In this article, we will be submitting Spark application to the Spark cluster we previously deployed
 
diff --git a/content/posts/20240609-spark-optimization/background.png b/content/posts/20240609-spark-optimization/background.png
diff --git a/content/posts/20240609-spark-optimization/featured.png b/content/posts/20240609-spark-optimization/featured.png
diff --git a/content/posts/20240609-spark-optimization/index.md b/content/posts/20240609-spark-optimization/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Optimizing Spark Applications"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-09
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Optimizing Spark Applications
 
 In this article, we will be deploying Hive services on Hadoop cluster
 
diff --git a/content/posts/20240610-spark-json-process/background.png b/content/posts/20240610-spark-json-process/background.png
diff --git a/content/posts/20240610-spark-json-process/featured.png b/content/posts/20240610-spark-json-process/featured.png
diff --git a/content/posts/20240610-spark-json-process/index.md b/content/posts/20240610-spark-json-process/index.md
@@ -1,16 +1,15 @@
 ---
-title: "Hive Setup and Operations"
+title: "Processing Complex Nested JSON File with Spark"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-10
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
-
-In this article, we will be deploying Hive services on Hadoop cluster
+# Processing Complex Nested JSON File with Spark
 
+In this article, we will be processing complex nested JSON file with Apache Spark
diff --git a/content/posts/20240611-spark-streaming/background.png b/content/posts/20240611-spark-streaming/background.png
diff --git a/content/posts/20240611-spark-streaming/featured.png b/content/posts/20240611-spark-streaming/featured.png
diff --git a/content/posts/20240611-spark-streaming/index.md b/content/posts/20240611-spark-streaming/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Spark Streaming Hands On from/to Kafka"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-11
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Spark Streaming Hands On from/to Kafka
 
-In this article, we will be deploying Hive services on Hadoop cluster
+In this article, we will be developing a Spark Streaming application which will read data from Kafka, process, and write back to Kafka
 
diff --git a/content/posts/20240612-airflow-nginx-minio/background.png b/content/posts/20240612-airflow-nginx-minio/background.png
diff --git a/content/posts/20240612-airflow-nginx-minio/featured.png b/content/posts/20240612-airflow-nginx-minio/featured.png
diff --git a/content/posts/20240612-airflow-nginx-minio/index.md b/content/posts/20240612-airflow-nginx-minio/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Airflow Introduction Pipeline"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-12
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Airflow Introduction Pipeline
 
-In this article, we will be deploying Hive services on Hadoop cluster
+In this article, we will be deploying Apache Airflow, and create a sample pipeline which fetches data from a webserver and write into MinIO bucket.
 
diff --git a/content/posts/20240613-elasticsearch-kibana/background.png b/content/posts/20240613-elasticsearch-kibana/background.png
diff --git a/content/posts/20240613-elasticsearch-kibana/featured.png b/content/posts/20240613-elasticsearch-kibana/featured.png
diff --git a/content/posts/20240613-elasticsearch-kibana/index.md b/content/posts/20240613-elasticsearch-kibana/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Elasticsearch Indexing and Kibana Dashboard with PySpark"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-13
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Elasticsearch Indexing and Kibana Dashboard with PySpark
 
-In this article, we will be deploying Hive services on Hadoop cluster
+In this article, we will be sinking data to ElasticSearch by PySpark and create a dashboard on Kibana
 
diff --git a/content/posts/20240614-debezium-cdc-flink/background.png b/content/posts/20240614-debezium-cdc-flink/background.png
diff --git a/content/posts/20240614-debezium-cdc-flink/featured.png b/content/posts/20240614-debezium-cdc-flink/featured.png
diff --git a/content/posts/20240614-debezium-cdc-flink/index.md b/content/posts/20240614-debezium-cdc-flink/index.md
@@ -1,16 +1,16 @@
 ---
-title: "Hive Setup and Operations"
+title: "Change Data Capture (CDC) Pipeline Implementation"
 summary: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 description: "This article is about how to deploy Hive services on Hadoop Cluster, which components it has, how the data is stored and managed in Hive, how the calculation is done via MapReduce, and how Yarn manage the resources"
 categories: ["Docker","Hadoop","Data Engineering"]
 tags: ["tutorial", "hdfs", "hive", "mapreduce", "postgres", "catalog"]
-date: 2024-06-01
+date: 2024-06-14
 draft: false
 showauthor: false
 authors:
   - nunocoracao
 ---
-# Hive Deployment and Operations
+# Change Data Capture (CDC) Pipeline Implementation
 
-In this article, we will be deploying Hive services on Hadoop cluster
+In this article, we will be implementing a pipeline with PostgreSQL, Debezium CDC, Kafka, MinIO and the Spark.