Stack Overflow answers on Apache Kafka and Apache Spark
On Stack Overflow, I enjoy asking and answering questions about [tag:apache-kafka] and [tag:apache-spark]. Below are some of my answers that involved deeper research or contain reusable code snippets:
- Ordering guarantees for idempotent Producer
- Downsides of having too many partitions
- Diagram to explain differences in partition assignment strategies
- Implementing the recommendation on "Retrying Async Commits" in "Kafka: The Definitive Guide"
- Using class AdminClient to delete ConsumerGroups
- Transaction API in Kafka Producer
- Background of cleanup.policy=delete on Kafka topics
- KafkaProducer modes in Scala
- Custom Partitioner for KafkaProducer
- Usage of keys in Kafka message
- KafkaProducer Callback Exception
- Deep dive into Log Compaction configurations
- KafkaConsumer CommitFailedException
- How to get Kafka messages based on timestamp
- kafka.group.id in Spark 3.x
- Offset Management in StructuredStreaming with Kafka
- Run two Structured Streams writing to multiple Kafka topics
- Dataframe to Kafka
- How to print DataSource options (e.g. startingOffsets) for a streaming Dataframe?
- Custom Partitioner RDD
- Streaming DataFrame to Hive
- Dataset to HBase
- Dataframe to HBase
- Structured Streaming - Read and flatten nested JSON
- StreamingQueryListener in Structured Streaming
- Avoid state getting too big in mapGroupsWithState
- Write two streaming DFs
- Stream-Static Join: How to refresh (unpersist/persist) static Dataframe periodically
- IllegalStateException: _spark_metadata/0 doesn't exist while compacting batch 9
- Catalyst Optimizer - How to get the SQL representation for the query logic of a (derived) Spark DataFrame?
- Why does Spark apply a broadcast join for a file larger than the autoBroadcastJoinThreshold?
- Spark is pushing down a filter even when the column is not in the dataframe
- Unable to overwrite default value of "spark.sql.shuffle.partitions" with Spark Structured Streaming
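Several of the Kafka answers above (usage of keys, custom partitioners, ordering guarantees) revolve around the same mechanism: a message key is hashed deterministically onto a partition, so all messages with the same key land in the same partition and stay ordered relative to each other. The following is a minimal Python sketch of that idea, not the code from any linked answer; it uses an ordinary MD5 digest purely for illustration, whereas Kafka's default partitioner actually uses murmur2.

```python
# Sketch of key-based partition assignment. Kafka's default partitioner
# uses murmur2(key) modulo the partition count; MD5 stands in here so
# the example stays stdlib-only.
import hashlib

def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Map a message key deterministically onto one of num_partitions."""
    digest = hashlib.md5(key).digest()
    # Interpret the first 4 bytes as an unsigned int, then take the modulo.
    bucket = int.from_bytes(digest[:4], "big")
    return bucket % num_partitions

# Messages sharing a key always resolve to the same partition,
# which is what preserves per-key ordering in Kafka.
p1 = partition_for_key(b"user-42", 6)
p2 = partition_for_key(b"user-42", 6)
```

The same reasoning explains why changing the partition count of an existing topic breaks per-key ordering: the modulo changes, so old and new messages for one key can end up in different partitions.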
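The "Retrying Async Commits" item refers to a pattern described in "Kafka: The Definitive Guide": tag every async commit with a monotonically increasing sequence number, and on failure retry only if no newer commit has been issued since, so a retry can never move committed offsets backwards. The sketch below illustrates just that control flow with plain Python stand-ins; `AsyncCommitter` and its synchronous "callback" are inventions for this example, not the Kafka consumer API.

```python
# Illustration of the safe-retry pattern for async offset commits.
# A real implementation would call KafkaConsumer.commitAsync with an
# OffsetCommitCallback; here the broker round-trip is simulated inline.

class AsyncCommitter:
    def __init__(self):
        self.seq = 0          # bumped before every commit attempt
        self.committed = []   # offsets that "reached the broker"

    def commit_async(self, offsets, fail=False):
        self.seq += 1
        attempt_seq = self.seq  # remember which attempt this callback belongs to

        def on_complete(error):
            if error is None:
                self.committed.append(offsets)
            elif attempt_seq == self.seq:
                # No newer commit was started after this one, so a retry
                # cannot overwrite a more recent offset: safe to retry.
                self.commit_async(offsets)
            # else: a newer commit superseded this one; drop the retry.

        # Simulate the asynchronous completion callback.
        on_complete("simulated broker error" if fail else None)

c = AsyncCommitter()
c.commit_async({"topic-0": 5}, fail=True)  # fails once, then retried safely
c.commit_async({"topic-0": 7})
```

The sequence-number check is the whole trick: without it, a delayed retry of an old commit could land after a newer one and rewind the consumer group's committed position.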