[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor … #4264

guanziyue · 2021-12-09T05:25:13Z

What is the purpose of the pull request

Fixes the problem described in https://issues.apache.org/jira/browse/HUDI-2875.

Brief change log

Add a graceful exit for BoundedInMemoryExecutor.
In SparkMergeHelper, fully stop the BoundedInMemoryExecutor first and only then close the HoodieMergeHandle.
Add a unit test covering the above case.
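
The ordering described in this change log can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of this PR: `ShutdownOrderSketch` and `FakeHandle` are hypothetical names, `FakeHandle` stands in for HoodieMergeHandle, and a plain `ExecutorService` stands in for BoundedInMemoryExecutor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownOrderSketch {

  // Hypothetical stand-in for a handle whose write and close must never overlap.
  static final class FakeHandle {
    private volatile boolean closed = false;
    final AtomicBoolean writeAfterClose = new AtomicBoolean(false);

    void write(int record) {
      if (closed) {
        writeAfterClose.set(true); // a write arrived after close: the situation to avoid
      }
    }

    void close() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  static FakeHandle runAndClose() {
    FakeHandle handle = new FakeHandle();
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // The writer keeps calling write() until its thread is interrupted.
      executor.execute(() -> {
        int i = 0;
        while (!Thread.currentThread().isInterrupted()) {
          handle.write(i++);
        }
      });
      Thread.sleep(50); // let the writer run for a moment
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      // 1. Stop the executor and wait until the writer thread has really exited.
      executor.shutdownNow();
      try {
        executor.awaitTermination(60, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      // 2. Only then close the handle, so close() cannot race with write().
      handle.close();
    }
    return handle;
  }
}
```

Calling `ShutdownOrderSketch.runAndClose()` returns a closed handle whose `writeAfterClose` flag stays false, because no writer thread is alive by the time `close()` runs.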

Verify this pull request

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

vinothchandar · 2021-12-15T02:24:49Z

curious to know what causes concurrent writers to a given HoodieParquetWriter? or is there some internal parquet shared state that is non-thread safe here?

guanziyue · 2021-12-15T11:08:56Z

Hi vinothchandar:
Concurrent writes to HoodieParquetWriter occur in the following code:
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java#L103
When speculation is triggered, we call mergeHandle.close, which calls the close method of parquetWriter. At the same time, boundedInMemoryExecutor is still running, so the write method of mergeHandle, which calls the write method of parquetWriter, is invoked concurrently.
parquetWriter does hold state that is not thread safe. It holds a BytesInput, which is used as internal data storage in the parquet column format; it is not thread safe and its life cycle is managed by parquetWriter. So the parquet writer must change its state serially, one caller at a time. If it is being written to, a reset command may not fully clear it as expected. Such data structures are reused within the JVM, and a BytesInput that was not cleared may return wrong results in subsequent use.

nsivabalan

@n3nash @bvaradar : can you folks review the changes in this patch? The changes are very minimal, but they do involve some core classes which have not been touched in recent times.

nsivabalan · 2022-02-09T04:50:13Z

hudi-common/src/main/java/org/apache/hudi/common/util/queue/BoundedInMemoryQueue.java

@@ -204,6 +204,7 @@ private boolean expectMoreRecords() {
   * singleton iterator for this queue.
   */
  private Option<O> readNextRecord() {
+    checkIfInterrupted();


Do we need to check for interruption on every record processed? Will there be any negative impact on perf due to this? This is all in memory and should not be an issue, but I wanted to see if this is absolutely necessary.

I do this so that the consumer can stop as soon as possible when the queue receives a termination signal. I ran a simple test, and there seems to be no difference in perf whether or not I sample the check here. Or we can simply remove it here. As far as I know, adding synchronization to the parquet writer is enough to solve this problem. The modification of BoundedInMemoryQueue is to make the memory executor stop gracefully. We may pick one of them. Glad to follow any suggestion from you.
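
A per-record interruption check of the kind discussed here could look roughly like the sketch below. The class name, the exception type, and the `interruptAfter` parameter are assumptions made for illustration; the real `checkIfInterrupted` in BoundedInMemoryQueue may behave differently.

```java
import java.util.Iterator;

class InterruptCheckSketch {

  // Lock-free check: reads the interrupt flag of the current thread without clearing it.
  static void checkIfInterrupted() {
    if (Thread.currentThread().isInterrupted()) {
      throw new IllegalStateException("reader was interrupted");
    }
  }

  // Reads records until the iterator is exhausted or the thread is interrupted.
  // interruptAfter simulates a termination signal arriving after that many records.
  static int readUntilInterrupted(Iterator<Integer> records, int interruptAfter) {
    int read = 0;
    try {
      while (records.hasNext()) {
        checkIfInterrupted(); // one cheap flag read per record
        records.next();
        read++;
        if (read == interruptAfter) {
          Thread.currentThread().interrupt();
        }
      }
    } catch (IllegalStateException e) {
      // The reader stopped early in response to the interrupt.
    } finally {
      Thread.interrupted(); // clear the flag so this demo leaves the calling thread clean
    }
    return read;
  }
}
```

With five records and a simulated interrupt after the second one, the reader stops after two records instead of draining the rest.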

nsivabalan · 2022-02-09T04:51:43Z

Thanks @guanziyue for the fix. if I am not wrong, you have this fix already in your prod env and has been running smoothly w/o issues.

nsivabalan · 2022-02-11T19:16:45Z

@guanziyue : we are looking to get this in for 0.11. wanted to keep you in the loop.

guanziyue · 2022-02-18T07:56:55Z

Actually, I don't have an integral test of hudi. But I did try this in my prod env, which uses hudi in a specific way.

nsivabalan · 2022-02-25T18:55:07Z

@yihua : do you mind taking a look. LGTM.

yihua

The fix LGTM. @guanziyue could you add a unit test to emulate the concurrent usage of HoodieParquetWriter, e.g., starting 10-20 threads concurrently using the same instance to write parquet files to stress test it?

alexeykudinkin · 2022-03-10T22:39:53Z

...-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java

@@ -84,7 +84,7 @@ public HoodieParquetWriter(String instantTime,
  }

  @Override
-  public void writeAvroWithMetadata(R avroRecord, HoodieRecord record) throws IOException {
+  public synchronized void writeAvroWithMetadata(R avroRecord, HoodieRecord record) throws IOException {


This is problematic in a few ways:

This method is invoked in the hot-path, taking a lock in it would impact its performance considerably

Taking locks without timeouts exposes us to potential dead-locks

I have removed the lock and used another way to guarantee that the parquet writer is used in a thread-safe way.

alexeykudinkin · 2022-03-10T22:41:28Z

...nt/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java

@@ -101,13 +101,13 @@ public void runMerge(HoodieTable<T, JavaRDD<HoodieRecord<T>>, JavaRDD<HoodieKey>
    } catch (Exception e) {
      throw new HoodieException(e);
    } finally {
+      if (null != wrapper) {


Can you please rename wrapper to executor as well? wrapper is a complete misnomer

Done
Reverted it; I may do this in another PR...

alexeykudinkin · 2022-03-10T22:45:53Z

@guanziyue thank you for taking the time to troubleshoot this concurrency issue and implement the fix!

I echo @vinothchandar's concerns, and I think we're taking a step too far: ParquetWriter is not assumed to be thread-safe, nor do I believe we should make it so.

Instead, I believe we should just resolve the problem with its concurrent access (which you already did) and make it clear that ParquetWriter is not thread-safe, so its usage needs to be properly guarded externally.

alexeykudinkin · 2022-03-10T22:46:47Z

...nt/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java

@@ -101,13 +101,13 @@ public void runMerge(HoodieTable<T, JavaRDD<HoodieRecord<T>>, JavaRDD<HoodieKey>
    } catch (Exception e) {
      throw new HoodieException(e);
    } finally {
+      if (null != wrapper) {


Can you please also add a comment here elaborating on why this particular order is crucial?

guanziyue · 2022-03-11T06:42:18Z

Hi @alexeykudinkin, may I know whether your concern is "adding a lock to parquetWriter" or "adding a lock to the hot path"? I'm afraid it is difficult to come up with a method that guarantees this problem is totally solved without adding some signal to the hot path. Either the producer needs to check whether the current thread is interrupted and respond to it in a reasonable time, or the consumer needs to immediately reject any write once the close method has been called, which also needs a lock on the hot path. For the producer solution, we can have a lock-free check. For the consumer, we may use a volatile rather than a lock? But either of them adds something to the hot path.
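
The consumer-side option mentioned above (a volatile flag instead of a lock) might be sketched like this. `GuardedWriterSketch` is a hypothetical class, not part of Hudi or Parquet. Note the limitation: the flag only rejects writes that start after `close()`; it does not make the check and the write atomic, so a write already in flight can still overlap with `close()`.

```java
class GuardedWriterSketch {

  // volatile makes close() visible to the writing thread without taking a lock.
  private volatile boolean closed = false;
  private int written = 0;

  // Returns false instead of writing once close() has been observed.
  boolean write(int record) {
    if (closed) {
      return false;
    }
    written++; // still unsynchronized: safe only with a single writing thread
    return true;
  }

  void close() {
    closed = true;
  }

  int written() {
    return written;
  }
}
```

The cost on the hot path is one volatile read per record, which is the "something added to the hot path" that the comment refers to.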

vinothchandar

At this point, the main thing this PR solves is the following?

Add a graceful exit for BoundedInMemoryExecutor.

Could we also rebase from master? There are two executor services now and I think it'll subsume most of your rename changes here.

vinothchandar · 2022-03-11T11:55:41Z

...nt/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java

-        wrapper.shutdownNow();
+      reader.close();
+      if (null != executor) {
+        executor.shutdownNow();


any reason we don't awaitTermination here?

Done. I also added waiting for termination to the other shutdownNow usages.
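
The reason to pair the two calls is that `ExecutorService.shutdownNow()` only interrupts running tasks and returns immediately, while `awaitTermination` blocks until they have actually exited. A small sketch with plain `java.util.concurrent` classes (the class name and the timeout constant are assumptions, not names from this PR):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AwaitTerminationSketch {

  private static final long TIMEOUT_SECS = 60L; // hypothetical timeout

  // Returns {terminated right after shutdownNow, terminated after awaitTermination}.
  static boolean[] demo() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    executor.execute(() -> {
      started.countDown();
      boolean released = false;
      while (!released) {
        try {
          release.await();
          released = true;
        } catch (InterruptedException e) {
          // Simulates a task that keeps working for a while after being interrupted.
        }
      }
    });

    try {
      started.await();                 // make sure the task is running
      executor.shutdownNow();          // interrupts the task and returns at once
      boolean terminatedRightAway = executor.isTerminated();
      release.countDown();             // now let the task finish
      boolean terminatedAfterAwait = executor.awaitTermination(TIMEOUT_SECS, TimeUnit.SECONDS);
      return new boolean[] {terminatedRightAway, terminatedAfterAwait};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch the first value is false and the second is true: the task is still alive when `shutdownNow()` returns, so any resource it uses should be closed only after `awaitTermination` succeeds.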

vinothchandar · 2022-03-11T11:56:22Z

hudi-common/src/main/java/org/apache/hudi/common/util/queue/BoundedInMemoryExecutor.java

@@ -47,7 +48,7 @@
 public class BoundedInMemoryExecutor<I, O, E> {

  private static final Logger LOG = LogManager.getLogger(BoundedInMemoryExecutor.class);
-
+  private static final long TERMINATE_WAITING_TIME = 60L;


rename: TERMINATE_TIMEOUT_SECS ?

vinothchandar · 2022-03-11T11:57:02Z

...nt/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java

@@ -77,7 +77,7 @@ public void runMerge(HoodieTable<T, JavaRDD<HoodieRecord<T>>, JavaRDD<HoodieKey>
      readSchema = mergeHandle.getWriterSchemaWithMetaFields();
    }

-    BoundedInMemoryExecutor<GenericRecord, GenericRecord, Void> wrapper = null;
+    BoundedInMemoryExecutor<GenericRecord, GenericRecord, Void> executor = null;


In future, could we do these renames in a separate PR?

@vinothchandar i actually asked to do it, while @guanziyue was fixing things here.

We talked about it, and I surely see your point that this expands the surface for the reviewer, but I'm afraid that small clean-ups like that, if not done right away, are just sentenced to never be done (since someone will need to consciously set aside time to come back and clean this up).

guanziyue · 2022-03-11T16:45:15Z

I will rebase from master in the next commit. The master branch changed soon after my last rebase.
All in all, this PR initially wanted to solve the concurrent use of mergeHandle, which is equivalent to using the parquet writer concurrently. I applied two changes before:

1. Add a graceful exit for BoundedInMemoryExecutor and change the order of method calls in SparkMergeHelper.
   Cons: a. It adds a check of thread status on the hot path. b. Any other future use of the parquet writer risks suffering from the same problem.
2. Add a lock in ParquetWriter.
   Pros: It reduces the risk of misuse in the future.
   Cons: It adds a lock on the hot path, which may influence perf. (I did a simple profiling; it is actually hard to observe a negative impact.)

Either one can totally solve this problem, but both have drawbacks.
I will focus on this problem and move the rename of executor to another PR if needed.

alexeykudinkin · 2022-03-11T21:53:15Z

@guanziyue it's not only performance, it's also creating additional surface that someone could dead-lock on.

My point is simple -- we should fix the root-cause, but we should not re-purpose the components that were not intended to be thread-safe and over-engineer them into ones.

guanziyue · 2022-03-13T16:00:19Z

Removed the lock from the parquet writer and avoided concurrent usage of the merge handle by calling the relevant methods in the correct order. Added a test for it.

alexeykudinkin · 2022-03-14T19:40:21Z

...spark-client/src/test/java/org/apache/hudi/execution/TestBoundedInMemoryExecutorInSpark.java

+      when(hoodieWriteConfig.getWriteBufferLimitBytes()).thenReturn(1024 * 1024);
+
+      Iterator<GenericRecord> unboundedRecordIter = new Iterator<GenericRecord>() {
+        private final Random random = new Random();


Let's make test reproducible:

Please add static seed for Random

Don't use UUID.randomUUID (you can take a look at how UUIDs are pseudo-randomly generated in HoodieTestDataGenerator)

Let's actually abstract all test data generation w/in HoodieTestDataGenerator
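
One way to get reproducible keys is to derive both the random values and the UUIDs from a single seeded `Random`. The sketch below is only an illustration of that idea; `SeededDataSketch` and its seed are hypothetical, and this is not necessarily how HoodieTestDataGenerator produces its UUIDs.

```java
import java.util.Random;
import java.util.UUID;

class SeededDataSketch {

  private static final long SEED = 42L; // hypothetical fixed seed

  // Generates n UUID-shaped keys that are identical on every run.
  static String[] generateKeys(int n) {
    Random random = new Random(SEED);
    String[] keys = new String[n];
    for (int i = 0; i < n; i++) {
      // Build the UUID from seeded longs instead of calling UUID.randomUUID().
      keys[i] = new UUID(random.nextLong(), random.nextLong()).toString();
    }
    return keys;
  }
}
```

Two calls to `generateKeys` with the same `n` return equal arrays, so a failing test can be replayed with exactly the same input.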

alexeykudinkin · 2022-03-14T19:53:40Z

hudi-common/src/main/java/org/apache/hudi/common/util/queue/BoundedInMemoryExecutor.java

+        Thread.currentThread().interrupt();
+        return false;
+      }
+      // if current thread has been interrupted before awaitTermination was called.


If I understood your intention correctly, you want to give the executors a chance to shut down properly before proceeding, right?

You don't need to duplicate code for this: you can just call interrupted instead of isInterrupted, which clears the interrupted state flag and makes sure that the following awaitTermination invocations won't fail (lines 181, 182).

Thanks for your suggestion. I refined the code and made the comment clearer.
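
The difference between the two calls is that `Thread.interrupted()` reads and clears the flag of the current thread, while `isInterrupted()` only reads it. A rough sketch of the clear-then-restore pattern under discussion follows; it uses a plain `ExecutorService`, and the class and method names are assumptions rather than the final code of BoundedInMemoryExecutor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class InterruptedFlagSketch {

  static boolean awaitTermination(ExecutorService executor, long timeoutSecs) {
    // Read AND clear the flag, so a pending interrupt does not make the wait fail at once.
    boolean interruptedBefore = Thread.interrupted();
    boolean interruptedDuring = false;
    try {
      return executor.awaitTermination(timeoutSecs, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      interruptedDuring = true;
      return false;
    } finally {
      // Restore the flag so callers further up can still see the interrupt.
      if (interruptedBefore || interruptedDuring) {
        Thread.currentThread().interrupt();
      }
    }
  }

  // Returns {executor terminated, interrupt flag restored}.
  static boolean[] demo() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.execute(() -> { });
    executor.shutdown();
    Thread.currentThread().interrupt();           // caller is already interrupted
    boolean terminated = awaitTermination(executor, 60L);
    boolean flagRestored = Thread.interrupted();  // also clears the flag after the demo
    return new boolean[] {terminated, flagRestored};
  }
}
```

Without clearing the flag first, `awaitTermination` would throw `InterruptedException` immediately for an already interrupted caller and would not wait for the executor at all.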

alexeykudinkin · 2022-03-14T19:58:01Z

...spark-client/src/test/java/org/apache/hudi/execution/TestBoundedInMemoryExecutorInSpark.java

+  }
+
+  @Test
+  public void testExecutorTermination() throws ExecutionException, InterruptedException {


Appreciate your effort in putting up this test, but I usually suggest avoiding checking in non-deterministic tests. Non-deterministic tests pave the way for flakiness, given that we don't control the test execution environment (in CI and elsewhere).

Let's instead either try to rewrite it as a deterministic test (both positive and negative cases) by controlling the execution with CountDownLatch or CyclicBarrier, OR keep just the positive case (which has to pass and should not fail).

I did think about this. This test cannot achieve the goal of finding out whether a code change reproduces this problem. I made this UT simpler so that it just validates that termination and awaiting work well.

nsivabalan · 2022-03-29T01:15:32Z

rebased w/ latest master and pushed an update.

…call of HoodieWriteHandle

guanziyue · 2022-03-29T22:45:19Z

@hudi-bot run azure

hudi-bot · 2022-03-30T00:03:54Z

CI report:

b1b66d9 Azure: FAILURE Azure: SUCCESS

alexeykudinkin · 2022-05-05T20:34:40Z

LGTM, @nsivabalan @yihua can you please help land that one?

…exit gracefully (apache#4264)

boneanxs · 2023-03-29T07:39:34Z

hudi-common/src/main/java/org/apache/hudi/common/util/queue/BoundedInMemoryExecutor.java

+  public boolean awaitTermination() {
+    // if current thread has been interrupted before awaitTermination was called, we still give
+    // executor a chance to proceeding. So clear the interrupt flag and reset it if needed before return.
+    boolean interruptedBefore = Thread.interrupted();


Hi @guanziyue, since we always call shutdownNow before awaitTermination, why do we need to clear the interrupt status here and wait for the producer and consumer to terminate? Shouldn't they already have been shut down by shutdownNow?

Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com> * [HUDI-4104] DeltaWriteProfile includes the pending compaction file slice when deciding small buckets (apache#5594) * [HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion (apache#5590) * [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (apache#5564) * [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand * Set hoodie.query.as.ro.table in serde properties * [HUDI-4110] Clean the marker files for flink compaction (apache#5604) * [MINOR] Fixing spark long running yaml for non-partitioned (apache#5607) * [minor] Some code refactoring for LogFileComparator and Instant instantiation (apache#5600) * [HUDI-4109] Copy the old record directly when it is chosen for merging (apache#5603) * Clean the marker files for flink compaction (apache#5611) Co-authored-by: 854194341@qq.com <loukey_7821> * [HUDI-3942] [RFC-50] Improve Timeline Server (apache#5392) * [HUDI-4111] Bump ANTLR runtime version in Spark 3.x (apache#5606) * Revert "[HUDI-3870] Add timeout rollback for flink online compaction (apache#5314)" (apache#5622) This reverts commit 6f9b02d. * [HUDI-4116] Unify clustering/compaction related procedures' output type (apache#5620) * Unify clustering/compaction related procedures' output type * Address review comments * [HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable (apache#5617) No need to #sync actively because the table instance is instantiated freshly, its view manager has empty fiew instantces, the fs view would be synced lazily when is it requested. 
* [HUDI-4119] the first read result is incorrect when Flink upsert- Kafka connector is used in HUDi (apache#5626) * HUDI-4119 the first read result is incorrect when Flink upsert- Kafka connector is used in HUDi Co-authored-by: aliceyyan <aliceyyan@tencent.com> * [HUDI-4130] Remove the upgrade/downgrade for flink #initTable (apache#5642) * [HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (apache#5532) * [MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ test (apache#5646) * [HUDI-4122] Fix NPE caused by adding kafka nodes (apache#5632) * [MINOR] remove unused gson test dependency (apache#5652) * [HUDI-3858] Shade javax.servlet for Spark bundle jar (apache#5295) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4100] CTAS failed to clean up when given an illegal MANAGED table definition (apache#5588) * [HUDI-3890] fix rat plugin issue with sql files (apache#5644) * [HUDI-4051] Allow nested field as primary key and preCombineField in spark sql (apache#5517) * [HUDI-4051] Allow nested field as preCombineField in spark sql * relax validation for primary key * [HUDI-4129] Initializes a new fs view for WriteProfile#reload (apache#5640) Co-authored-by: zhangyuang <zhangyuang@corp.netease.com> * [HUDI-4142] Claim RFC-54 for new table APIs (apache#5665) * [HUDI-3933] Add UT cases to cover different key gen (apache#5638) * [MINOR] Removing redundant semicolons and line breaks (apache#5662) * [HUDI-4134] Fix Method naming consistency issues in FSUtils (apache#5655) * [HUDI-4084] Add support to test async table services with integ test suite framework (apache#5557) * Add support to test async table services with integ test suite framework * Make await time for validation configurable * [HUDI-4138] Fix the concurrency modification of hoodie table config for flink (apache#5660) * Remove the metadata cleaning strategy for flink, that means the multi-modal index may be affected * Improve the 
HoodieTable#clearMetadataTablePartitionsConfig to only update table config when necessary * Remove the modification of read code path in HoodieTableConfig * [HUDI-2473] Fixing compaction write operation in commit metadata (apache#5203) * [HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (apache#5669) * [HUDI-4135] remove netty and netty-all (apache#5663) * [HUDI-2207] Support independent flink hudi clustering function * [HUDI-4132] Fixing determining target table schema for delta sync with empty batch (apache#5648) * [MINOR] Fix a potential NPE and some finer points of hudi cli (apache#5656) * [HUDI-4146] Claim RFC-55 for Improve Hive/Meta sync class design and hierachies (apache#5682) * [HUDI-3193] Decouple hudi-aws from hudi-client-common (apache#5666) Move HoodieMetricsCloudWatchConfig to hudi-client-common * [HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (part2) (apache#5676) * [HUDI-4040] Bulk insert Support CustomColumnsSortPartitioner with Row (apache#5502) * Along the lines of RDDCustomColumnsSortPartitioner but for Row * [HUDI-4023] Decouple hudi-spark from hudi-utilities-slim-bundle (apache#5641) * [HUDI-4124] Add valid check in Spark Datasource configs (apache#5637) Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com> * [HUDI-3963][RFC-53] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency (apache#5567) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4162] Fixed some constant mapping issues. 
(apache#5700) Co-authored-by: y00617041 <yangxuan42@huawei.com> * [HUDI-4161] Make sure partition values are taken from partition path (apache#5699) * [MINOR] Fix the issue when handling conf hoodie.datasource.write.operation=bulk_insert in sql mode (apache#5679) Co-authored-by: Rex An <bonean131@gmail.com> * [HUDI-4151] flink split_reader supports rocksdb (apache#5675) * [HUDI-4151] flink split_reader supports rocksdb * [HUDI-4160] Make database regex of MaxwellJsonKafkaSourcePostProcessor optional (apache#5697) * [MINOR] Fix Hive and meta sync config for sql statement (apache#5316) * [HUDI-4166] Added SimpleClient plugin for integ test (apache#5710) * [HUDI-3551] Add the Oracle Cloud Infrastructure (oci) Object Storage URI scheme (apache#4952) * [HUDI-3551] Fix testStorageSchemes for oci storage (apache#5711) * [HUDI-4086] Use CustomizedThreadFactory in async compaction and clustering (apache#5563) Co-authored-by: 苏承祥 <sucx@tuya.com> * [HUDI-4163] Catch general exception instead of IOException while fetching rollback plan during rollback (apache#5703) If the avro file is corrupted, an InvalidAvroMagicException throws. * [HUDI-4149] Drop-Table fails when underlying table directory is broken (apache#5672) * [HUDI-4107] Added --sync-tool-classes config option in HoodieMultiTableDeltaStreamer (apache#5597) * added --sync-tool-classes config option in multitable delta streamer * added a testcase to assert if syncClientToolClassNames is getting picked to the deltastreamer execution context * [HUDI-4174] Add hive conf dir option for flink sink (apache#5725) * [HUDI-4011] Add hudi-aws-bundle (apache#5674) Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> * [HUDI-3670] free temp views in sql transformers (apache#5080) * [HUDI-4167] Remove the timeline refresh with initializing hoodie table (apache#5716) The timeline refresh on table initialization invokes the fs view #sync, which has two actions now: 1. 
reload the timeline of the fs view, so that the next fs view request is based on this timeline metadata 2. if this is a local fs view, clear all the local states; if this is a remote fs view, send a request to sync the remote fs view. However, looking at the construction: the meta client is instantiated freshly, so the timeline is already the latest; the table is also constructed freshly, so the fs view has no local states. That means the #sync is entirely unnecessary. In this patch, the metadata lifecycle and the data set fs view are kept in sync: when the fs view is refreshed, the underlying metadata is also refreshed synchronously. The freshness of the metadata follows the same rules as the data fs view: 1. if the fs view is local, the visibility is based on the client table metadata client's latest commit 2. if the fs view is remote, the timeline server would #sync the fs view and metadata together based on the lagging server local timeline. From the perspective of the client, there is no need to care about the refresh action anymore, whether or not the metadata table is enabled. That makes the client logic clearer and less error-prone. Removing the timeline refresh has another benefit: it avoids unnecessary #refresh of the remote fs view; if all the clients send requests to #sync the remote fs view, the server would encounter conflicts and the clients would encounter a response error.
* [HUDI-4179] Cluster with sort cloumns invalid (apache#5739) * [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (apache#5743) * [HUDI-4187] Fix partition order in aws glue sync (apache#5731) * [HUDI-4168] Add Call Procedure for marker deletion (apache#5738) * Add Call Procedure for marker deletion * [HUDI-4190] Include hbase-protocol for shading in the bundles (apache#5750) * [HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerException (apache#5755) SeekTo top cells avoid NullPointerException * [HUDI-4188] Fix flaky ITTestDataSTreamWrite.testWriteCopyOnWrite (apache#5749) * [HUDI-4195] Bulk insert should use right keygen for non-partitioned table (apache#5759) * [HUDI-4101] When BucketIndexPartitioner take partition path for dispersion may cause the fileID of the task to not be loaded correctly (apache#5763) Co-authored-by: john.wick <john.wick@vipshop.com> * [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing (apache#5733) As has been outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1Tb) having pretty fat commits where Hudi's commit metadata could reach into 100s of Mbs. Given the size some of ours commit metadata instances Spark's parsing and resolving phase (when spark.sql(...) is involved, but before returned Dataset is dereferenced) starts to dominate some of our queries' execution time. 
- Rebased onto new APIs to avoid excessive Hadoop's Path allocations - Eliminated hasOperationField completely to avoid repeatitive computations - Cleaning up duplication in HoodieActiveTimeline - Added caching for common instances of HoodieCommitMetadata - Made tableStructSchema lazy; * [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys (apache#5664) Bulk insert row writer code path had a gap wrt hive style partitioning and default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue. * [HUDI-4197] Fix Async indexer to support building FILES partition (apache#5766) - When async indexer is invoked only with "FILES" partition, it fails. Fixing it to work with Async indexer. Also, if metadata table itself is not initialized, and if someone is looking to build indexes via AsyncIndexer, first they are expected to index "FILES" partition followed by other partitions. In general, we have a limitation of building only one index at a time w/ AsyncIndexer and hence. Have added guards to ensure these conditions are met. * [HUDI-4171] Fixing Non partitioned with virtual keys in read path (apache#5747) - When Non partitioned key gen is used with virtual keys, read path could break since partition path may not exist. * [MINOR] Mark AWSGlueCatalogSyncClient experimental (apache#5775) * [MINOR][RFC-53] Fix typos (apache#5764) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4200] Fixing sorting of keys fetched from metadata table (apache#5773) - Key fetched from metadata table especially from base file reader is not sorted. and hence may result in throwing NPE (key prefix search) or unnecessary seeks to starting of Hfile (full key look ups). Fixing the same in this patch. This is not an issue with log blocks, since sorting is taking care within HoodieHfileDataBlock. 
- Commit where the sorting was mistakenly reverted [HUDI-3760] Adding capability to fetch Metadata Records by prefix apache#5208 * [HUDI-4198] Fix hive config for AWSGlueClientFactory (apache#5768) * HiveConf needs to load fs conf to allow instantiation via AWSGlueClientFactory * Resolve metastore uri config before loading fs conf * Skip hiveql due to CI issue Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> * [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (apache#5737) There are multiple issues with our current DataSource V2 integrations: b/c we advertise Hudi tables as V2, Spark expects it to implement certain APIs which are not implemented at the moment, instead we're using custom Resolution rule (in HoodieSpark3Analysis) to instead manually fallback to V1 APIs. This commit fixes the issue by reverting DSv2 APIs and making Spark use V1, except for schema evaluation logic. * [MINOR][DOCS] Update the README.md file in hudi-examples (apache#5803) * [MINOR] FlinkStateBackendConverter add more exception message (apache#5809) * [MINOR] FlinkStateBackendConverter add more exception message * [HUDI-4213] Infer keygen clazz for Spark SQL (apache#5815) * [HUDI-4139]improvement for flink write operator name to identify tables easily (apache#5744) Co-authored-by: yanenze <yanenze@keytop.com.cn> * [HUDI-3889] Do not validate table config if save mode is set to Overwrite (apache#5619) Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-4221] Fixing getAllPartitionPaths perf hit w/ FileSystemBackedMetadata (apache#5829) * [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table (apache#5840) When explicitly specifying the metadata table path for reading in spark, the "hoodie.metadata.enable" is overwritten to true for proper read behavior. 
* [HUDI-4205] Fix NullPointerException in HFile reader creation (apache#5841) Replace SerializableConfiguration with SerializableWritable for broadcasting the hadoop configuration before initializing HFile readers * [HUDI-4224] Fix CI issues (apache#5842) - Upgrade junit to 5.7.2 - Downgrade surefire and failsafe to 2.22.2 - Fix test failures that were previously not reported - Improve azure pipeline configs Co-authored-by: liujinhui1994 <965147871@qq.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com> * [MINOR] fix AvroSchemaConverter duplicate branch in 'switch' (apache#5813) * Strip extra spaces when creating new configuration (apache#5849) Co-authored-by: superche <superche@tencent.com> * [HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (apache#5790) TestReaderFilterRowKeys needs to get the key from RECORD_KEY_METADATA_FIELD, but the writer in current UT does not populate the meta field and the schema does not contains meta fields. This fix writes data with schema which contains meta fields and calls writeAvroWithMetadata for writing. 
Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (apache#5727) * [HUDI-4006] failOnDataLoss on delta-streamer kafka sources (apache#5718) add new config key hoodie.deltastreamer.source.kafka.enable.failOnDataLoss when failOnDataLoss=false (current behaviour, the default), log a warning instead of seeking to earliest silently when failOnDataLoss is set, fail explicitly * [HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (apache#5788) Adding more logs to assist in debugging with HoodieFlinkWriteClient.getOrCreateWriteHandle throwing exception * [MINOR] Fix typo of DisruptorExecutor in RFC 53 (apache#5860) * [minor] Following HUDI-4207, remote the new wrapper #init method (apache#5865) * [HUDI-4255] Make the flink merge and replace handle intermediate file visible (apache#5866) * [HUDI-3499] Add Call Procedure for show rollbacks (apache#5848) * Add Call Procedure for show rollbacks * fix * add ut for show_rollback_detail and exception handle Co-authored-by: superche <superche@tencent.com> * [HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occurs in the tableExists method (apache#5827) * [HUDI-4217] improve repeat init object in ExpressionPayload (apache#5825) * [HUDI-4214] improve repeat init write schema in ExpressionPayload (apache#5820) * [HUDI-4214] improve repeat init write schema in ExpressionPayload * [HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStreamer (apache#5883) * [HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (apache#5761) * Support Create/Drop/Show/Refresh Index Syntax for Spark SQL * [HUDI-3507] Support export command based on Call Produce Command (apache#5901) * [HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse some code (apache#5894) * [MINOR] Add "spillable_map_path" in FlinkCompactionConfig. 
To avoid the disk space of "/tmp" full when compacting offline. (apache#5905) * [HUDI-4277] supoort flink table source with computed column (apache#5897) Co-authored-by: chenshizhi <chenshizhi@bilibili.com> * fix remove redundant Variable (apache#5806) * [HUDI-4259] Flink create avro schema not conformance to standards (apache#5878) * flink create avro schema not conformance to standards Co-authored-by: 854194341@qq.com <loukey_7821> * [HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job (apache#5876) * [HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job * [MINOR] Update DOAP with 0.11.1 Release (apache#5908) * [HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths (apache#5723) * [HUDI-4251] Fix the problem that the command 'commits sync' description does not match. (apache#5881) * [HUDI-4177] Fix hudi-cli rollback with rollbackUsingMarkers method call (apache#5734) * Fix hudi-cli rollback with rollbackUsingMarkers method call * Add test for hudi-cli rollbackUsingMarkers Co-authored-by: Shawn Chang <yxchang@amazon.com> * [HUDI-4270] Bootstrap op data loading missing (apache#5888) * [HUDI-3475] Initialize hudi table management module. * udate * Revert master (apache#5925) * Revert "udate" This reverts commit 092e35c. * Revert "[HUDI-3475] Initialize hudi table management module." This reverts commit 4640a3b. * [HUDI-4279] Strength the remote fs view lagging check when latest commit refresh is enabled (apache#5917) Signed-off-by: LinMingQiang <1356469429@qq.com> * [minor] following 4270, add unit tests for the keys lost case (apache#5918) * [HUDI-3508] Add call procedure for FileSystemViewCommand (apache#5929) * [HUDI-3508] Add call procedure for FileSystemView * minor Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com> * [HUDI-4299] Fix problem about hudi-example-java run failed on idea. 
(apache#5936) * [HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups (apache#5941) * [HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups * Separate out incremental sync fsview test with clustering * [HUDI-3509] Add call procedure for HoodieLogFileCommand (apache#5949) Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com> * [HUDI-4273] Support inline schedule clustering for Flink stream (apache#5890) * [HUDI-4273] Support inline schedule clustering for Flink stream * delete deprecated clustering plan strategy and add clustering ITTest * [HUDI-3735] TestHoodieSparkMergeOnReadTableRollback is flaky (apache#5874) * [HUDI-4260] Change KEYGEN_CLASS_NAME without default value (apache#5877) * Change KEYGEN_CLASS_NAME without default value Co-authored-by: 854194341@qq.com <loukey_7821> * [HUDI-3512] Add call procedure for StatsCommand (apache#5955) Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com> * [TEST][DO_NOT_MERGE]fix random failed for ci (apache#5948) * Revert "[TEST][DO_NOT_MERGE]fix random failed for ci (apache#5948)" (apache#5971) This reverts commit e8fbd4d. 
* [HUDI-4319] Fixed Parquet's `PLAIN_DICTIONARY` encoding not being applied when bulk-inserting (apache#5966) * Fixed Dictionary encoding config not being properly propagated to Parquet writer (making it unable to apply it, substantially bloating the storage footprint) * [HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType is flaky (apache#5973) * [HUDI-3502] Support hdfs parquet import command based on Call Produce Command (apache#5956) * [MINOR] Remove -T option from CI build (apache#5972) * [HUDI-5246] Bumping mysql connector version due to security vulnerability (apache#5851) * [HUDI-4309] Spark3.2 custom parser should not throw exception (apache#5947) * [HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner (apache#5959) * [HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier (apache#5957) * [HUDI-3504] Support bootstrap command based on Call Produce Command (apache#5977) * [HUDI-4311] Fix Flink lose data on some rollback scene (apache#5950) * [HUDI-4291] Fix flaky TestCleanPlanExecutor#testKeepLatestFileVersions (apache#5930) * [HUDI-3506] Add call procedure for CommitsCommand (apache#5974) * [HUDI-3506] Add call procedure for CommitsCommand Co-authored-by: superche <superche@tencent.com> * [HUDI-4325] fix spark sql procedure cause ParseException with semicolon (apache#5982) * [HUDI-4325] fix saprk sql procedure cause ParseException with semicolon * [HUDI-4333] fix HoodieFileIndex's listFiles method log print skipping percent NaN (apache#5990) * [HUDI-4332] The current instant may be wrong under some extreme conditions in AppendWriteFunction. 
(apache#5988) * [HUDI-4320] Make sure `HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED` could be specified by the writer (apache#5970) Fixed sequence determining whether Parquet's legacy-format writing property should be overridden to only kick in when it has not been explicitly specified by the caller * [HUDI-1176] Upgrade hudi to log4j2 (apache#5366) * Move to log4j2 cr: https://code.amazon.com/reviews/CR-71010705 * Upgrade unit tests to log4j2 * update exclusion Co-authored-by: Brandon Scheller <bschelle@amazon.com> * [HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepointsProcedure (apache#5994) * [HUDI-1575] Claim RFC-56: Early Conflict Detection For Multi-writer (apache#6002) Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net> * [MINOR] Make CLI 'commit rollback' using rollbackUsingMarkers false as default (apache#5174) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4331] Allow loading external config file from class loader (apache#5987) Co-authored-by: Wenning Ding <wenningd@amazon.com> * [HUDI-4336] Fix records overwritten bug with binary primary key (apache#5996) * [MINOR] Following apache#2070, Fix BindException when running tests on shared machines. 
(apache#5951) * [HUDI-4346] Fix params not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED (apache#5999) * [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeseria… (apache#5907) * [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer * add ut Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com> * [HUDI-3984] Remove mandatory check of partiton path for cli command (apache#5458) * [HUDI-3634] Could read empty or partial HoodieCommitMetaData in downstream if using HDFS (apache#5048) Add the differentiated logic of creating immutable file in HDFS by first creating the file.tmp and then renaming the file * [HUDI-3953]Flink Hudi module should support low-level source and sink api (apache#5445) Co-authored-by: jerryyue <jerryyue@didiglobal.com> * [HUDI-4353] Column stats data skipping for flink (apache#6026) * [HUDI-3505] Add call procedure for UpgradeOrDowngradeCommand (apache#6012) Co-authored-by: superche <superche@tencent.com> * [HUDI-3730] Improve meta sync class design and hierarchies (apache#5854) * [HUDI-3730] Improve meta sync class design and hierarchies (apache#5754) * Implements class design proposed in RFC-55 Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com> * [HUDI-3511] Add call procedure for MetadataCommand (apache#6018) * [HUDI-3730] Add ConfigTool#toMap UT (apache#6035) Co-authored-by: voonhou.su <voonhou.su@shopee.com> * [MINOR] Improve variable names (apache#6039) * [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job. 
(apache#4459) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4360] Fix HoodieDropPartitionsTool based on refactored meta sync (apache#6043) * [HUDI-3836] Improve the way of fetching metadata partitions from table (apache#5286) Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-4359] Support show_fs_path_detail command on Call Produce Command (apache#6042) * [HUDI-4356] Fix the error when sync hive in CTAS (apache#6029) * [HUDI-4219] Merge Into when update expression "col=s.col+2" on precombine cause exception (apache#5828) * [HUDI-4357] Support flink 1.15.x (apache#6050) * [HUDI-4152] Flink offline compaction support compacting multi compaction plan at once (apache#5677) * [HUDI-4152] Flink offline compaction allow compact multi compaction plan at once * [HUDI-4152] Fix exception for duplicated uid when multi compaction plan are compacted * [HUDI-4152] Provider UT & IT for compact multi compaction plan * [HUDI-4152] Put multi compaction plans into one compaction plan source * [HUDI-4152] InstantCompactionPlanSelectStrategy allow multi instant by using comma * [HUDI-4152] Add IT for InstantCompactionPlanSelectStrategy * [HUDI-4309] fix spark32 repartition error (apache#6033) * [HUDI-4366] Synchronous cleaning for flink bounded source (apache#6051) * [minor] following 4152, refactor the clazz about plan selection strategy (apache#6060) * [HUDI-4367] Support copyToTable on call (apache#6054) * [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution. (apache#5995) * fix for updateTableParameters which is not excluding partition columns and updateTableProperties boolean check * Fix - serde parameters getting overrided on table property update * removing stale syncConfig * [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields (apache#6017) * [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields. 
* fix comments Co-authored-by: public (bdcee5037027) <mengtao0326@qq.com> * [HUDI-3500] Add call procedure for RepairsCommand (apache#6053) * [HUDI-2150] Rename/Restructure configs for better modularity (apache#6061) - Move clean related configuration to HoodieCleanConfig - Move Archival related configuration to HoodieArchivalConfig - hoodie.compaction.payload.class move this to HoodiePayloadConfig * [MINOR] Bump xalan from 2.7.1 to 2.7.2 (apache#6062) Bumps xalan from 2.7.1 to 2.7.2. --- updated-dependencies: - dependency-name: xalan:xalan dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [HUDI-4324] Remove use_jdbc config from hudi sync (apache#6072) * [HUDI-4324] Remove use_jdbc config from hudi sync * Users should use HIVE_SYNC_MODE instead * [HUDI-3730][RFC-55] Improve hudi-sync classes design and simplify configs (apache#5695) * [HUDI-4146] RFC for Improve Hive/Meta sync class design and hierarchies Co-authored-by: jian.feng <jian.feng@shopee.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> * [HUDI-4323] Make database table names optional in sync tool (apache#6073) * [HUDI-4323] Make database table names optional in sync tool * Infer from these properties from the table config * [MINOR] Update RFCs status (apache#6078) * [HUDI-4298] When reading the mor table with QUERY_TYPE_SNAPSHOT,Unabl… (apache#5937) * [HUDI-4298] Add test case for reading mor table Signed-off-by: LinMingQiang <1356469429@qq.com> * [HUDI-4379] Bump Flink versions to 1.14.5 and 1.15.1 (apache#6080) * [HUDI-4391] Incremental read from archived commits for flink (apache#6096) * [RFC-51] [HUDI-3478] Hudi to support Change-Data-Capture (apache#5436) Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> * [HUDI-4393] Add marker file for target file when flink merge handle rolls over (apache#6103) * [HUDI-4399][RFC-57] Claim 
RFC 57 for DeltaStreamer proto support (apache#6112) * [HUDI-4397] Flink Inline Cluster and Compact plan distribute strategy changed from rebalance to hash to avoid potential multiple threads accessing the same file (apache#6106) Co-authored-by: jerryyue <jerryyue@didiglobal.com> * [MINOR] Disable TestHiveSyncGlobalCommitTool (apache#6119) * [HUDI-4403] Fix the end input metadata for bounded source (apache#6116) * [HUDI-4408] Reuse old rollover file as base file for flink merge handle (apache#6120) * [HUDI-3503] Add call procedure for CleanCommand (apache#6065) * [HUDI-3503] Add call procedure for CleanCommand Co-authored-by: simonssu <simonssu@tencent.com> * [HUDI-4249] Fixing in-memory `HoodieData` implementation to operate lazily (apache#5855) * [HUDI-4170] Make user can use hoodie.datasource.read.paths to read necessary files (apache#5722) * Rebase codes * Move listFileSlices to HoodieBaseRelation * Fix review * Fix style * Fix bug * Remove a few files that were removed in upstream master * Fix build issues Co-authored-by: KnightChess <981159963@qq.com> Co-authored-by: Danny Chan <yuzhao.cyz@gmail.com> Co-authored-by: huberylee <shibei.lh@foxmail.com> Co-authored-by: watermelon12138 <49849410+watermelon12138@users.noreply.github.com> Co-authored-by: y00617041 <yangxuan42@huawei.com> Co-authored-by: Ibson <pushengli@163.com> Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com> Co-authored-by: LiChuang <64473732+CodeCooker17@users.noreply.github.com> Co-authored-by: Gary Li <yanjia.gary.li@gmail.com> Co-authored-by: 吴祥平 <408317717@qq.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com> Co-authored-by: xicm <36392121+xicm@users.noreply.github.com> Co-authored-by: xicm <xicm@asiainfo.com> Co-authored-by: Wangyh <763941163@qq.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> Co-authored-by: Todd Gao <todd.gao.2013@gmail.com> Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: qianchutao 
<72595723+qianchutao@users.noreply.github.com> Co-authored-by: guanziyue <30882822+guanziyue@users.noreply.github.com> Co-authored-by: Jin Xing <jinxing.corey@gmail.com> Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com> Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com> Co-authored-by: BruceLin <brucekellan@gmail.com> Co-authored-by: ForwardXu <forwardxu315@gmail.com> Co-authored-by: aliceyyan <104287562+aliceyyan@users.noreply.github.com> Co-authored-by: aliceyyan <aliceyyan@tencent.com> Co-authored-by: Lanyuanxiaoyao <lanyuanxiaoyao@gmail.com> Co-authored-by: Alexey Kudinkin <alexey@infinilake.com> Co-authored-by: YueZhang <69956021+zhangyue19921010@users.noreply.github.com> Co-authored-by: yuezhang <yuezhang@freewheel.tv> Co-authored-by: Bo Cui <cuibo0108@163.com> Co-authored-by: Xingcan Cui <xcui@wealthsimple.com> Co-authored-by: wqwl611 <67826098+wqwl611@users.noreply.github.com> Co-authored-by: wqwl611 <wqwl611@gmail.com> Co-authored-by: 董可伦 <dongkelun01@inspur.com> Co-authored-by: 陈浩 <bettermouse94@gmail.com> Co-authored-by: Yuwei XIAO <ywxiaozero@gmail.com> Co-authored-by: Shawy Geng <gengxiaoyu1996@gmail.com> Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com> Co-authored-by: luokey <854194341@qq.com> Co-authored-by: Zhaojing Yu <yuzhaojing@bytedance.com> Co-authored-by: wangxianghu <wangxianghu@apache.org> Co-authored-by: uday08bce <uday08bce@gmail.com> Co-authored-by: YuangZhang <z_yuang@foxmail.com> Co-authored-by: zhangyuang <zhangyuang@corp.netease.com> Co-authored-by: felixYyu <felix2003@live.cn> Co-authored-by: Heap <35054152+h1ap@users.noreply.github.com> Co-authored-by: liujinhui <965147871@qq.com> Co-authored-by: luoyajun <luoyajun1010@gmail.com> Co-authored-by: 冯健 <fengjian428@gmail.com> Co-authored-by: RexAn <anh131@126.com> Co-authored-by: komao <masterwangzx@gmail.com> Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com> Co-authored-by: Rex An <bonean131@gmail.com> Co-authored-by: Carter Shanklin 
<cartershanklin@users.noreply.github.com> Co-authored-by: 苏承祥 <scx_white@aliyun.com> Co-authored-by: 苏承祥 <sucx@tuya.com> Co-authored-by: Kumud Kumar Srivatsava Tirupati <kumudkumartirupati@users.noreply.github.com> Co-authored-by: Qi Ji <qjqqyy@users.noreply.github.com> Co-authored-by: leesf <490081539@qq.com> Co-authored-by: Nicolas Paris <nicolas.paris@riseup.net> Co-authored-by: Saisai Shao <sai.sai.shao@gmail.com> Co-authored-by: marchpure <marchpure@126.com> Co-authored-by: HunterXHunter <1356469429@qq.com> Co-authored-by: john.wick <john.wick@vipshop.com> Co-authored-by: liuzhuang2017 <95120044+liuzhuang2017@users.noreply.github.com> Co-authored-by: sandyfog <154525105@qq.com> Co-authored-by: yanenze <34880077+yanenze@users.noreply.github.com> Co-authored-by: yanenze <yanenze@keytop.com.cn> Co-authored-by: superche <73096722+hechao-ustc@users.noreply.github.com> Co-authored-by: superche <superche@tencent.com> Co-authored-by: 5herhom <35916131+5herhom@users.noreply.github.com> Co-authored-by: Shizhi Chen <107476116+chenshzh@users.noreply.github.com> Co-authored-by: chenshizhi <chenshizhi@bilibili.com> Co-authored-by: Alexander Trushev <42293632+trushev@users.noreply.github.com> Co-authored-by: Forus <70357858+Forus0322@users.noreply.github.com> Co-authored-by: Shawn Chang <42792772+CTTY@users.noreply.github.com> Co-authored-by: Shawn Chang <yxchang@amazon.com> Co-authored-by: jiz <31836510+microbearz@users.noreply.github.com> Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com> Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com> Co-authored-by: xiarixiaoyao <mengtao0326@qq.com> Co-authored-by: bschell <bdscheller@gmail.com> Co-authored-by: Brandon Scheller <bschelle@amazon.com> Co-authored-by: Teng <teng_huo@outlook.com> Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net> Co-authored-by: wenningd <wenningding95@gmail.com> Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: miomiocat <284487410@qq.com> Co-authored-by: 
JerryYue-M <272614347@qq.com> Co-authored-by: jerryyue <jerryyue@didiglobal.com> Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com> Co-authored-by: voonhous <voonhousu@gmail.com> Co-authored-by: voonhou.su <voonhou.su@shopee.com> Co-authored-by: shenjiayu17 <54424149+shenjiayu17@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Luning (Lucas) Wang <rsl4@foxmail.com> Co-authored-by: Yann Byron <biyan900116@gmail.com> Co-authored-by: Tim Brown <tim.brown126@gmail.com> Co-authored-by: simonsssu <barley0806@gmail.com> Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

guanziyue force-pushed the HUDI-2875 branch from c0d4c1c to b561916 Compare December 9, 2021 05:36

yihua changed the title ~~[HUDI-2785] Make HoodieParquetWriter Thread safe and memory executor …~~ [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor … Dec 10, 2021

guanziyue force-pushed the HUDI-2875 branch 2 times, most recently from 74beb1e to b5c5823 Compare December 22, 2021 10:04

vinothchandar self-assigned this Dec 25, 2021

nsivabalan added the priority:critical label (production down; pipelines stalled; need help ASAP) Feb 8, 2022

nsivabalan reviewed Feb 9, 2022

guanziyue force-pushed the HUDI-2875 branch from b5c5823 to 6f55461 Compare February 18, 2022 08:47

nsivabalan assigned yihua and nsivabalan Feb 25, 2022

yihua reviewed Mar 10, 2022

alexeykudinkin requested changes Mar 10, 2022

alexeykudinkin reviewed Mar 10, 2022

guanziyue force-pushed the HUDI-2875 branch from 6f55461 to 4a9c787 Compare March 11, 2022 06:07

guanziyue force-pushed the HUDI-2875 branch from 4a9c787 to 4a3662c Compare March 11, 2022 06:50

vinothchandar reviewed Mar 11, 2022

vinothchandar removed their assignment Mar 11, 2022

guanziyue force-pushed the HUDI-2875 branch from 4a3662c to a2c46ca Compare March 11, 2022 17:06

guanziyue force-pushed the HUDI-2875 branch 4 times, most recently from ffa6d1e to c74c41d Compare March 12, 2022 22:04

alexeykudinkin reviewed Mar 14, 2022

nsivabalan unassigned yihua Mar 16, 2022

guanziyue force-pushed the HUDI-2875 branch from c74c41d to ab3f9ad Compare March 20, 2022 08:51

nsivabalan assigned alexeykudinkin and unassigned nsivabalan Mar 20, 2022

alexeykudinkin approved these changes Mar 21, 2022

nsivabalan force-pushed the HUDI-2875 branch from 1877192 to 6c01b6b Compare March 29, 2022 01:15

guanziyue added 3 commits March 30, 2022 00:31

[HUDI-2875] Make memory executor exit gracefully. And fix concurrent call of HoodieWriteHandle

b02ff26

rebase master and change the UT

bd04a51

Add a method to generate generic record in test data gen

b1b66d9
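
The ordering that the first commit (b02ff26) and the PR description refer to, fully stopping the in-memory executor before closing the write handle, can be illustrated with a minimal sketch. It uses only `java.util.concurrent` and is not Hudi's implementation: `ToyWriter`, `GracefulShutdownSketch`, and `runOnce` are hypothetical names standing in for the non-thread-safe writer and the consumer loop.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class GracefulShutdownSketch {

  /** Hypothetical stand-in for a writer that must not be written to after close(). */
  static final class ToyWriter implements AutoCloseable {
    private volatile boolean closed = false;
    final AtomicInteger writes = new AtomicInteger();
    final AtomicInteger writesAfterClose = new AtomicInteger();

    void write(int record) {
      if (closed) {
        // In a real writer this would be the corrupting concurrent access.
        writesAfterClose.incrementAndGet();
      } else {
        writes.incrementAndGet();
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Runs one consumer, stops it completely, then closes the writer. */
  static int runOnce() {
    ToyWriter writer = new ToyWriter();
    ExecutorService consumer = Executors.newSingleThreadExecutor();

    // The consumer keeps writing until its thread is interrupted.
    consumer.submit(() -> {
      int i = 0;
      while (!Thread.currentThread().isInterrupted()) {
        writer.write(i++);
      }
    });

    try {
      Thread.sleep(50); // let the consumer make some progress

      // Step 1: interrupt the consumer and wait until it has really stopped.
      consumer.shutdownNow();
      if (!consumer.awaitTermination(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("consumer did not stop in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while stopping consumer", e);
    }

    // Step 2: only now is it safe to close the writer.
    writer.close();
    return writer.writesAfterClose.get();
  }

  public static void main(String[] args) {
    System.out.println("writes after close: " + runOnce());
  }
}
```

Closing the writer before `awaitTermination` returns would reintroduce the race described in the discussion, because the consumer could still be inside `write` while `close` runs.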

guanziyue force-pushed the HUDI-2875 branch from 6c01b6b to b1b66d9 Compare March 29, 2022 16:33

guanziyue requested a review from nsivabalan March 30, 2022 02:47

yihua merged commit abb4893 into apache:master May 5, 2022

cdmikechen pushed a commit to cdmikechen/hudi that referenced this pull request May 13, 2022

[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully (apache#4264)

4404021

yihua pushed a commit to yihua/hudi that referenced this pull request Jun 3, 2022

[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully (apache#4264)

cde187f

alexeykudinkin mentioned this pull request Nov 19, 2022

[HUDI-5238] Fixing HoodieMergeHandle shutdown sequence #7245

Merged

4 tasks

boneanxs reviewed Mar 29, 2023
