[SUPPORT] spark-sql schema_evolution #7040

MihawkZoro · 2022-10-23T13:05:37Z

Environment Description

Hudi version: 0.11.1
Spark version: 3.2.2
Hive version: 2.3.9
Hadoop version: 2.7.3
Storage: HDFS

Describe the problem you faced

I have a Hudi table:

create table ddl_test_t2 (
  col1 string,
  col2 string,
  col3 string,
  ts bigint
) using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'col1',
  preCombineField = 'ts'
);

I ran some DML and DDL statements to test schema evolution:

insert into ddl_test_t2 values('1','col2','col3',1),('2','col2','col3',2),('3','col2','col3',3);

ALTER TABLE ddl_test_t2 DROP COLUMN col3;
ALTER TABLE ddl_test_t2 RENAME COLUMN col2 to col3;

insert into ddl_test_t2 values('4','col2',4);

Then I queried column col3 from table ddl_test_t2:

select col3 from ddl_test_t2;

The result I expected was:

col2
col2
col2
col2

The actual result was:

col3
col3
col3
col2

I would like to know what the problem is and whether this is a bug.

xiarixiaoyao · 2022-10-24T02:41:41Z

@MihawkZoro
Thank you for testing this.
This really is a bug. It is triggered by the final write, "insert into ddl_test_t2 values('4','col2',4);". We will fix it as soon as possible.

xiarixiaoyao · 2022-10-24T03:00:49Z

rewriteRecordWithNewSchema fails to handle the rename; it should handle the rename first.
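To illustrate the kind of failure described here, the following is a toy Python model, not Hudi's actual code. It assumes each column carries a stable id that survives renames (the ids, dictionaries, and the helper names rewrite_by_name and rewrite_by_id are all hypothetical). Rewriting an old record purely by column name picks up the dropped col3, while resolving the rename first, through the id, picks up the value that was written as col2.

```python
# Toy model only: schemas map column name -> stable column id (an assumption).
old_schema = {"col1": 0, "col2": 1, "col3": 2, "ts": 3}
# Schema after DROP COLUMN col3 and RENAME COLUMN col2 TO col3:
# id 2 is gone, and id 1 is now called col3.
new_schema = {"col1": 0, "col3": 1, "ts": 3}

# A record written before the DDL statements ran.
old_record = {"col1": "1", "col2": "col2", "col3": "col3", "ts": 1}


def rewrite_by_name(record, new_schema):
    """Naive rewrite: look up each new column name directly in the old record."""
    return {name: record.get(name) for name in new_schema}


def rewrite_by_id(record, old_schema, new_schema):
    """Resolve renames first: map each new column to its old name via the id."""
    id_to_old_name = {col_id: name for name, col_id in old_schema.items()}
    rewritten = {}
    for name, col_id in new_schema.items():
        old_name = id_to_old_name.get(col_id)
        # Columns with no counterpart in the old schema are filled with None.
        rewritten[name] = record.get(old_name) if old_name is not None else None
    return rewritten


print(rewrite_by_name(old_record, new_schema)["col3"])            # col3 (stale value)
print(rewrite_by_id(old_record, old_schema, new_schema)["col3"])  # col2
```

In this model the name-based rewrite reproduces the unexpected output from the report (old rows show col3), and the id-based rewrite gives the expected col2.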

MihawkZoro · 2022-10-24T03:15:34Z

@xiarixiaoyao When will this bug be fixed? We are using this feature, so it is urgent.

xiarixiaoyao · 2022-10-24T03:19:56Z

      spark.sql("insert into ddl_test_t2 values('1','col2','col3',1),('2','col2','col3',2),('3','col2','col3',3)")
      spark.sql("""ALTER TABLE ddl_test_t2 DROP COLUMN col3""")
      spark.sql("ALTER TABLE ddl_test_t2 RENAME COLUMN col2 to col3")
      spark.sql("insert into ddl_test_t2 values('4','col2',4)")
      spark.sql("select col3 from ddl_test_t2").show(false)

+----+
|col3|
+----+
|col2|
|col2|
|col2|
|col2|
+----+

xiarixiaoyao · 2022-10-24T03:29:29Z

@MihawkZoro Schema evolution for Hive and Presto (MOR tables) can be found in #6989.

MihawkZoro · 2022-10-24T03:34:22Z

@xiarixiaoyao Thank you. When will the repaired official spark bundle jar be released?

xiarixiaoyao · 2022-10-25T09:24:04Z

It is expected in 0.13.0.

xiarixiaoyao self-assigned this Oct 24, 2022

xiarixiaoyao mentioned this issue Oct 24, 2022

[HUDI-5083]Fixed a bug when schema evolution #7045

Merged

xushiyan added this to Hudi Issue Support Oct 24, 2022

xushiyan moved this to Triaged in Hudi Issue Support Oct 27, 2022

xushiyan added the schema-and-data-types and priority:critical (production down; pipelines stalled; need help asap) labels Oct 27, 2022

xushiyan linked pull request #7045 ([HUDI-5083]Fixed a bug when schema evolution) Oct 27, 2022 that will close this issue

xiarixiaoyao closed this as completed in #7045 Oct 29, 2022

Repository owner moved this from 🏁 Triaged to ✅ Done in Hudi Issue Support Oct 29, 2022
