[SUPPORT] spark-sql schema_evolution #7040

Closed
MihawkZoro opened this issue Oct 23, 2022 · 7 comments · Fixed by #7045
Labels: priority:critical (production down; pipelines stalled; need help ASAP), schema-and-data-types

Comments

@MihawkZoro

Environment Description

  • Hudi version :
    0.11.1
  • Spark version :
    3.2.2
  • Hive version :
    2.3.9
  • Hadoop version :
    2.7.3
  • Storage
    hdfs

Describe the problem you faced

I have a Hudi table:

create table ddl_test_t2 (
  col1 string,
  col2 string,
  col3 string,
  ts bigint
) using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'col1',
  preCombineField = 'ts'
);

I ran the following DML and DDL statements to test schema evolution:

insert into ddl_test_t2 values('1','col2','col3',1),('2','col2','col3',2),('3','col2','col3',3);

ALTER TABLE ddl_test_t2 DROP COLUMN col3;
ALTER TABLE ddl_test_t2 RENAME COLUMN col2 to col3;

insert into ddl_test_t2 values('4','col2',4);

Then I queried column col3 from table ddl_test_t2:

select col3 from ddl_test_t2;

The result I expected is:

col2
col2
col2
col2

The actual result was:

col3
col3
col3
col2

I would like to know what causes this and whether it is a bug.

@xiarixiaoyao xiarixiaoyao self-assigned this Oct 24, 2022
@xiarixiaoyao
Contributor

@MihawkZoro
Thank you for testing this.
This is indeed a bug. It is triggered by the final write, ‘insert into ddl_test_t2 values('4','col2',4);’. We will fix it as soon as possible.

@xiarixiaoyao
Contributor

rewriteRecordWithNewSchema fails to handle the rename; it should handle the rename first.
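
To picture why the order matters, here is a minimal sketch in plain Scala. It does not use Hudi's code or its record types: a record is modeled as a simple map, and the names `rewriteByName`, `rewriteRenameFirst`, and `renamedFrom` are hypothetical. It assumes the wrong result comes from looking up the new column name directly in a record written with the old schema, which may not match the actual implementation in detail.

```scala
object RenameFirstSketch {
  // A record modeled as column name -> value (an illustration, not Hudi's record type).
  type Record = Map[String, String]

  // Name-only rewrite: each column of the new schema is looked up
  // by its current name in the record written with the old schema.
  def rewriteByName(old: Record, newColumns: Seq[String]): Record =
    newColumns.flatMap(c => old.get(c).map(v => c -> v)).toMap

  // Rename-first rewrite: translate the new column name back to the
  // name it had when the record was written, then look up the value.
  def rewriteRenameFirst(
      old: Record,
      newColumns: Seq[String],
      renamedFrom: Map[String, String]): Record =
    newColumns.flatMap { c =>
      val source = renamedFrom.getOrElse(c, c)
      old.get(source).map(v => c -> v)
    }.toMap

  def main(args: Array[String]): Unit = {
    // A row written before DROP COLUMN col3 and RENAME COLUMN col2 to col3.
    val oldRow: Record =
      Map("col1" -> "1", "col2" -> "col2", "col3" -> "col3", "ts" -> "1")
    val newColumns = Seq("col1", "col3", "ts")
    val renamedFrom = Map("col3" -> "col2") // new name -> old name

    println(rewriteByName(oldRow, newColumns)("col3"))                   // prints "col3"
    println(rewriteRenameFirst(oldRow, newColumns, renamedFrom)("col3")) // prints "col2"
  }
}
```

In this model the name-only lookup picks up the value of the dropped column, which matches the result reported above, while resolving the rename first returns the value of the original col2.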

@MihawkZoro
Author

@xiarixiaoyao When will this bug be fixed? We are using this feature, so it is urgent.

@xiarixiaoyao
Contributor

Already fixed locally; let me raise a PR.
spark.sql("set hoodie.schema.on.read.enable=true")
spark.sql("""create table ddl_test_t2 (
  | col1 string,
  | col2 string,
  | col3 string,
  | ts bigint
  |) using hudi
  |tblproperties (
  | type = 'mor',
  | primaryKey = 'col1',
  | preCombineField = 'ts'
  |)""".stripMargin)

spark.sql("insert into ddl_test_t2 values('1','col2','col3',1),('2','col2','col3',2),('3','col2','col3',3)")
spark.sql("ALTER TABLE ddl_test_t2 DROP COLUMN col3")
spark.sql("ALTER TABLE ddl_test_t2 RENAME COLUMN col2 to col3")
spark.sql("insert into ddl_test_t2 values('4','col2',4)")
spark.sql("select col3 from ddl_test_t2").show(false)

+----+
|col3|
+----+
|col2|
|col2|
|col2|
|col2|
+----+

@xiarixiaoyao
Contributor

@MihawkZoro Schema evolution for Hive and Presto (MOR tables) can be found in #6989.

@MihawkZoro
Author

@xiarixiaoyao Thank you. When will an official Spark bundle jar with the fix be released?

@xiarixiaoyao
Contributor

Expected in 0.13.0.

@xushiyan xushiyan moved this to Triaged in Hudi Issue Support Oct 27, 2022
@xushiyan xushiyan added schema-and-data-types priority:critical production down; pipelines stalled; Need help asap. labels Oct 27, 2022
@xushiyan xushiyan linked a pull request Oct 27, 2022 that will close this issue
Repository owner moved this from 🏁 Triaged to ✅ Done in Hudi Issue Support Oct 29, 2022