
[HUDI-6804] Fix hive read schema evolution MOR table #9573

Merged: 2 commits merged on Sep 5, 2023


Zouxxyy (Contributor) commented on Aug 30, 2023

Change Logs

Currently, reading a schema-evolved MOR table from Hive fails, for example:

-- spark-sql
set hoodie.schema.on.read.enable=true;
create table if not exists hudi_mor_test_tbl (
  id   bigint,
  name string,
  num  int,
  ts   bigint,
  ds   string
) using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
)
partitioned by (ds);

insert into hudi_mor_test_tbl partition(ds = '20211211') select 1, 'a1', 1000, 100;
update hudi_mor_test_tbl set name = 'a2' where id = 1;
alter table hudi_mor_test_tbl rename column name to name_new; 

-- hive
select id,name_new from hudi_mor_test_tbl_rt;
Failed with exception java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 25

Impact

Fixes the Hive read failure described above.

Risk level (write none, low, medium, or high below)

low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

danny0405 (Contributor) left a comment


+1

xiarixiaoyao (Contributor) commented:

@Zouxxyy Thanks for your fix. Could you please point out the specific cause of the error? I cannot reproduce this problem on Hive 3.1.1 (Hudi 0.11) with my cluster.

Zouxxyy (Contributor, Author) commented on Aug 30, 2023


The core change is internalSchemaOption = Option.of(prunedInternalSchema); if you remove that line and run the unit test added in this patch, you can reproduce the failure.
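The thread does not spell out the failure mechanism beyond that one-line change. As a rough illustration only, the Java sketch below models what a "pruned" read schema means for the Hive query in the reproduction: the latest table schema cut down to the projected columns, with each column carrying a stable field ID so that the rename of name to name_new keeps its identity. PruneSketch, Field, prune and the field IDs 0 to 4 are hypothetical simplifications, not Hudi's InternalSchema or its pruning utilities.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an ID-based schema; this is NOT Hudi's InternalSchema API.
class PruneSketch {
  // Each column has a stable field ID; a rename changes only the name.
  record Field(int id, String name) {}

  // Keep only the columns the query projects, in projection order.
  static List<Field> prune(List<Field> fullSchema, List<String> projectedNames) {
    Map<String, Field> byName = new HashMap<>();
    for (Field f : fullSchema) {
      byName.put(f.name(), f);
    }
    List<Field> pruned = new ArrayList<>();
    for (String name : projectedNames) {
      Field f = byName.get(name);
      if (f == null) {
        throw new IllegalArgumentException("Unknown column: " + name);
      }
      pruned.add(f);
    }
    return pruned;
  }

  public static void main(String[] args) {
    // Latest schema after "rename column name to name_new" (field IDs assumed 0..4).
    List<Field> full = List.of(
        new Field(0, "id"), new Field(1, "name_new"), new Field(2, "num"),
        new Field(3, "ts"), new Field(4, "ds"));
    // Projection of the Hive query: select id, name_new ...
    List<Field> pruned = prune(full, List.of("id", "name_new"));
    System.out.println(full.size() + " columns in full schema, " + pruned.size() + " after pruning");
    System.out.println(pruned);
  }
}
```

In this model, a reader handed the full schema would produce five columns for a two-column Hive query; the change to Option.of(prunedInternalSchema) suggests the log reader is meant to work from the two-column version instead.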

Hudi 0.11 may not include these patches: #6989 and #6358.

  /**
   * Get final Read Schema for support evolution.
   * step1: find the fileSchema for current dataBlock.
   * step2: determine whether fileSchema is compatible with the final read internalSchema.
   * step3: merge fileSchema and read internalSchema to produce final read schema.
   *
   * @param dataBlock current processed block
   * @return final read schema.
   */
  private Option<Pair<Function<HoodieRecord, HoodieRecord>, Schema>> composeEvolvedSchemaTransformer(
      HoodieDataBlock dataBlock) {
    if (internalSchema.isEmptySchema()) {
      return Option.empty();
    }

    long currentInstantTime = Long.parseLong(dataBlock.getLogBlockHeader().get(INSTANT_TIME));
    InternalSchema fileSchema = InternalSchemaCache.searchSchemaAndCache(currentInstantTime,
        hoodieTableMetaClient, false);
    InternalSchema mergedInternalSchema = new InternalSchemaMerger(fileSchema, internalSchema,
        true, false).mergeSchema();
    Schema mergedAvroSchema = AvroInternalSchemaConverter.convert(mergedInternalSchema, readerSchema.getFullName());

    return Option.of(Pair.of((record) -> {
      return record.rewriteRecordWithNewSchema(
          dataBlock.getSchema(),
          this.hoodieTableMetaClient.getTableConfig().getProps(),
          mergedAvroSchema,
          Collections.emptyMap());
    }, mergedAvroSchema));
  }
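To make the three steps in the Javadoc above concrete, here is a self-contained Java sketch under simplified assumptions: a schema is a map from field ID to column name, the schema history is a TreeMap keyed by instant time, and step 1 is assumed to pick the latest schema version at or before the log block's instant. EvolvedReadSketch, fileSchemaFor, rewrite and the instant times 100, 200 and 300 are hypothetical; this is not InternalSchemaCache, InternalSchemaMerger or rewriteRecordWithNewSchema, and the compatibility check of step 2 is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the steps in composeEvolvedSchemaTransformer.
// None of these types are Hudi classes; schemas are modeled as field-ID -> name maps.
class EvolvedReadSketch {
  // Schema history keyed by the instant time at which each version took effect (assumption).
  static final TreeMap<Long, Map<Integer, String>> SCHEMA_HISTORY = new TreeMap<>();

  // Step 1: find the schema a log block was written with, assumed to be the
  // latest version at or before the block's instant time.
  static Map<Integer, String> fileSchemaFor(long instantTime) {
    Map.Entry<Long, Map<Integer, String>> e = SCHEMA_HISTORY.floorEntry(instantTime);
    if (e == null) {
      throw new IllegalStateException("No schema at or before " + instantTime);
    }
    return e.getValue();
  }

  // Step 3 (step 2's compatibility check omitted): match read fields to file fields
  // by ID so values stored under old names come out under the read schema's names.
  // Read fields missing from the file schema (e.g. newly added) become null.
  static Map<String, Object> rewrite(Map<String, Object> fileRecord,
                                     Map<Integer, String> fileSchema,
                                     Map<Integer, String> readSchema) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<Integer, String> readField : readSchema.entrySet()) {
      String fileName = fileSchema.get(readField.getKey());
      out.put(readField.getValue(), fileName == null ? null : fileRecord.get(fileName));
    }
    return out;
  }

  public static void main(String[] args) {
    // Assumed instants: original schema at 100, rename at 300.
    SCHEMA_HISTORY.put(100L, Map.of(0, "id", 1, "name", 2, "num", 3, "ts", 4, "ds"));
    SCHEMA_HISTORY.put(300L, Map.of(0, "id", 1, "name_new", 2, "num", 3, "ts", 4, "ds"));

    // A log record from the UPDATE (assumed instant 200), written with the old column name.
    Map<String, Object> logRecord =
        Map.of("id", 1L, "name", "a2", "num", 1000, "ts", 100L, "ds", "20211211");

    // Pruned read schema for: select id, name_new ...
    Map<Integer, String> prunedRead = new LinkedHashMap<>();
    prunedRead.put(0, "id");
    prunedRead.put(1, "name_new");

    System.out.println(rewrite(logRecord, fileSchemaFor(200L), prunedRead));
  }
}
```

Running main prints {id=1, name_new=a2}: the value written under the old name is resolved through field ID 1, and only the two projected columns are emitted because the pruned schema, not the full latest schema, drives the rewrite.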

Zouxxyy closed this on Sep 1, 2023
Zouxxyy reopened this on Sep 1, 2023
danny0405 merged commit 31bc565 into apache:master on Sep 5, 2023
27 checks passed
leosanqing pushed a commit to leosanqing/hudi that referenced this pull request on Sep 13, 2023
Projects
Status: ✅ Done

4 participants