[BUG] FE restart failed when replaying alter job #50101

Closed
gengjun-git opened this issue Aug 21, 2024 · 0 comments · Fixed by #50100
Labels
type/bug Something isn't working


@gengjun-git
Contributor

Steps to reproduce the behavior (Required)

  1. Create a table:
     CREATE TABLE aggregate_table_with_null (k1 decimal(7,2),k2 int,k3 decimal(30,17),v1 int sum,v2 int min,v3 int max) AGGREGATE KEY (k1, k2, k3) DISTRIBUTED BY HASH(k1, k2, k3) BUCKETS 32 PROPERTIES ( "replication_num" = "3", "in_memory" = "false", "storage_format" = "V2" );
  2. Create a materialized view on that table:
     create materialized view aggregate_table_with_null_mv as select k1, sum(v1), min(v2), max(v3) from aggregate_table_with_null group by k1;
  3. Restart the FE immediately after step 2, before the alter job finishes.
  4. Wait for the alter job to complete.
  5. Restart the FE again.

Expected behavior (Required)

Step 5 (the second FE restart) should succeed.

Real behavior (Required)

 com.starrocks.journal.JournalInconsistentException: failed to load journal type 121
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:1349)
        at com.starrocks.server.GlobalStateMgr.replayJournalInner(GlobalStateMgr.java:2472)
        at com.starrocks.server.GlobalStateMgr.replayJournal(GlobalStateMgr.java:2421)
        at com.starrocks.leader.Checkpoint.replayAndGenerateGlobalStateMgrImage(Checkpoint.java:208)
        at com.starrocks.leader.Checkpoint.createImage(Checkpoint.java:193)
        at com.starrocks.leader.Checkpoint.runAfterCatalogReady(Checkpoint.java:108)
        at com.starrocks.common.util.FrontendDaemon.runOneCycle(FrontendDaemon.java:72)
        at com.starrocks.common.util.Daemon.run(Daemon.java:107)
Caused by: java.lang.NullPointerException: 12136
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
        at com.starrocks.alter.RollupJobV2.onFinished(RollupJobV2.java:699)
        at com.starrocks.alter.RollupJobV2.replayFinished(RollupJobV2.java:858)
        at com.starrocks.alter.RollupJobV2.replay(RollupJobV2.java:893)
        at com.starrocks.alter.AlterHandler.replayAlterJobV2(AlterHandler.java:204)
        at com.starrocks.alter.MaterializedViewHandler.replayAlterJobV2(MaterializedViewHandler.java:912)
        at com.starrocks.persist.EditLog.loadJournal(EditLog.java:857)

StarRocks version (Required)

  • 3.1+
gengjun-git added the type/bug (Something isn't working) label on Aug 21, 2024
gengjun-git added a commit that referenced this issue on Aug 22, 2024
## What I'm doing:
The root cause of this bug is that the span attribute of the alter job is null after the FE restarts, so a NullPointerException is thrown when a method is called on span during replay. Add a default (no-argument) constructor to each subclass of AlterJobV2 so that span is set to a default value after deserialization.

Fixes #50101

Co-authored-by: gengjun-git <gengjun@starrocks.com>
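Why a default constructor fixes the null span can be sketched in plain Java. This assumes the alter jobs are deserialized with Gson (StarRocks appears to persist AlterJobV2 as JSON through Gson, but treat that code path as an assumption here). Gson calls a class's no-argument constructor when one exists; otherwise it falls back to allocating the object through sun.misc.Unsafe, which runs no constructors and no field initializers, so a field that is never written to the journal keeps its default value of null. Every name below (SpanReplayDemo, Span, AlterJobBase, JobWithoutDefaultCtor, JobWithDefaultCtor, instantiateLikeDeserializer) is a hypothetical stand-in, not StarRocks code.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical sketch: mimics how a Gson-style deserializer instantiates objects.
// None of these classes are the real StarRocks AlterJobV2 / RollupJobV2.
public class SpanReplayDemo {

    // Hypothetical stand-in for a tracing span that is never persisted.
    static final class Span {
        void end() { }
    }

    // Hypothetical base class: span is set only by a field initializer,
    // so it is non-null only if some constructor actually runs.
    abstract static class AlterJobBase {
        protected transient Span span = new Span();
        protected long jobId;

        // Simplified stand-in for a replay path that uses the span.
        void replayFinished() {
            span.end(); // NullPointerException if span is null
        }
    }

    // Before the fix: only a constructor with arguments exists.
    static final class JobWithoutDefaultCtor extends AlterJobBase {
        JobWithoutDefaultCtor(long jobId) {
            this.jobId = jobId;
        }
    }

    // After the fix: a no-arg constructor lets the field initializer run.
    static final class JobWithDefaultCtor extends AlterJobBase {
        JobWithDefaultCtor() { }

        JobWithDefaultCtor(long jobId) {
            this.jobId = jobId;
        }
    }

    // Use the no-arg constructor if present; otherwise allocate the instance
    // via sun.misc.Unsafe.allocateInstance, which skips all constructors.
    static <T> T instantiateLikeDeserializer(Class<T> cls) throws ReflectiveOperationException {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (NoSuchMethodException e) {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field field = unsafeClass.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            Object unsafe = field.get(null);
            Method allocate = unsafeClass.getMethod("allocateInstance", Class.class);
            return cls.cast(allocate.invoke(unsafe, cls));
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        JobWithoutDefaultCtor before = instantiateLikeDeserializer(JobWithoutDefaultCtor.class);
        JobWithDefaultCtor after = instantiateLikeDeserializer(JobWithDefaultCtor.class);
        System.out.println("span without no-arg ctor: " + before.span);
        System.out.println("span with no-arg ctor is set: " + (after.span != null));
        after.replayFinished(); // completes normally
        try {
            before.replayFinished();
        } catch (NullPointerException e) {
            System.out.println("replay without no-arg ctor threw NullPointerException");
        }
    }
}
```

In this sketch the job without a no-argument constructor comes back with a null span and its replay throws a NullPointerException, while the job with one keeps the initialized span. Adding the constructor, rather than null-checking span at every use site, fixes the problem once at object creation time.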