[SUPPORT] Hudi 0.12.1 with Spark Structured Streaming: reading clustering metadata (Avro replacecommit file) fails with Unrecognized token 'Obj^A^B^Vavro' #7375
Comments
thanks~

Built with the latest version on master (fd62a14) but still encountered the same issue (with Flink r/w). Stacktrace:

@danny0405, could you also take a look? Thanks!

Yeah, it's a bug that was introduced in #7296; will fire a fix soon ~

A fix PR is fired here: #7540

This issue regressed in 0.13.1+ and a new PR is posted here to fix: #9711

Is there a possible workaround for this? In other words, how do we recover from this situation? We are using Spark Structured Streaming on Kafka and writing output to Hudi (v0.13.1) on S3. After deleting the partial commit file (as a workaround) or moving to a previous checkpoint, we observe that even though the streaming job progresses with updated offsets, no data is ever written to Hudi.
Describe the problem you faced
When I enable async clustering, Hudi writes xxx.replacecommit.requested as an Avro file, but the canSkipBatch function reads that file with a JSON reader, throwing Unrecognized token 'Obj^A^B^Vavro'.
How can I fix it? I deleted the file, but the same error happened on the next replacecommit.
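For context, the token in the error is the Avro container-file magic: every Avro object container file starts with the four bytes "Obj" followed by 0x01, which Jackson surfaces as the token 'Obj'. A minimal sketch of the mismatch (illustrative only, not Hudi's internal code; the file name comes from this report):

```scala
import java.nio.file.{Files, Paths}
import com.fasterxml.jackson.databind.ObjectMapper

// Read the raw bytes of the .requested metadata file.
val bytes = Files.readAllBytes(Paths.get("20221204150150328.replacecommit.requested"))

// Avro object container files begin with the 4-byte magic "Obj" 0x01.
val isAvro = bytes.length >= 4 &&
  bytes(0) == 'O'.toByte && bytes(1) == 'b'.toByte &&
  bytes(2) == 'j'.toByte && bytes(3) == 1.toByte

if (isAvro)
  println("Avro container file: needs an Avro reader, not a JSON parser")
else
  // Handing Avro bytes to a JSON parser reproduces the reported error:
  // JsonParseException: Unrecognized token 'Obj...'
  new ObjectMapper().readTree(bytes)
```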
To Reproduce
Steps to reproduce the behavior:
import org.apache.spark.sql.streaming.OutputMode

spark
  .sql(conf.getSql)
  .na.fill("")
  .writeStream
  .format("hudi")
  .options(conf.getHudiConf)
  .option("checkpointLocation", conf.getCheckpointPath)
  .trigger(conf.getTrigger)
  .outputMode(OutputMode.Append())
  .start(conf.getOutputPath(conf.getHudiTableName))
hoodie.table.name=history
hoodie.datasource.write.table.type=MERGE_ON_READ
hoodie.datasource.write.operation=insert
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.precombine.field=version
hoodie.datasource.write.partitionpath.field=partition
hoodie.cleaner.commits.retained=3
hoodie.clustering.async.enabled=true
hoodie.clean.async=true
hoodie.parquet.max.file.size=268435456
hoodie.metrics.on=true
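For completeness, a hypothetical sketch of the Map the conf.getHudiConf helper above could return to .options(...), built from exactly the properties listed (the helper itself is not shown in this report):

```scala
// Hypothetical: the report does not show getHudiConf; this just maps the
// listed properties into the Map[String, String] that .options(...) accepts.
val hudiConf: Map[String, String] = Map(
  "hoodie.table.name"                           -> "history",
  "hoodie.datasource.write.table.type"          -> "MERGE_ON_READ",
  "hoodie.datasource.write.operation"           -> "insert",
  "hoodie.datasource.write.recordkey.field"     -> "id",
  "hoodie.datasource.write.precombine.field"    -> "version",
  "hoodie.datasource.write.partitionpath.field" -> "partition",
  "hoodie.cleaner.commits.retained"             -> "3",
  "hoodie.clustering.async.enabled"             -> "true",
  "hoodie.clean.async"                          -> "true",
  "hoodie.parquet.max.file.size"                -> "268435456",
  "hoodie.metrics.on"                           -> "true"
)
```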
=> 20221204150150328.replacecommit.requested is an Avro file

Caused by: org.apache.hudi.exception.HoodieIOException: Failed to parse HoodieCommitMetadata for [==>20221204152715580__replacecommit__REQUESTED]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'Obj^A^B^Vavro': was expecting ('true', 'false' or 'null')
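One way to confirm that such a .requested file really is an Avro container is to open it with Avro's generic reader; a hedged sketch, assuming the org.apache.avro artifact is on the classpath and using the instant file name from the trace:

```scala
import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

// Opening succeeds only for a valid Avro object container file.
val reader = new DataFileReader[GenericRecord](
  new File("20221204152715580.replacecommit.requested"),
  new GenericDatumReader[GenericRecord]())

println(reader.getSchema) // the writer schema embedded in the file header
while (reader.hasNext) println(reader.next())
reader.close()
```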
Expected behavior
Environment Description
Hudi version : 0.12.1
Spark version : 2.4.3.2
Hive version : no
Hadoop version :
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no
Additional context
Stacktrace
See the Caused by lines in the problem description above.