Cannot read a streaming Delta table with a watermark #457
Labels: binding/rust, bug, good first issue, help wanted
Environment
Delta-rs version: rust-v0.4.0, rust-v0.4.1 (head)
Binding: rust
Environment:
Bug
What happened: Attempting to use delta-rs to read a streaming Delta table with a watermark fails with "Error: Failed to apply transaction log: Invalid JSON in log record".
What you expected to happen: delta-rs should be able to successfully read a streaming Delta table with a watermark.
How to reproduce it:
Create a streaming Delta table in Spark with a watermark:
Then, attempt to read the streaming Delta table with, e.g., delta-inspect.
More details:
The error occurs because the streaming Delta table has a metaData action whose schemaString looks like this:
However, delta-rs defines SchemaField::metadata as type HashMap<String, String> (see here), so attempts to deserialize the third field above fail because the key spark.watermarkDelayMs has a numeric value instead of a string.
I was able to work around this issue by simply disabling deserialization of the schemaString (not needed for my purposes), but a real fix would require changing the definition of SchemaField.
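To make the failure mode concrete, here is a minimal, standard-library-only sketch of the difference between a string-only metadata map and one that also admits numbers. It is not the delta-rs code: MetadataValue, parse_scalar, read_strict and read_permissive are hypothetical names, the toy parser only handles plain strings and integers, and the value 60000 is purely illustrative. In the real crate deserialization goes through serde, so an actual fix would more likely switch the map's value type to something like serde_json::Value.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scalar found in a field's metadata object.
#[derive(Debug, Clone, PartialEq)]
enum MetadataValue {
    Number(i64),
    Text(String),
}

/// Classify one raw JSON scalar token, e.g. `"abc"` or `60000`.
/// Deliberately tiny: no escapes, floats, booleans, or nested values.
fn parse_scalar(token: &str) -> Option<MetadataValue> {
    let t = token.trim();
    if t.len() >= 2 && t.starts_with('"') && t.ends_with('"') {
        return Some(MetadataValue::Text(t[1..t.len() - 1].to_string()));
    }
    t.parse::<i64>().ok().map(MetadataValue::Number)
}

/// Strict reading, analogous to a HashMap<String, String> field:
/// any value that is not a string is rejected.
fn read_strict(raw: &[(&str, &str)]) -> Result<HashMap<String, String>, String> {
    let mut out = HashMap::new();
    for (key, token) in raw {
        match parse_scalar(token) {
            Some(MetadataValue::Text(s)) => {
                out.insert(key.to_string(), s);
            }
            _ => return Err(format!("key {key}: expected a string value")),
        }
    }
    Ok(out)
}

/// Permissive reading: the map's value type admits numbers as well as strings.
fn read_permissive(raw: &[(&str, &str)]) -> Result<HashMap<String, MetadataValue>, String> {
    let mut out = HashMap::new();
    for (key, token) in raw {
        let value = parse_scalar(token)
            .ok_or_else(|| format!("key {key}: unsupported value"))?;
        out.insert(key.to_string(), value);
    }
    Ok(out)
}

fn main() {
    // Illustrative value only; the real delay depends on the Spark job.
    let raw = [("spark.watermarkDelayMs", "60000")];
    println!("strict:     {:?}", read_strict(&raw));
    println!("permissive: {:?}", read_permissive(&raw));
}
```

The strict reader returns an error for the numeric watermark delay, which mirrors the reported behaviour, while the permissive reader keeps it as a number. Widening the value type this way would change the public type of SchemaField::metadata, so callers that expect string values would need to be updated.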