Unable to read timestamp fields from column statistics #1372

Closed
cmackenzie1 opened this issue May 16, 2023 · 1 comment · Fixed by #1373
Labels
bug Something isn't working

Comments

@cmackenzie1
Contributor

cmackenzie1 commented May 16, 2023

Environment

Delta-rs version: 0.10.0

Binding: rust

Environment:
N/A


Bug

What happened:

With a field defined as

SchemaField::new(
  "EdgeEndTimestamp".to_string(),
  SchemaDataType::primitive("timestamp".to_string()),
  true,
  HashMap::new(),
),

I get the following warning when querying with DataFusion:

{
  "timestamp": "2023-05-16T18:58:40.206275Z",
  "level": "WARN",
  "fields": {
    "message": "Unexpected type when parsing min/max values for EdgeEndTimestamp. Found 2023-05-16 18:56:10 +00:00",
    "log.target": "deltalake::action::parquet_read",
    "log.module_path": "deltalake::action::parquet_read",
    "log.file": "/root/.cargo/registry/src/github.com-1ecc6299db9ec823/deltalake-0.10.0/src/action/parquet_read/mod.rs",
    "log.line": 216
  },
  "target": "deltalake::action::parquet_read"
}

What you expected to happen:

Query to succeed, statistics read correctly.

How to reproduce it:

  1. define a timestamp field
  2. write data
  3. query back data

More details:

It looks like the format might have changed for timestamp fields? The value in the warning no longer looks like RFC 3339. I have some data already written using this (maybe incorrect) format, so it would be great if the fix supported this format as well.

cmackenzie1 added the bug label May 16, 2023
@cmackenzie1
Contributor Author

cmackenzie1 commented May 16, 2023

The error occurs in this chunk of code when reading .checkpoint.parquet files that contain timestamps.

    if let Ok(val) = primitive_parquet_field_to_json_value(field) {
        Some(ColumnValueStat::Value(val))
    } else {
        log::warn!(
            "Unexpected type when parsing min/max values for {}. Found {}",
            field_name,
            field
        );
        None
    }
}

Using parquet-tools, the field is encoded in the parquet as such:

"Tag": "name=EdgeEndTimestamp, type=INT64, convertedtype=TIMESTAMP_MICROS, repetitiontype=OPTIONAL"

Looking at

fn primitive_parquet_field_to_json_value(field: &Field) -> Result<serde_json::Value, &'static str> {
    match field {
        Field::Bool(value) => Ok(json!(value)),
        Field::Byte(value) => Ok(json!(value)),
        Field::Short(value) => Ok(json!(value)),
        Field::Int(value) => Ok(json!(value)),
        Field::Long(value) => Ok(json!(value)),
        Field::Float(value) => Ok(json!(value)),
        Field::Double(value) => Ok(json!(value)),
        Field::Str(value) => Ok(json!(value)),
        Field::Decimal(decimal) => match BigInt::from_signed_bytes_be(decimal.data()).to_f64() {
            Some(int) => Ok(json!(
                int / (10_i64.pow((decimal.scale()).try_into().unwrap()) as f64)
            )),
            _ => Err("Invalid type for min/max values."),
        },
        Field::TimestampMillis(timestamp) => Ok(serde_json::Value::String(
            convert_timestamp_millis_to_string(*timestamp)?,
        )),
        Field::Date(date) => Ok(serde_json::Value::String(convert_date_to_string(*date)?)),
        _ => Err("Invalid type for min/max values."),
    }
}

I see there is no match for Field::TimestampMicros - could that be the issue?
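
For illustration, here is a minimal, std-only sketch of what handling the microsecond variant could look like. It is not the code from the deltalake or parquet crates: `Field` below is a reduced stand-in enum, `micros_to_rfc3339` and `field_to_stat_string` are hypothetical names, and the six-digit fractional output format is an assumption. The merged fix may format the value differently.

```rust
// Reduced stand-in for the parquet crate's `Field` enum (hypothetical, so that
// the sketch compiles without external crates).
#[allow(dead_code)]
enum Field {
    TimestampMillis(i64),
    TimestampMicros(i64),
    Null,
}

/// Convert days since 1970-01-01 into (year, month, day) in the proleptic
/// Gregorian calendar (Howard Hinnant's `civil_from_days` algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // 0..=365, year starts in March
    let mp = (5 * doy + 2) / 153; // 0 = March ... 11 = February
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format microseconds since the Unix epoch as an RFC 3339 UTC string.
/// The six fractional digits are an assumption made for this sketch.
fn micros_to_rfc3339(micros: i64) -> String {
    // Euclidean division keeps the remainder non-negative for pre-1970 values.
    let secs = micros.div_euclid(1_000_000);
    let sub_micros = micros.rem_euclid(1_000_000);
    let days = secs.div_euclid(86_400);
    let sod = secs.rem_euclid(86_400); // second of day
    let (year, month, day) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}.{:06}Z",
        year,
        month,
        day,
        sod / 3_600,
        (sod % 3_600) / 60,
        sod % 60,
        sub_micros
    )
}

/// Hypothetical counterpart of the match above, reduced to the timestamp arms.
fn field_to_stat_string(field: &Field) -> Result<String, &'static str> {
    match field {
        // Millisecond values are widened to microseconds, guarding against overflow.
        Field::TimestampMillis(ms) => ms
            .checked_mul(1_000)
            .map(micros_to_rfc3339)
            .ok_or("Timestamp out of range."),
        // The arm that the function quoted above does not have.
        Field::TimestampMicros(us) => Ok(micros_to_rfc3339(*us)),
        _ => Err("Invalid type for min/max values."),
    }
}
```

With this sketch, `field_to_stat_string(&Field::TimestampMicros(1_684_263_370_000_000))` returns `Ok("2023-05-16T18:56:10.000000Z")`, which is the instant shown in the warning above. A variant without an arm, such as `Field::Null` here, still falls through to the error branch.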

cmackenzie1 added a commit to cmackenzie1-contrib/delta-rs that referenced this issue May 16, 2023
…cros`

Add a match statement to handle the case when timestamp is encoded
as `TimestampMicros` in the parquet file.

Resolves delta-io#1372.
roeap pushed a commit that referenced this issue May 17, 2023
…1373)

# Description

Add a match statement to handle the case when timestamp is encoded as
`TimestampMicros` in the parquet file.

# Related Issue(s)

- resolves #1372.

# Documentation
N/A