Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Struct columns are returning null when querying Iceberg tables #2721

Open
engechas opened this issue Jun 6, 2024 · 1 comment
Open
Labels
bug Something isn't working

Comments

@engechas
Copy link

engechas commented Jun 6, 2024

What is the bug?
Iceberg table support was recently added. When querying tables that contain struct type columns, the values of these columns is being returned as null.

{
  "metadata": "null",
  "time": 1717691198000,
  "time_dt": "2024-06-06T16:26:38.000Z",
  "cloud": "null",
  "api": "null",
  "dst_endpoint": "null",
  "actor": "null",
  "http_request": "null",
  "src_endpoint": "null",
  "session": "null",
  "policy": "null",
  "resources": "null",
  "class_name": "API Activity",
  "class_uid": 6003,
  "category_name": "Application Activity",
  "category_uid": 6,
  "severity_id": 1,
  "severity": "Informational",
  "user": "null",
  "activity_name": "Invoke",
  "activity_id": 99,
  "type_uid": 600399,
  "type_name": "API Activity: Other",
  "status": "Success",
  "is_mfa": "null",
  "unmapped": "{REDACTED}",
  "accountid": "REDACTED",
  "region": "us-east-1",
  "observables": "null"
}

Schema for the table

{
            "Columns": [
                {
                    "Name": "metadata",
                    "Type": "struct<product:struct<version:string,name:string,vendor_name:string,feature:struct<name:string>>,uid:string,profiles:array<string>,version:string>"
                },
                {
                    "Name": "time",
                    "Type": "bigint"
                },
                {
                    "Name": "cloud",
                    "Type": "struct<region:string,provider:string>"
                },
                {
                    "Name": "api",
                    "Type": "struct<response:struct<error:string,message:string>,operation:string,version:string,service:struct<name:string>,request:struct<uid:string>>"
                },
                {
                    "Name": "dst_endpoint",
                    "Type": "struct<svc_name:string>"
                },
                {
                    "Name": "actor",
                    "Type": "struct<user:struct<type:string,name:string,uid:string,uuid:string,account_uid:string,credential_uid:string>,session:struct<created_time:bigint,mfa:boolean,issuer:string>,invoked_by:string,idp:struct<name:string>>"
                },
                {
                    "Name": "http_request",
                    "Type": "struct<user_agent:string>"
                },
                {
                    "Name": "src_endpoint",
                    "Type": "struct<uid:string,ip:string,domain:string>"
                },
                {
                    "Name": "resources",
                    "Type": "array<struct<uid:string,account_uid:string,type:string>>"
                },
                {
                    "Name": "class_name",
                    "Type": "string"
                },
                {
                    "Name": "class_uid",
                    "Type": "int"
                },
                {
                    "Name": "category_name",
                    "Type": "string"
                },
                {
                    "Name": "category_uid",
                    "Type": "int"
                },
                {
                    "Name": "severity_id",
                    "Type": "int"
                },
                {
                    "Name": "severity",
                    "Type": "string"
                },
                {
                    "Name": "user",
                    "Type": "struct<uid:string,uuid:string,name:string>"
                },
                {
                    "Name": "activity_name",
                    "Type": "string"
                },
                {
                    "Name": "activity_id",
                    "Type": "int"
                },
                {
                    "Name": "type_uid",
                    "Type": "int"
                },
                {
                    "Name": "type_name",
                    "Type": "string"
                },
                {
                    "Name": "status",
                    "Type": "string"
                },
                {
                    "Name": "status_id",
                    "Type": "int"
                },
                {
                    "Name": "mfa",
                    "Type": "boolean"
                },
                {
                    "Name": "unmapped",
                    "Type": "map<string,string>"
                }
            ],
            "Location": "REDACTED",
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "Compressed": false,
            "NumberOfBuckets": 0,
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
            "SortColumns": [],
            "Parameters": {
                "partition_filtering.enabled": "true"
            },
            "StoredAsSubDirectories": false
        }

The same behavior is not seen when querying non-Iceberg tables. In those queries, the struct fields are successfully populated in the results.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create an iceberg table with struct columns
  2. Run a query against this table
  3. Validate struct columns are null

What is the expected behavior?
The struct columns should be successfully populated

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.

Do you have any additional context?
Add any other context about the problem.

@engechas engechas added bug Something isn't working untriaged labels Jun 6, 2024
@penghuo
Copy link
Collaborator

penghuo commented Jun 6, 2024

Test with vanilla EMR-S + Iceberg. struct filed return null value

@penghuo penghuo removed the untriaged label Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants