(aws-glue-alpha): (struct schema produces unsupported inputStrings) #26935

Open
nihakue opened this issue Aug 30, 2023 · 2 comments
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments


nihakue commented Aug 30, 2023

Describe the bug

Regarding this line: https://github.com/aws/aws-cdk/blame/main/packages/%40aws-cdk/aws-glue-alpha/lib/schema.ts#L209

As far as I can tell, this will happily create invalid inputStrings for nested structs:

const nested = Schema.struct([
  {
    name: "name",
    comment: "The name of the thing",
    type: Schema.STRING
  },
  {
    name: "url",
    type: Schema.STRING
  }
]);

// The struct is then used as the type of a table column:
const column = {
  name: "some_nested_struct",
  type: nested
};

Will generate the following inputString for the nested struct:

struct<name:string COMMENT 'The name of the thing',url:string>
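To make the behaviour concrete, here is a minimal, self-contained model of how such a string can end up being built. It is a sketch, not the library's source: `SketchType`, `SketchColumn`, `SKETCH_STRING`, and `renderStruct` are hypothetical stand-ins, and the only assumption is that each field is rendered as `name:type` with `COMMENT '...'` appended when a comment is set.

```typescript
// Hypothetical stand-ins for the library's types; nothing here is imported
// from @aws-cdk/aws-glue-alpha.
interface SketchType {
  inputString: string;
}

interface SketchColumn {
  name: string;
  type: SketchType;
  comment?: string;
}

const SKETCH_STRING: SketchType = { inputString: "string" };

// Assumed behaviour: render each field as name:type and append
// COMMENT '...' whenever the field carries a comment.
function renderStruct(columns: SketchColumn[]): SketchType {
  const fields = columns.map((column) => {
    const base = `${column.name}:${column.type.inputString}`;
    return column.comment === undefined
      ? base
      : `${base} COMMENT '${column.comment}'`;
  });
  return { inputString: `struct<${fields.join(",")}>` };
}

const nestedSketch = renderStruct([
  { name: "name", comment: "The name of the thing", type: SKETCH_STRING },
  { name: "url", type: SKETCH_STRING },
]);

// prints: struct<name:string COMMENT 'The name of the thing',url:string>
console.log(nestedSketch.inputString);
```

Under that assumption the comment lands inside the type string itself, which matches the value reported above.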

If you create a Glue table with this in the schema, Athena throws an error whenever you query the table:

HIVE_INVALID_METADATA: Glue table 'db.table' column 'some_nested_struct' has invalid data type: struct<name:string COMMENT 'The name of the thing',url:string>
...

From what I can tell, 'COMMENT' is not supported in nested structs. If I manually create a schema in a fresh Glue table, adding "COMMENT" to the inputString of a nested string field causes Glue to treat the type as 'unknown'.

For example, before adding the COMMENT, I can inspect the schema and see its type:

{
  "some_nested_struct": {
    "name": "string",
    "url": "string"
  }
}

But if I add the comment and inspect the type of the column, I see:

{
  "some_nested_struct": {
    "name": {
      "unknown": "STRUCT <\n  name: STRING COMMENT 'some comment',\n  url: STRING\n>"
    },
    "url": "string"
  }
}
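Until this is addressed, one way to catch the problem before deployment is to check type strings for an embedded comment clause, for example in a unit test. The helper below is a hypothetical sketch (`hasEmbeddedComment` is not part of any library), and the pattern is a heuristic that looks for the keyword followed by a single-quoted literal rather than a full parser for the type grammar.

```typescript
// Hypothetical helper: reports whether a type string contains a
// COMMENT '...' clause. This is a heuristic, not a type-string parser.
function hasEmbeddedComment(typeString: string): boolean {
  // The keyword must be followed by whitespace and an opening quote, so a
  // field that is merely named "comment" (as in comment:string) is not flagged.
  return /\bCOMMENT\s+'/i.test(typeString);
}

// prints: true
console.log(
  hasEmbeddedComment("struct<name:string COMMENT 'The name of the thing',url:string>"),
);

// prints: false
console.log(hasEmbeddedComment("struct<name:string,url:string>"));
```

A test could run this over every column's `inputString` and fail when it returns `true`.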

Expected Behavior

Ideally, Glue would support nested comments (or at worst ignore them), but the CDK construct should at least not generate input strings that are guaranteed to fail.

Current Behavior

See description

Reproduction Steps

See description

Possible Solution

See expected behavior
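One possible shape for this, sketched under assumptions: comments on struct fields are either dropped from the rendered type string or rejected outright. Everything below (`FixType`, `FixColumn`, `NestedCommentPolicy`, `renderStructSafely`) is a hypothetical, self-contained illustration and not a patch against the CDK source; which policy is preferable is a decision for the maintainers.

```typescript
// Hypothetical stand-ins for the library's types.
interface FixType {
  inputString: string;
}

interface FixColumn {
  name: string;
  type: FixType;
  comment?: string;
}

// "drop" silently omits field comments; "reject" fails fast at synth time.
type NestedCommentPolicy = "drop" | "reject";

const FIX_STRING: FixType = { inputString: "string" };

// Renders a struct type string that never contains a COMMENT clause.
function renderStructSafely(
  columns: FixColumn[],
  policy: NestedCommentPolicy = "drop",
): FixType {
  const fields = columns.map((column) => {
    if (column.comment !== undefined && policy === "reject") {
      throw new Error(
        `Comment on struct field '${column.name}' cannot be expressed in the type string`,
      );
    }
    return `${column.name}:${column.type.inputString}`;
  });
  return { inputString: `struct<${fields.join(",")}>` };
}

const safeNested = renderStructSafely([
  { name: "name", comment: "The name of the thing", type: FIX_STRING },
  { name: "url", type: FIX_STRING },
]);

// prints: struct<name:string,url:string>
console.log(safeNested.inputString);
```

Dropping keeps existing stacks deployable, while rejecting makes the limitation visible to the user instead of losing the comment without notice.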

Additional Information/Context

No response

CDK CLI Version

2.87.0 (build 9fca790)

Framework Version

No response

Node.js Version

v18.16.0

OS

AL2

Language

TypeScript

Language Version

5.1.3

Other information

No response

@nihakue nihakue added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 30, 2023
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Aug 30, 2023
Contributor

pahud commented Aug 30, 2023

Thank you for the report. Could you provide a minimal code snippet that we can use to reproduce this in our environment?

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Aug 30, 2023
Author

nihakue commented Aug 30, 2023

import { Schema } from "@aws-cdk/aws-glue-alpha";

function buildBrokenSchema() {
  const nested = Schema.struct([
    {
      name: "name",
      comment: "The name of the thing",
      type: Schema.STRING,
    },
    {
      name: "url",
      type: Schema.STRING,
    },
  ]);
  console.log(nested.inputString);
}

buildBrokenSchema();
