Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(aws-glue): what format to use w/ cloudfront #16695

Closed
wants to merge 4 commits into from

Conversation

naseemkullah
Copy link
Contributor


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Sep 28, 2021

@mergify
Copy link
Contributor

mergify bot commented Sep 28, 2021

Title does not follow the guidelines of Conventional Commits. Please adjust title before merge.

@naseemkullah naseemkullah changed the title (glue) fix doc regarding what format to use w/ cloudfront docs(glue): what format to use w/ cloudfront Sep 28, 2021
@naseemkullah naseemkullah changed the title docs(glue): what format to use w/ cloudfront docs(aws-glue): what format to use w/ cloudfront Sep 28, 2021
@kaizencc kaizencc self-assigned this Oct 18, 2021
@peterwoodworth peterwoodworth changed the title docs(aws-glue): what format to use w/ cloudfront docs(aws-glue): what format to use w/ cloudfront Oct 21, 2021
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Oct 21, 2021
Copy link
Contributor

@kaizencc kaizencc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @naseemkullah! Can you provide some documentation as to how you know that this is the correct approach for CloudFront logs? It seems like the main difference is the type of the property serializationLibrary.

At the end of the day, what I'd like to do here is to provide a static method like DataFormat.CLOUDFRONT_LOGS instead of appending this functionality on to other methods. Does that sound like something you'd like to add?

packages/@aws-cdk/aws-glue/lib/data-format.ts Outdated Show resolved Hide resolved
@@ -304,6 +304,10 @@ export class DataFormat {
/**
* DataFormat for TSV (Tab-Separated Values)
*
* @remarks Also works for CloudFront logs when supplied with
* additional SerdeInfo parameters (`field.delim` and
* `serialization.format` set to "\t")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't clear to me where you are suggesting to put the additional SerdeInfo parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! My next question is, where does field.delim and serialization.format come from? How can I supply the additional parameters? I don't have a ton of context in this space so you're going to have to make sure your documentation makes sense to those who are learning how to use glue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, there is no way of adding additional Serde parameters into the Glue L2 construct, am I missing something?

serdeInfo: {
serializationLibrary: props.dataFormat.serializationLibrary.className,
},

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-serdeinfo.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! My next question is, where does field.delim and serialization.format come from? How can I supply the additional parameters? I don't have a ton of context in this space so you're going to have to make sure your documentation makes sense to those who are learning how to use glue.

I noticed these when I created the table via the console UI and then queried it. They must come from:
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, there is no way of adding additional Serde parameters into the Glue L2 construct, am I missing something?

serdeInfo: {
serializationLibrary: props.dataFormat.serializationLibrary.className,
},

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-serdeinfo.html

workaround:

  // escape hatch for missing config option in glue.Table
  // ref.: https://github.com/aws/aws-cdk/issues/16660
  propertyOverrides.forEach(({ propertyPath, value }) => {
    (table.node.defaultChild as glue.CfnTable).addPropertyOverride(
      propertyPath,
      value
    );
  });

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. I meant to say that there is no native support for adding additional Serde parameters. We can't add documentation to our L2 Glue Construct that requires an escape hatch. Seems related to #14159, though I'm not sure if you are suggesting that the additional parameters needed belong in serdeInfo.parameters or tableInput.parameters.

Do you have working code for this feature using the Glue L2 w/ escape hatches? And have you confirmed that the originally documented method using APACHE_LOGS does not work? I can/will investigate this as well but since this is marked p2 it isn't something I can immediately prioritize.

At any rate, if these parameters are necessary, we need native support for parameters before we support it in DataFormat.

Copy link
Contributor Author

@naseemkullah naseemkullah Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have working code for this feature using the Glue L2 w/ escape hatches? And have you confirmed that the originally documented method using APACHE_LOGS does not work? I can/will investigate this as well but since this is marked p2 it isn't something I can immediately prioritize.

Yes ans yes.

At any rate, if these parameters are necessary, we need native support for parameters before we support it in DataFormat.

OK, we then depend on #14159

BTW this will be useful for awslabs/aws-solutions-constructs#399

@mergify mergify bot dismissed kaizencc’s stale review October 22, 2021 01:48

Pull request has been modified.

Co-authored-by: kaizen3031593 <36202692+kaizen3031593@users.noreply.github.com>
@kaizencc kaizencc added effort/small Small work item – less than a day of effort p2 labels Oct 22, 2021
@naseemkullah
Copy link
Contributor Author

Hi @naseemkullah! Can you provide some documentation as to how you know that this is the correct approach for CloudFront logs? It seems like the main difference is the type of the property serializationLibrary.

At the end of the day, what I'd like to do here is to provide a static method like DataFormat.CLOUDFRONT_LOGS instead of appending this functionality on to other methods. Does that sound like something you'd like to add?

that sounds good, what would that look like?

@kaizencc
Copy link
Contributor

Hi @naseemkullah! Can you provide some documentation as to how you know that this is the correct approach for CloudFront logs? It seems like the main difference is the type of the property serializationLibrary.
At the end of the day, what I'd like to do here is to provide a static method like DataFormat.CLOUDFRONT_LOGS instead of appending this functionality on to other methods. Does that sound like something you'd like to add?

that sounds good, what would that look like?

I'm not sure. The problem we are trying to solve here is making sure that using Cloudwatch Logs as a data format is accessible and intuitive for the CDK user. Your attempt is a strong first effort, but I feel like there's more to be done. Whether that's creating a new static method called DataFormat.CLOUDFRONT_LOGS or clearly documenting DataFormat.TSV inline and in the README, its up to you. Our goal is that if a user wants to specify Cloudwatch Logs as a data format, they can find out how to do so relatively easily.

I think the first question is if there is more to be done re: adding additional Serde parameters. If that is necessary, then we may have to expose that in the glue.Table L2 as well. It gets complicated...

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: ba109d1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mattiamatrix
Copy link
Contributor

I am having a similar issue with CloudTrail logs, I'd like to create a cloudtrail table with CDK but the result that I obtain is different than the suggested SQL statement.

In the specific case, I can't add serialization.format=1 with the L2 construct Table.

@kaizencc kaizencc removed their assignment Jan 24, 2022
@TheRealAmazonKendra
Copy link
Contributor

This PR has been deemed to be abandoned, and will be closed. Please create a new PR for these changes if you think this decision has been made in error.

@naseemkullah
Copy link
Contributor Author

This PR has been deemed to be abandoned, and will be closed. Please create a new PR for these changes if you think this decision has been made in error.

Abandoned by the reviewer you mean? This documentation improvement is based on real world experience setting up log querying in Athena, please reconsider opening. Having the contributor reopen a duplicate PR is an unnecessary hoop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-glue Related to AWS Glue effort/small Small work item – less than a day of effort p2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants