-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(aws-glue): what format to use w/ cloudfront #16695
Conversation
Title does not follow the guidelines of Conventional Commits. Please adjust title before merge. |
924c117
to
ebfd5f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @naseemkullah! Can you provide some documentation as to how you know that this is the correct approach for CloudFront logs? It seems like the main difference is the type of the property serializationLibrary
.
At the end of the day, what I'd like to do here is to provide a static method like DataFormat.CLOUDFRONT_LOGS
instead of appending this functionality on to other methods. Does that sound like something you'd like to add?
@@ -304,6 +304,10 @@ export class DataFormat { | |||
/** | |||
* DataFormat for TSV (Tab-Separated Values) | |||
* | |||
* @remarks Also works for CloudFront logs when supplied with | |||
* additional SerdeInfo parameters (`field.delim` and | |||
* `serialization.format` set to "\t") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear to me where you are suggesting to put the additional SerdeInfo parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @kaizen3031593 , this comes from https://docs.aws.amazon.com/athena/latest/ug/cloudfront-logs.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! My next question is, where does field.delim
and serialization.format
come from? How can I supply the additional parameters? I don't have a ton of context in this space so you're going to have to make sure your documentation makes sense to those who are learning how to use glue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, there is no way of adding additional Serde parameters into the Glue L2 construct, am I missing something?
aws-cdk/packages/@aws-cdk/aws-glue/lib/table.ts
Lines 274 to 276 in 672acba
serdeInfo: { | |
serializationLibrary: props.dataFormat.serializationLibrary.className, | |
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! My next question is, where does field.delim and serialization.format come from? How can I supply the additional parameters? I don't have a ton of context in this space so you're going to have to make sure your documentation makes sense to those who are learning how to use glue.
I noticed these when I created the table via the console UI and then queried it. They must come from:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, there is no way of adding additional Serde parameters into the Glue L2 construct, am I missing something?
aws-cdk/packages/@aws-cdk/aws-glue/lib/table.ts
Lines 274 to 276 in 672acba
serdeInfo: { serializationLibrary: props.dataFormat.serializationLibrary.className, },
workaround:
// escape hatch for missing config option in glue.Table
// ref.: https://github.com/aws/aws-cdk/issues/16660
propertyOverrides.forEach(({ propertyPath, value }) => {
(table.node.defaultChild as glue.CfnTable).addPropertyOverride(
propertyPath,
value
);
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. I meant to say that there is no native support for adding additional Serde parameters. We can't add documentation to our L2 Glue Construct that requires an escape hatch. Seems related to #14159, though I'm not sure if you are suggesting that the additional parameters needed belong in serdeInfo.parameters
or tableInput.parameters
.
Do you have working code for this feature using the Glue L2 w/ escape hatches? And have you confirmed that the originally documented method using APACHE_LOGS does not work? I can/will investigate this as well but since this is marked p2
it isn't something I can immediately prioritize.
At any rate, if these parameters are necessary, we need native support for parameters before we support it in DataFormat
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have working code for this feature using the Glue L2 w/ escape hatches? And have you confirmed that the originally documented method using APACHE_LOGS does not work? I can/will investigate this as well but since this is marked
p2
it isn't something I can immediately prioritize.
Yes ans yes.
At any rate, if these parameters are necessary, we need native support for parameters before we support it in
DataFormat
.
OK, we then depend on #14159
BTW this will be useful for awslabs/aws-solutions-constructs#399
Pull request has been modified.
Co-authored-by: kaizen3031593 <36202692+kaizen3031593@users.noreply.github.com>
that sounds good, what would that look like? |
I'm not sure. The problem we are trying to solve here is making sure that using Cloudwatch Logs as a data format is accessible and intuitive for the CDK user. Your attempt is a strong first effort, but I feel like there's more to be done. Whether that's creating a new static method called I think the first question is if there is more to be done re: adding additional Serde parameters. If that is necessary, then we may have to expose that in the |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
I am having a similar issue with CloudTrail logs, I'd like to create a cloudtrail table with CDK but the result that I obtain is different than the suggested SQL statement. In the specific case, I can't add |
This PR has been deemed to be abandoned, and will be closed. Please create a new PR for these changes if you think this decision has been made in error. |
Abandoned by the reviewer you mean? This documentation improvement is based on real world experience setting up log querying in Athena, please reconsider opening. Having the contributor reopen a duplicate PR is an unnecessary hoop |
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license