AWS integration fails to create the integration and rolls back cloudformation stack with Internal failure. #236

Open
dogfish182 opened this issue Dec 16, 2022 · 9 comments
Labels: kind/bug (Bug related issue), stale (Stale - Bot reminder)

@dogfish182
dogfish182 commented Dec 16, 2022

Describe the bug
The AWS integration fails with an obscure error.

To Reproduce
Steps to reproduce the behavior:
Deploy a template that looks like this:

Resources:
  DatadogAWSDatadogIntegrationAWS:
    Type: Datadog::Integrations::AWS
    Properties:
      AccountID: '123123123123'
      RoleName: shared-datadog-aws-integration
    Metadata:
      aws:cdk:path: mystack/DatadogAWSDatadogIntegrationAWS
  DatadogRoleF31A7099:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Condition:
              StringEquals:
                sts:ExternalId:
                  Fn::Join:
                    - ''
                    - - '{{resolve:secretsmanager:arn:'
                      - Ref: AWS::Partition
                      - :secretsmanager:eu-west-1:123123123123:secret:DatadogIntegrationExternalID:SecretString:::}}
            Effect: Allow
            Principal:
              AWS: arn:aws:iam::464622532012:root
        Version: '2012-10-17'
      Description: Datadog integration for aws monitoring
      PermissionsBoundary:
        Fn::Join:
          - ''
          - - 'arn:aws:iam::'
            - Ref: AWS::AccountId
            - :policy/base-permissions-boundary
      RoleName: shared-datadog-aws-integration
      Tags:
        - Key: tag
          value: tag
    DependsOn:
      - DatadogAWSDatadogIntegrationAWS
    Metadata:
      aws:cdk:path: mystack/DatadogRole/Resource
  DatadogRolePolicy6CE03EE3:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Statement:
          - Action:
              - alldatadogstuffasperdocs
            Effect: Allow
            Resource: '*'
        Version: '2012-10-17'
      PolicyName: shared-datadog-integration-policy
      Roles:
        - Ref: DatadogRoleF31A7099

Logs

1:36:58 PM | CREATE_FAILED        | Datadog::Integrations::AWS                  | DatadogAWSDatadogIntegrationAWS
Resource handler returned message: "" (RequestToken: 16b2f5a7-3d09-738e-76ae-33db3a6ad5b8, HandlerErrorCode: InternalFailure)


 ❌  mystack failed: Error: The stack named mystack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "" (RequestToken: 16b2f5a7-3d09-738e-76ae-33db3a6ad5b8, HandlerErrorCode: InternalFailure)
    at FullCloudFormationDeployment.monitorDeployment (/Users/me/code/place/project/node_modules/aws-cdk/lib/api/deploy-stack.ts:505:13)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at deployStack2 (/Users/me/code/place/project/node_modules/aws-cdk/lib/cdk-toolkit.ts:265:24)
    at /Users/me/code/place/project/node_modules/aws-cdk/lib/deploy.ts:39:11
    at run (/Users/me/code/place/project/node_modules/p-queue/dist/index.js:163:29)

 ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named mystack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "" (RequestToken: 16b2f5a7-3d09-738e-76ae-33db3a6ad5b8, HandlerErrorCode: InternalFailure)
    at deployStacks (/Users/me/code/place/project/node_modules/aws-cdk/lib/deploy.ts:61:11)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at CdkToolkit.deploy (/Users/me/code/place/project/node_modules/aws-cdk/lib/cdk-toolkit.ts:339:7)
    at initCommandLine (/Users/me/code/place/project/node_modules/aws-cdk/lib/cli.ts:374:12)

Stack Deployments Failed: Error: The stack named mystack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "" (RequestToken: 16b2f5a7-3d09-738e-76ae-33db3a6ad5b8, HandlerErrorCode: InternalFailure)

Expected behavior
The CloudFormation stack should deploy to completion.
I expect the account integration to enable the account in Datadog (this does occur).
I expect the secret to be written to Secrets Manager (this does NOT occur).
I expect my role to be created, with its external ID pulled from that Secrets Manager secret (this does NOT occur).

Environment and Versions (please complete the following information):
Datadog AWS Integration 2.2.1
I am generating the CloudFormation template with CDK v2; however, I doubt this is relevant, as I've included the generated template above (which is what gets deployed and fails).

Additional context
It looks like the CloudFormation resource handler is swallowing the error, which makes this very hard to troubleshoot.
I've also logged a ticket with Datadog support.

dogfish182 added the kind/bug (Bug related issue) label Dec 16, 2022
@dogfish182
Author

To add a bit more context: I'm confused by the Datadog instructions on how to set up this integration (and I have a support ticket open).

This page
https://github.com/DataDog/cloudformation-template/tree/master/aws

^^ says it will set up Datadog for you; however, one of the first steps is to manually provision your accounts in Datadog and copy the external ID as a parameter before you manually run CloudFormation (not really doable at any kind of scale). At the end, the doc says you can use THIS integration if you wish to manage the integration. This seems like circular logic: if I have already set it up manually, isn't it unmanaged now?

What I would like to achieve is to use this integration, which creates the Datadog-side resources, and then create the AWS-side resource myself, feeding the external ID into the role I'm creating by reading the Secrets Manager entry that this extension writes.

Has anyone been able to achieve this?

@github-actions

Thanks for your contribution!

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.

If you would like this issue to remain open:

  1. Verify that you can still reproduce the issue in the latest version of this project.

  2. Comment that the issue is still reproducible and include updated details requested in the issue template.

github-actions bot added the stale (Stale - Bot reminder) label Jan 20, 2023
@dogfish182
Author

I can still reproduce this issue as shown in the original post.

@flavioelawi

We are facing the same error (although on Monitors and Dashboards).
We have a support case open with AWS.

@dogfish182
Author

> We are facing the same error (although on Monitor and Dashboards) We have a support case open with AWS

I did the same, and they told us we need to contact Datadog, as the error is being swallowed by the custom CloudFormation resource handler.

@flavioelawi

Thanks, we just did the same. Let's see what happens.

@skarimo
Member

skarimo commented Mar 30, 2023

Thanks for opening this issue. We are going to merge and release the change #258, which should catch any unhandled exceptions in the resources themselves.

However, this wouldn't expose all errors, mainly because AWS obfuscates logs/events quite heavily on its end, so things such as a bad type configuration or a bad execution role would still fail in non-obvious ways. I suspect that is the reason for the failures you are seeing with dashboards and monitors, @flavioelawi.

@flavioelawi

We have resolved our issue.

Our execution role already had the correct trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "resources.cloudformation.amazonaws.com",
                    "cloudformation.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

It also had a policy allowing access to the secret and its KMS key:

        {
            "Action": [
                "secretsmanager:GetSecretValue",
                "kms:Decrypt",
                "kms:DescribeKey"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },

We also added the CloudWatchLogsFullAccess managed policy to allow the integration to push logs to CloudWatch Logs (but its log group is still empty, which I guess is a separate issue).
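
Since a bad execution role reportedly fails without a useful message, a trust policy like the one above can be sanity-checked before deploying. The following is a minimal Python sketch, not part of the Datadog resources: the function name `missing_service_principals` is hypothetical, and the set of required principals is simply the pair from the policy in this comment. Whether both are needed for every resource type is not confirmed in this thread.

```python
def missing_service_principals(trust_policy: dict, required: set) -> list:
    """Return the required service principals that no Allow/sts:AssumeRole statement trusts."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    trusted = set()
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if statement.get("Effect") != "Allow" or "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        trusted.update(services)
    return sorted(set(required) - trusted)


# Assumption: these are the principals trusted by the role shown in this comment.
REQUIRED = {"resources.cloudformation.amazonaws.com", "cloudformation.amazonaws.com"}

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "resources.cloudformation.amazonaws.com",
                    "cloudformation.amazonaws.com",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

# An empty list means every required principal is trusted.
print(missing_service_principals(trust_policy, REQUIRED))
```

The check only looks at the trust relationship; it says nothing about whether the role's permissions policy grants access to the secret or its KMS key.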

The issue in our case was a typo in the dynamic reference, where we were missing the SecretString part before the JSON attribute selector.

@dogfish182, in your case you are missing external_id at the end of your dynamic reference; this is what is set up by the integration lambda/code.
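
Both mistakes described above can be caught with a small offline check. The sketch below splits a Secrets Manager dynamic reference into its segments (`secret-id:SecretString:json-key:version-stage:version-id`) and flags a misplaced `SecretString` segment or an empty JSON key. The function names are hypothetical, the literal `aws` partition stands in for the `Ref: AWS::Partition` used in the template, and `external_id` is taken from the comment above rather than verified against the integration's code.

```python
PREFIX = "{{resolve:secretsmanager:"
SUFFIX = "}}"


def parse_secretsmanager_ref(ref: str) -> dict:
    """Split a Secrets Manager dynamic reference into its named segments."""
    if not (ref.startswith(PREFIX) and ref.endswith(SUFFIX)):
        raise ValueError("not a Secrets Manager dynamic reference")
    parts = ref[len(PREFIX):-len(SUFFIX)].split(":")
    # A secret ARN spans 7 colon-separated fields; a plain secret name spans 1.
    id_len = 7 if parts[0] == "arn" else 1
    if not parts[0] or len(parts) < id_len or len(parts) > id_len + 4:
        raise ValueError("unexpected number of segments")
    # Trailing segments are optional, so pad the missing ones with "".
    tail = parts[id_len:] + [""] * 4
    return {
        "secret_id": ":".join(parts[:id_len]),
        "secret_string": tail[0],
        "json_key": tail[1],
        "version_stage": tail[2],
        "version_id": tail[3],
    }


def find_problems(ref: str) -> list:
    """List likely mistakes, assuming the secret value is a JSON object."""
    segments = parse_secretsmanager_ref(ref)
    problems = []
    if segments["secret_string"] not in ("", "SecretString"):
        problems.append("segment after the secret id should be 'SecretString'")
    if not segments["json_key"]:
        problems.append("no JSON key: resolves to the whole secret string")
    return problems


BASE = "arn:aws:secretsmanager:eu-west-1:123123123123:secret:DatadogIntegrationExternalID"
reported = "{{resolve:secretsmanager:" + BASE + ":SecretString:::}}"
suggested = "{{resolve:secretsmanager:" + BASE + ":SecretString:external_id}}"

print(find_problems(reported))   # the reference from the original template
print(find_problems(suggested))  # the same reference with the JSON key added
```

Here `reported` mirrors the reference in the original template and `suggested` is the same reference with `external_id` as the JSON key. A reference that omits `SecretString` entirely, such as `{{resolve:secretsmanager:MySecret:external_id}}`, is flagged on both counts because the key lands in the `SecretString` position.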

Also some feedback:

  • It would be useful to know, for each integration, what kind of execution policy and trust policy is expected in the execution role.
  • I can't find it anymore, but was there a GitHub issue in the upstream aws/cloudformation project about LogConfiguration not working?

@skarimo
Member

skarimo commented Apr 11, 2023

We released version 2.4.0 of the AWS resource, which should capture and return any unhandled exception in the resource itself. However, as mentioned previously, errors swallowed by AWS would probably still not be captured by this change, as they happen outside of the resource handler.
