When a line of code raises outside of the handler function, then datadog is not able to detect the error. #210

Closed
nalepae opened this issue Mar 8, 2022 · 9 comments
Labels
bug Something isn't working

@nalepae

nalepae commented Mar 8, 2022

Expected Behavior

When a line of code raises outside of the handler function, then datadog should be able to detect the error.

Actual Behavior

When a line of code raises outside of the handler function, then datadog is not able to detect the error.
==> If a monitor (attached to a Slack alert) is set up to fire when an uncaught exception is raised in this Lambda, the monitor is not triggered and no Slack message is sent.

Steps to Reproduce the Problem

Define the following Lambda function:

import json

# The following line will raise on purpose
0/0

def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Set

  • The AWS handler to datadog_lambda.handler.handler, and
  • The DD_LAMBDA_HANDLER environment variable to <your_file>.lambda_handler

Run the lambda.
==> Even if the line 0/0 raises, no trace will be visible in the Invocation Serverless part of Datadog.
image

Note that we see invocations in the top-left chart (3 blue vertical bars), but there is nothing in the center panel (No traced invocation in the time window), so there is no way to see the traces or the Python stack trace.

If we move the 0/0 in the handler, like below:

import json

def lambda_handler(event, context):
    # The following line will raise on purpose
    0/0
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

then Datadog behaves correctly (visible Traces, Monitor, Slack Message ...)

Specifications

  • Datadog Lambda Layer version:
    image

  • Python version: 3.8

Additional information

I understand that we specify the handler to Datadog, and thus it cannot be aware of things running outside of the handler. But as indicated in AWS best practices, there are benefits to running some code outside of the handler. If this code fails, it is very important that the development team is notified.

Take advantage of execution environment reuse to improve the performance of your function. Initialize SDK clients and database connections outside of the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations processed by the same instance of your function can reuse these resources. This saves cost by reducing function run time.

@nalepae nalepae changed the title from "When a line of code raises outside of the handler function, then datadog is be able to detect the error." to "When a line of code raises outside of the handler function, then datadog is not able to detect the error." Mar 8, 2022
@astuyve
Contributor

astuyve commented Mar 25, 2022

Hi @nalepae - thanks for this ticket. I'm sorry for the delay in responding to you.

The Datadog library works by wrapping your handler function, so if you've got a syntax error, import error, or divide-by-zero error outside of your handler function, I can imagine there could be scenarios where we can't catch the failure.
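To illustrate why a handler wrapper cannot see module-level failures, here is a minimal sketch. It is not the Datadog library's actual code: `wrap_handler`, `load_and_wrap`, `try_invoke`, and the `reported` list are hypothetical names, and the user module is simulated from source strings rather than loaded from a file.

```python
import types

# Hypothetical stand-in for "errors reported to the monitoring backend".
reported = []


def wrap_handler(handler):
    """Wrap a handler so exceptions raised during an invocation are recorded."""
    def wrapped(event, context):
        try:
            return handler(event, context)
        except Exception as exc:
            reported.append(type(exc).__name__)
            raise
    return wrapped


def load_and_wrap(source, name="lambda_handler"):
    """Execute module source (as an import would), then wrap its handler."""
    module = types.ModuleType("user_module")
    # Module-level code runs here, before any wrapper exists.
    exec(source, module.__dict__)
    return wrap_handler(getattr(module, name))


INSIDE = """
def lambda_handler(event, context):
    0/0
"""

OUTSIDE = """
0/0
def lambda_handler(event, context):
    return {"statusCode": 200}
"""


def try_invoke(source):
    """Return the stage at which the failure happened."""
    try:
        handler = load_and_wrap(source)
    except ZeroDivisionError:
        return "failed at import"
    try:
        handler({}, None)
    except ZeroDivisionError:
        return "failed in handler"
    return "ok"
```

In this sketch, `try_invoke(INSIDE)` fails inside the wrapped call and the error is recorded, while `try_invoke(OUTSIDE)` fails before the wrapper is ever created, so nothing is recorded.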

I recently attempted this:

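import json

def throw():
    # Raises ZeroDivisionError on purpose
    0/0

def hello(event, context):
    # Call the helper from inside the handler so the error is raised during the invocation
    throw()
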
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body),
        "headers": {"content-type": "application/json"}
    }

    return response

And I see logs, metrics, and traces:
image
and here's the trace with the error:
image

As per your note, I think most users will likely create methods outside of their handler functions and call them from the handler in order to memoize a connection or cache data. These calls would be traced and captured by Datadog in the event of a failure.
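The pattern described above could look roughly like the following sketch. `get_connection` and the dictionary it returns are hypothetical placeholders for real setup work; the point is only that the setup is defined outside the handler but first executed from inside it.

```python
import functools
import json


@functools.lru_cache(maxsize=None)
def get_connection():
    """Create the (hypothetical) connection once per execution environment."""
    # Placeholder for real setup work such as opening a database connection.
    return {"connected": True}


def lambda_handler(event, context):
    # The first call performs the setup; warm invocations reuse the cached object.
    # If the setup raises, it raises here, inside the wrapped handler.
    conn = get_connection()
    return {
        "statusCode": 200,
        "body": json.dumps({"connected": conn["connected"]}),
    }
```

Note that `functools.lru_cache` does not cache exceptions, so a failing setup would be retried on the next invocation rather than remembered.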

However, when I removed the throw method and instead just divided by zero, the function crashed entirely with "An unknown application error occurred":
image
This causes our runtime to crash, which is why it's not reported in Datadog.

This looks like it's a bug which could be on our end or something AWS can fix. I'll update you with more information soon.

Thanks again!

@astuyve
Contributor

astuyve commented May 12, 2022

Closing as there has been no reply for over 30 days.

@astuyve astuyve closed this as completed May 12, 2022
@nalepae
Author

nalepae commented May 12, 2022

Closing as there has been no reply for over 30 days.

Yes, but the issue is still here!

@astuyve
Contributor

astuyve commented May 12, 2022

Hi! Thanks for the reply, I'm sorry about that. I'm returning from a few weeks of vacation and mis-remembered my own reply. I think there are a few options here, I'll explore how we can solve this either in the library itself or in the extension.

Thanks!

@astuyve astuyve reopened this May 12, 2022
@duncanista duncanista added the bug Something isn't working label Jan 9, 2024
@MatejBalantic

I'd like to +1 this issue. In our case, we run database migrations outside of the handler because we want them to happen only on cold starts (first-time lambda starts) rather than in all subsequent warm executions. As we utilize provisioned concurrency, we've got quite a number of lambdas running all the time and gain a lot of benefits from this setup.

However, if our database migration crashes (it does; that's why I am here :)), the errors don't show up in Datadog.
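One possible workaround, sketched here under assumptions and not taken from this thread or from the Datadog library, is to capture a cold-start failure at module scope and re-raise it from inside the handler, where a handler-wrapping tracer has a chance to record it. `run_migrations` and `INIT_ERROR` are hypothetical names, and the migration fails on purpose for the demonstration.

```python
import json


def run_migrations():
    """Hypothetical cold-start work; fails on purpose for the demonstration."""
    raise RuntimeError("migration failed")


# Run once at import time (cold start), but keep the failure instead of crashing.
try:
    run_migrations()
    INIT_ERROR = None
except Exception as exc:
    INIT_ERROR = exc


def lambda_handler(event, context):
    # Surface the cold-start failure from inside the handler.
    if INIT_ERROR is not None:
        raise RuntimeError("initialization failed") from INIT_ERROR
    return {"statusCode": 200, "body": json.dumps("Hello from Lambda!")}
```

The trade-off is that the module now imports successfully, so every invocation served by that execution environment fails until the environment is replaced. Whether that is acceptable depends on the workload.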

@nalepae
Author

nalepae commented Feb 14, 2024

Yes, my use case was the same: a lot of work is done outside the handler because I want it to happen only on cold starts.
And if something crashes when this "out of handler code" is executed, then Datadog is blind to the event.

@astuyve
Contributor

astuyve commented Feb 14, 2024

Hi folks, this should still be flagged in log-based error tracking. Is that not showing up?

Is the ask here for this to create an APM span upon failure? Where else would you expect to see init failures flagged?

Thank you!

@MatejBalantic

MatejBalantic commented Feb 14, 2024

Exactly, it should be shown in APM as a trace/span, as with any other error. This is where we always start our investigation. It is also what drives our error tracking and monitoring, such as alerts for an exceeded error ratio. Right now it flies under the radar.

@duncanista
Contributor

Closing due to #475, and latest release of this package including it.
