Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure Operations Sent To Application Insights #33741

Closed
mike-england opened this issue Jan 8, 2024 · 13 comments
Closed

Failure Operations Sent To Application Insights #33741

mike-england opened this issue Jan 8, 2024 · 13 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - ApplicationInsights needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@mike-england
Copy link

  • Package Name: azure-monitor-opentelemetry
  • Package Version: 1.1.1
  • Operating System: MacOS, Debian (bookworm)
  • Python Version: 3.12.1

Describe the bug
I've been able to instrument my flask application and after some struggling to get the application map to show up correctly, I think I've got everything reporting correctly and to my liking. That said, I do get a "Failed Operation" showing up in application insights. it looks like one every 3-5 seconds. The really bad part is I'm not able to look at any additional information as they don't have a response code.

I thought it might be related to this issue: #33623 as I was seeing these status messages. I silenced those with the following but, of course, any data they're actually sending would still be showing up an application insights

azure_api_logging = logging.getLogger('azure') azure_api_logging.setLevel(logging.ERROR)

I've attached a screen shot of the strange operations. Has anyone seen something like this before?

To Reproduce
I don't have a clean, simple example. I'm currently running flask 3.0 but also saw this behaviour with flask 2.1.2.

Expected behavior
No operation errors reported

Screenshots
Screenshot 2024-01-08 at 11 04 13 AM

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - ApplicationInsights needs-team-triage Workflow: This issue needs the team to triage. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 8, 2024
@xiangyan99 xiangyan99 added Service Attention Workflow: This issue is responsible by Azure service team. and removed needs-team-triage Workflow: This issue needs the team to triage. labels Jan 8, 2024
@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Jan 8, 2024
Copy link

github-actions bot commented Jan 8, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azmonapplicationinsights.

@lzchen
Copy link
Member

lzchen commented Jan 9, 2024

@mike-england

Could you upload a code snippet of how you are instrumenting with the SDK as well as any logging/flask calls you are making.

@mike-england
Copy link
Author

Yes, I think I've been able to narrow this down to the minimum to reproduce the problem:

app.py

import logging

from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from waitress import serve

from azure_telemetry import setup_logging

FORMAT = '%(asctime)s - {%(module)s:%(lineno)d} %(levelname)s - %(message)s'
myHandler = logging.StreamHandler()

logging.basicConfig(format=FORMAT, level=logging.INFO, handlers=[myHandler], datefmt="%Y-%m-%d %H:%M:%S %Z")
logger = logging.getLogger(__name__)

logging.getLogger('azure').setLevel(logging.ERROR)
app = Flask(__name__)

@app.route('/health')
def health():
    return {"status":"all good"}

if __name__ == "__main__":
    instrumentor = FlaskInstrumentor()
    instrumentor.instrument_app(app)
    
    setup_logging()

    serve(app, port=8080, threads=4)

azure_telemetry.py

from os import environ

from azure.monitor.opentelemetry import configure_azure_monitor

azure_connection_string = environ.get('AZURE_CONNECTION', 'InstrumentationKey=cd7428fa-655b-4ace-bdd8-f4e146f74fae;IngestionEndpoint=https://westus2-2.in.applicationinsights.azure.com/')
azure_cloud_role = environ.get('AZURE_CLOUD_ROLE', 'Azure-Test')
# this works, but I'd rather set programatically
environ['OTEL_SERVICE_NAME'] = azure_cloud_role

def setup_logging():
    configure_azure_monitor(connection_string=azure_connection_string)

requirements.txt

Flask==3.0.0
waitress==2.1.2
azure-monitor-opentelemetry==1.1.1

Dockerfile

# For more information, please refer to https://aka.ms/vscode-docker-python
FROM python:3.12.1-slim-bookworm

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE 1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED 1

RUN apt-get update
RUN apt-get install -y libsasl2-dev python-dev-is-python3 libldap2-dev libssl-dev gcc librdkafka-dev

# Install pip requirements
ADD requirements.txt .
RUN python -m pip install -r requirements.txt

WORKDIR /app
ADD . /app

# Switching to a non-root user, please refer to https://aka.ms/vscode-docker-python-user-rights
RUN useradd appuser && chown -R appuser /app
USER appuser

# see https://github.com/Azure/azure-sdk-for-python/issues/33295
ENV OTEL_EXPERIMENTAL_RESOURCE_DETECTORS "azure_app_service"
ENV OTEL_PYTHON_EXCLUDED_URLS="health,swagger.*"

# During debugging, this entry point will be overridden. For more information, please refer to https://aka.ms/vscode-docker-python-debug
CMD ["python", "app.py"]

I copied the docker file from another project so there are a couple extra packages in there but I don't think they have anything to do with this.

build / run: docker build -t azuretelemetry . && docker run --rm -p 8080:8080 azuretelemetry

And then hit the /health endpoint http://localhost:8080/health a bunch of times

When we run this under kubernetes, the /health endpoint gets hit a lot which is what I think I'm seeing in the operations error page.

@lzchen
Copy link
Member

lzchen commented Jan 17, 2024

@mike-england

configure_azure_monitor() already calls FlaskInstrumentor().instrument() underneath the hood. Could you remove the calls

instrumentor = FlaskInstrumentor()
    instrumentor.instrument_app(app)
```
and just call `setup_logging()` from your app and see if you still are getting this issue?

@mike-england
Copy link
Author

Technically the problem goes away, but so does all the data. I don't get any failed requests but I'm also missing server response times, and nothing shows up in the application map. I do continue to get logs under traces.

If I put that line in, all of the data is there plus those errors. Did you get different results from the example code? Maybe I've got something setup incorrectly in my azure application insights.

@lzchen
Copy link
Member

lzchen commented Jan 18, 2024

@mike-england

We have a basic example for flask here, could you try this and then build your application on top of it to see if you get telemetry?

@mike-england
Copy link
Author

Do you guys happen to have a 'complete' example? I can't run that directly which has been a source of confusion.

@lzchen
Copy link
Member

lzchen commented Jan 19, 2024

May I ask why you aren't able to run it directly? Are you running into errors? The idea is to have a basic setup using flask and azure monitor distro and then build your solution on top of it, checking that telemetry is being sent along the way.

@mike-england
Copy link
Author

mike-england commented Jan 20, 2024

It'd be nice if there were some notes in the code. There are errors. If I just run it I get this

raise ValueError("Instrumentation key cannot be none or empty.")

And of course I need to set APPLICATIONINSIGHTS_CONNECTION_STRING.

And once that's fixed it hangs with an error Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x104727580>, ignoring

I can fix that by setting OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service" and then the demo runs.

I will also say that I think I've been able to fix my problem, at least with the test code I have above. It's all about ordering, so it'd be nice if the team was able to do something about that.

Yes, I need to run configure_azure_monitor() before flask is initialized, but after my logging has been established.

That results in app.py looking like this:

import logging
from azure.monitor.opentelemetry import configure_azure_monitor

FORMAT = '%(asctime)s - {%(module)s:%(lineno)d} %(levelname)s - %(message)s'
myHandler = logging.StreamHandler()

logging.basicConfig(format=FORMAT, level=logging.INFO, handlers=[myHandler], datefmt="%Y-%m-%d %H:%M:%S %Z")
logger = logging.getLogger(__name__)

logging.getLogger('azure').setLevel(logging.ERROR)

configure_azure_monitor()

from flask import Flask, redirect,url_for
from waitress import serve
from blueprint import blueprint as blueprintv1

app = Flask(__name__)

@app.route('/health')
def health():
    return {"status":"all good"}

@app.route('/')
def hello():
    return redirect(url_for('.apiv1.doc'))

if __name__ == "__main__":
    app.register_blueprint(blueprintv1, url_prefix="/api/v1.0")
    serve(app, port=8080, threads=4)

Not as intuitive as I'd like but so far, seems to be working.

@mike-england
Copy link
Author

And with some further testing, just to be clear to anyone running into this thread. You need to import flask after you've initialized azure monitor, not just initializing flask. You also need to setup any local logging you'd like before or you will get azure logs but you won't get anything local.

This means I can technically move the from waitress and from blueprint statements to the top of the file but the rest need to stay. Again, in my opinion, not ideal. I'd rather need a separate flask initialization and have the flow of my code be as I like but at least this is working now without any phantom failed requests

@mike-england
Copy link
Author

mike-england commented Jan 21, 2024

Sorry, one more thing. I think the requirements are also tripping me up. Things were working find on my laptop but when I put it into a docker container the application map and server requests stopped working. At first I thought this was flask 3.0, however I think it's actually Werkzeug. If I use this as my requirements.txt file:

Flask==2.3.3
azure-monitor-opentelemetry==1.1.1
Werkzeug==2.3.8

It works fine, but if I use this:

Flask==2.3.3
azure-monitor-opentelemetry==1.1.1
Werkzeug==3.0.1

It no longer works. I'm trying to run flask 3.0+ which requires Werkzeug 3.0+.

@lzchen
Copy link
Member

lzchen commented Jan 22, 2024

@mike-england

Alot to unpack here. I'll attempt to address your points individually

raise ValueError("Instrumentation key cannot be none or empty.")

We have numerous amount of documentation related to having to set your connection string in our official docs and github. The concept of connection string should be understood as an Applicaiton Insights concept before using the SDK.

Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x104727580>, ignoring

This is a recent bug that was introduced that SHOULD be mitigated with the latest release. Please try to upgrade your SDK to the latest version to see if you are still getting errors (without setting the env var). If you are still getting the hanging, you can set the env var for now to unblock you but we do want to try to debug that use case without it. Should not be indicative of regular usage.

You also need to setup any local logging you'd like before or you will get azure logs but you won't get anything local.

This shouldn't be the case. If you are getting this behavior then this is not the intention of the SDK to change your configured behavior outside of monitoring. Try running this example to see if you still witness this related to logging.

You need to import flask after you've initialized azure monitor, not just initializing flask.

I believe you can do import flask before running configure_azure_monitor() but not from flask Import Flask. Our example just needs updating most likely.

At first I thought this was flask 3.0, however I think it's actually Werkzeug.

Flask 3.0+ is currently not supported in the OpenTelemetry Flask instrumentation

@mike-england
Copy link
Author

Thanks for your patience. I think my first two comments were really about things I expected. I have read as much documentation as I can find but found things a bit confusing. Having a couple lines of comment in that demo code could go a long way to letting new people get up and running, but that's just my 2 cents.

For initialization order, I'll play more with this, but for sure I have something that works. So, while it has been a bit frustrating, I'm through that.

As for flask 3, I was pretty sure I read it somewhere, but I guess not. I have seen this issue, so hopefully something will be in a release soon and that would solve my problem:

open-telemetry/opentelemetry-python-contrib#2013

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - ApplicationInsights needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.
Projects
None yet
Development

No branches or pull requests

3 participants