
Logging Noise Seen from Resource Processor #2135

Closed
martinpeck opened this issue Jun 27, 2022 · 4 comments · Fixed by #3668
@martinpeck
Member

When running an AppInsights query such as this...

traces
| where cloud_RoleName == "runner.py"
| where message !startswith "Retrying due to transient client"
| where message !startswith "No sessions for this process"
| where not (message startswith "Unable to find image" and strlen(message)<170)
| where severityLevel >1
| order by timestamp desc

... a number of messages that are simply [stderr] are reported...

[screenshot: App Insights trace results where the message column shows only "[stderr]"]

Specifically, this query will return all such logs:

traces
| where cloud_RoleName == "runner.py"
| where message == "[stderr]"

This appears to be related to code in runner.py and logging.py within the resource processor.

We need to avoid this noise and, at the same time, make sure that suppressing it doesn't mean we're dropping other, more important logs.
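One possible mitigation (a minimal sketch only, not necessarily how the linked fix works; the filter class and logger name are hypothetical) would be a logging.Filter that drops records whose entire message is just a stream prefix:

    import logging

    class DropBareStreamPrefix(logging.Filter):
        """Drop records whose whole message is just a stream prefix such as "[stderr]"."""

        def filter(self, record: logging.LogRecord) -> bool:
            # Keep the record only if it carries more than the bare prefix
            return record.getMessage().strip() not in ("[stderr]", "[stdout]")

    # Attach to the logger (or handler) that shell output is logged through
    logging.getLogger("runner").addFilter(DropBareStreamPrefix())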

@martinpeck martinpeck added the bug Something isn't working label Jun 27, 2022
@martinpeck
Member Author

Is shell_output_logger ever supposed to log to AppInsights?
The name suggests it shouldn't.

@tamirkamara
Collaborator

tamirkamara commented Jun 27, 2022

When addressing this, we just need to also account for local development and for viewing the output directly in the console (e.g. docker logs) on the VM.
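A minimal sketch of what that could look like, assuming handlers are wired up via the standard logging module (the handler split and names are illustrative, not the actual code in logging.py):

    import logging
    import sys

    def initialize_logging(verbose: bool = False) -> logging.Logger:
        logger = logging.getLogger("runner")
        logger.setLevel(logging.DEBUG if verbose else logging.INFO)

        # Always write to the console so local development and `docker logs` on the VM stay useful
        console = logging.StreamHandler(sys.stdout)
        logger.addHandler(console)

        # An App Insights handler could be stricter so shell noise never reaches traces
        # (exporter wiring omitted; only the level split is the point)
        # app_insights_handler.setLevel(logging.WARNING)
        # logger.addHandler(app_insights_handler)

        return logger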

@stuartleeks
Contributor

I'm seeing a lot of noise from the Retrying due to transient client... error

I just ran this query:

traces
| where message startswith "Retrying due to transient client side error HTTPsConnectionPool(host='westus-0.in"
| summarize count() by bin(timestamp, 1s)
| render timechart 

And get the following output, showing a flurry of instances roughly every 15 minutes:

[screenshot: timechart showing spikes of these messages roughly every 15 minutes]

It's not a great experience when you're digging into the logs to troubleshoot an issue!
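If those retry messages are emitted through the standard logging module, one stopgap (a sketch only; it assumes the message prefix stays stable) is a handler-level filter so they never reach App Insights:

    import logging

    class DropTransientRetryNoise(logging.Filter):
        """Drop the high-volume transient retry messages before they reach a handler."""

        def filter(self, record: logging.LogRecord) -> bool:
            return not record.getMessage().startswith("Retrying due to transient client")

    # Handler-level filters apply to every record routed through the handler,
    # including records propagated from library loggers
    for handler in logging.getLogger().handlers:
        handler.addFilter(DropTransientRetryNoise())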

@joalmeid joalmeid added this to the Release 0.4 milestone Jul 6, 2022
@jjgriff93 jjgriff93 moved this to Up Next in Azure TRE - Engineering Nov 9, 2022
@martinpeck martinpeck removed their assignment Nov 28, 2022
@marrobi marrobi removed this from the Release 0.5 milestone Jun 27, 2023
@marrobi
Member

marrobi commented Aug 7, 2023

At present:

    logger.log(logging_level, prefix_item)
    logger.log(logging_level, console_output)

This means the [stderr] prefix appears as its own entry.

Will look at changing to:

    logger.log(logging_level, f"{prefix_item} {console_output}")

Also, the default log level is INFO, and some packages, such as Service Bus, output too much at that level. I'm going to look at changing the default to `WARN` when not debugging. Ideally this would be configurable.
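A rough sketch of both changes (the function signature and environment variable name are illustrative, not necessarily what the fix in #3668 does):

    import logging
    import os

    def shell_output_logger(console_output: str, prefix_item: str, logger: logging.Logger, logging_level: int) -> None:
        """Log shell output as a single entry so the prefix never appears on its own."""
        if not console_output.strip():
            # Nothing useful to report, so skip rather than emit a bare "[stderr]" line
            return
        logger.log(logging_level, f"{prefix_item} {console_output}")

    # Default to WARN unless debugging is enabled; ideally this comes from configuration
    debug = os.environ.get("DEBUG", "false").lower() == "true"
    logging.getLogger("runner").setLevel(logging.DEBUG if debug else logging.WARNING)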

@marrobi marrobi self-assigned this Aug 7, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in AzureTRE - Crew Rock Sep 4, 2023