Upgrading New Relic Python agent past v8.10 on API causes crash loop in K8s #458

Closed
2 tasks done
jimleroyer opened this issue Nov 20, 2024 · 5 comments
Labels: Bug | Bogue (bug related task), Reliability (task related to reliability)

Comments

@jimleroyer
Member

jimleroyer commented Nov 20, 2024

Describe the bug

Upgrading the New Relic Python agent past v8.10.0 causes the Kubernetes pods to enter a crash loop.

Bug Severity

(SEV-1 Critical, SEV-2 Major, SEV-3 Minor, SEV-4 Low)

SEV-2: slower bootup, potentially leading to an infinite deployment crash loop

To Reproduce

Upgrade New Relic to version 8.10.1 or later. Observe deployment failures leading to a continual crash loop.
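One way to observe the crash loop without watching the pods by hand is to inspect the pod list as JSON and look for containers whose waiting reason is `CrashLoopBackOff`. The following is a minimal sketch, not the team's actual tooling: the function name `crash_looping_pods` and the sample pod names are hypothetical, and the input is assumed to have the shape produced by `kubectl get pods -o json`.

```python
def crash_looping_pods(pod_list: dict) -> list:
    """Return the names of pods with at least one container in CrashLoopBackOff.

    `pod_list` is assumed to be the parsed output of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            # `waiting` is absent when the container is running or terminated.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                names.append(pod["metadata"]["name"])
                break  # one crash-looping container is enough to flag the pod
    return names


# Hypothetical sample data, trimmed to the fields the function reads.
sample = {
    "items": [
        {
            "metadata": {"name": "api-healthy"},
            "status": {"containerStatuses": [{"state": {"running": {}}}]},
        },
        {
            "metadata": {"name": "api-crashing"},
            "status": {
                "containerStatuses": [
                    {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}
                ]
            },
        },
    ]
}

print(crash_looping_pods(sample))
```

In practice the dictionary would come from `json.load` on the piped `kubectl` output rather than from inline sample data.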

Expected behavior

New Relic can be upgraded to a more recent version than 8.10.1 without issues.

Impact

We are stuck with New Relic 8.10 for the time being, which might prevent us from upgrading Python and from monitoring our systems with the latest New Relic features. The Python agent version might also become outdated and cause trouble. If we upgrade without fixing the issue, our k8s pods deploy more slowly, which could make them unstable.

If applicable

Impact on Notify users:

Impact on Recipients:

Impact on Notify team:

Screenshots

If applicable, add screenshots to help explain your problem.

QA

  • Verify that New Relic has been upgraded to the latest version (or at least to a version after 8.10).
  • Verify that the API pods deploy with no issues with the upgraded New Relic, avoiding the potential deployment crash loop.
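The first QA check can also be done from inside a running pod with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the baseline `8.10.1` is taken from the reproduction steps, and the parser only handles plain dotted numeric versions such as `10.3.0` (not pre-release tags).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '10.3.0' into (10, 3, 0). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def is_newer_than(installed: str, baseline: str) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) > parse_version(baseline)


def installed_newrelic_version() -> Optional[str]:
    """Return the installed newrelic package version, or None if it is absent."""
    try:
        return version("newrelic")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_newrelic_version()
    if found is None:
        print("newrelic is not installed")
    else:
        print(found, "newer than 8.10.1:", is_newer_than(found, "8.10.1"))
```

The comparison uses integer tuples because comparing the raw strings would rank `10.3.0` below `8.10.1`.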

Additional context

See issue #313, which led the investigation to the root cause of the New Relic Python agent performance issues.

jimleroyer added the "Bug | Bogue" and "Reliability" labels on Nov 20, 2024
jimleroyer self-assigned this on Nov 20, 2024
jimleroyer changed the title from "Upgrade New Relic to latest version for API with no deployment crash loop" to "Upgrading New Relic Python agent past v8.10 on API causes crash loop in K8s" on Nov 20, 2024
@jimleroyer
Member Author

jimleroyer commented Nov 20, 2024

PR to add a CloudWatch Logs Insights query to track Gunicorn total running time in API:
cds-snc/notification-terraform#1665

PR for the API to upgrade to the latest version of New Relic:
cds-snc/notification-api#2367

@jimleroyer
Member Author

The admin was upgraded to the latest New Relic Python agent, v10.3.0. Ready to be QA'ed and deployed to production.

jimleroyer assigned sastels and unassigned jimleroyer on Nov 25, 2024
@jimleroyer
Member Author

Steve to QA this card.

@sastels

sastels commented Nov 25, 2024

New API pods deployed on prod:

# pip list | grep relic
newrelic                    10.3.0
