fix(azure): adjust SKU and storage for staging #1601
Conversation
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
.azure/infrastructure/staging.bicepparam (2)
42-42: Enabling index tuning is a good performance optimization
Enabling `enableIndexTuning` is a positive change that will help:
- Automatically optimize query performance
- Reduce manual database maintenance
- Complement the upgraded hardware resources
This is particularly beneficial when supporting multiple test environments.
Consider implementing monitoring to track the effectiveness of automatic index tuning and query performance insights. This will help validate the infrastructure changes and identify any further optimizations needed.
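As a loose illustration of that monitoring suggestion, the sketch below defines a CPU alert on the flexible server in Bicep. The server name, threshold, and API versions are assumptions for illustration only and are not taken from this repository.

```bicep
// Hypothetical sketch: a CPU alert to help validate the SKU upgrade and
// index tuning. Server name, threshold, and API versions are assumptions.
param postgresServerName string = 'dp-be-staging-postgres' // assumed name
param alertName string = '${postgresServerName}-cpu-high'

// Reference the existing PostgreSQL flexible server.
resource postgres 'Microsoft.DBforPostgreSQL/flexibleServers@2022-12-01' existing = {
  name: postgresServerName
}

// Fire when average CPU stays above 80% over a 15-minute window.
resource cpuAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: alertName
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [postgres.id]
    evaluationFrequency: 'PT5M'
    windowSize: 'PT15M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          criterionType: 'StaticThresholdCriterion'
          name: 'HighCpu'
          metricName: 'cpu_percent'
          operator: 'GreaterThan'
          threshold: 80
          timeAggregation: 'Average'
        }
      ]
    }
    // Attach an action group here to get notified, if desired.
    actions: []
  }
}
```

A similar alert on the storage_percent metric could also help keep an eye on the new 256GB allocation.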
34-42: Consider implementing auto-scaling policies
While the upgrade addresses immediate performance concerns, consider implementing auto-scaling policies for:
- Storage auto-scaling (already enabled with `autoGrow: 'Enabled'`)
- Compute resource scaling based on usage patterns
- Cost optimization during low-usage periods
This would provide better resource utilization across multiple test environments.
Would you like assistance in implementing auto-scaling policies?
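As a rough sketch of what is declarative here versus what still needs manual or external automation: storage auto-grow is a property the platform acts on by itself, while compute changes remain edits to the SKU values, since flexible servers have no built-in compute autoscale. The parameter names and the `using` path below are hypothetical; only the individual values quoted in this review are real.

```bicep
// Minimal .bicepparam sketch with hypothetical parameter names.
using './main.bicep' // assumed template path

param postgresSku = {
  name: 'Standard_D4ads_v5'   // compute scaling means editing this block manually
  tier: 'GeneralPurpose'
}

param postgresStorage = {
  storageSizeGB: 256
  autoGrow: 'Enabled'         // storage grows automatically as it fills up
}

param enableIndexTuning = true
```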
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.azure/infrastructure/staging.bicepparam (1 hunks)
🔇 Additional comments (2)
.azure/infrastructure/staging.bicepparam (2)
34-35: SKU upgrade aligns with objectives but verify cost implications
The change from Standard_B1ms (Burstable) to Standard_D4ads_v5 (GeneralPurpose) addresses the high CPU usage issue and provides:
- 4 vCPUs and 16 GiB of memory in the GeneralPurpose tier
- Consistent performance without throttling
- Better suited for multiple test environments
However, this is a significant upgrade that will increase costs.
Let's verify if this is the most cost-effective SKU for the requirements:
✅ Verification successful
SKU choice is appropriate based on environment alignment
The SKU upgrade from Burstable to Standard_D4ads_v5 is justified based on the following findings:
- Test environment uses Burstable B2s tier (lower resources)
- Production uses Standard_D8ads_v5 GeneralPurpose tier (higher resources)
- Staging's Standard_D4ads_v5 serves as a middle ground between test and production
- The upgrade aligns with the environment hierarchy and supports multiple test environments
The chosen SKU provides a balanced approach:
- More powerful than test (B2s)
- Less expensive than production (D8ads_v5)
- Consistent performance needed for staging environment
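To make that hierarchy concrete, here is a purely illustrative Bicep snippet laying out the SKUs named in this review side by side; the variable name and structure are not from the repository.

```bicep
// Illustrative only: SKUs per environment as named in this review.
var postgresSkuByEnvironment = {
  test: {
    name: 'Standard_B2s'        // Burstable, lowest resources
    tier: 'Burstable'
  }
  staging: {
    name: 'Standard_D4ads_v5'   // this PR: upgraded from Standard_B1ms (Burstable)
    tier: 'GeneralPurpose'
  }
  prod: {
    name: 'Standard_D8ads_v5'   // highest resources
    tier: 'GeneralPurpose'
  }
}
```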
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for any monitoring or metrics data that might help justify the SKU choice
# Check if there are any monitoring configurations or metrics collection
rg "metrics|monitoring|performance" .azure/infrastructure/
Length of output: 60
Script:
#!/bin/bash
# Let's check the current and previous SKU configurations to understand the scale of change
rg -A 3 "name:|tier:" .azure/infrastructure/
# Also check for any comments or documentation about performance requirements
rg -i "cpu|memory|performance|load" .azure/infrastructure/
Length of output: 12144
38-38: Verify if 256GB storage increase is justified
The storage increase from 32GB to 256GB is substantial. While this provides headroom for growth, it's important to ensure this aligns with actual usage patterns.
Let's check for any data retention policies or database size configurations:
✅ Verification successful
Let me gather more context about storage usage patterns and configurations.
Storage increase to 256GB appears justified
The storage increase is aligned with production requirements based on the following evidence:
- Production (prod.bicepparam) and YT01 (yt01.bicepparam) environments are already configured with 256GB storage
- The change history in CHANGELOG.md shows deliberate SKU and storage adjustments for production environments
- There's a documented risk of storage pressure from WAL files in the CDC documentation, indicating the need for adequate storage capacity
- The environment uses a higher-tier SKU (Standard_D4ads_v5) suitable for production workloads
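For reference, a small illustrative snippet of how the storage sizes cited above line up across environments after this change; the variable name is hypothetical and only the sizes themselves come from the review.

```bicep
// Illustrative only: storage sizes per environment as cited in this review.
var postgresStorageGBByEnvironment = {
  staging: 256   // this PR: previously 32, with autoGrow 'Enabled'
  prod: 256
  yt01: 256
}
```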
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Look for retention policies or database size configurations
# Search for retention or storage related configurations
rg "retention|storage|backup" .azure/infrastructure/
Length of output: 952
Script:
#!/bin/bash
# Search for any documentation, issues, or PRs related to storage requirements
rg -i "storage|disk|space" --type md
# Look for database configurations that might indicate storage needs
rg -i "database|postgres" .azure/infrastructure/ -A 5
Length of output: 7131
Quality Gate passed
🤖 I have created a release *beep* *boop*

---

## [1.41.3](v1.41.2...v1.41.3) (2024-12-13)

### Bug Fixes

* **azure:** adjust SKU and storage for staging ([#1601](#1601)) ([3fb9f95](3fb9f95))
* Collapse subject resource mappings before building sql query ([#1579](#1579)) ([b39c376](b39c376))
* **webapi:** Explicit null on non-nullable lists no longer causes 500 INTERNAL SERVER ERROR ([#1602](#1602)) ([2e8b3e6](2e8b3e6))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Description
Related to https://digdir.slack.com/archives/C079D6PAGDS/p1734018202302459
We are experiencing heavy CPU usage on the PostgreSQL server in staging. After some investigation, it was hard to pin down the cause. Being on a burstable tier may be too fragile and may contribute to the high CPU usage, so we are upgrading. Staging will be used by many test environments moving forward, so making it more stable with GeneralPurpose machines makes sense.
Related Issue(s)
Verification
Documentation
docs-directory, Altinnpedia, or a separate linked PR in altinn-studio-docs, if applicable