fix(azure): adjust SKU and storage for staging #1601

arealmaas · 2024-12-13T09:14:18Z

Description

Related to https://digdir.slack.com/archives/C079D6PAGDS/p1734018202302459

We are seeing heavy CPU usage on the PostgreSQL server in staging, and after some investigation the cause was still hard to pin down. The burstable tier may be too fragile and may itself contribute to the high CPU usage, so this PR upgrades the server. Staging will be used by many test environments going forward, so making it more stable with GeneralPurpose machines makes sense.

Related Issue(s)

#N/A

Verification

Your code builds clean without any errors or warnings
Manual testing done (required)
Relevant automated test added (if you find this hard, leave it and we'll help out)

Documentation

Documentation is updated (either in docs-directory, Altinnpedia or a separate linked PR in altinn-studio-docs, if applicable)

coderabbitai · 2024-12-13T09:14:26Z

Walkthrough

The pull request introduces several modifications to the .azure/infrastructure/staging.bicepparam file, specifically targeting the PostgreSQL database configuration. Key changes include updates to the SKU name, tier, storage size, and the enabling of index tuning. The overall structure of the parameter definitions remains unchanged, with the adjustments reflecting a transition to a more robust database setup.

Changes

| File | Change Summary |
|---|---|
| `.azure/infrastructure/staging.bicepparam` | Updated `postgresConfiguration`:<br>- SKU name: `Standard_B1ms` → `Standard_D4ads_v5`<br>- SKU tier: `Burstable` → `GeneralPurpose`<br>- Storage size: `32 GB` → `256 GB`<br>- Enable index tuning: `false` → `true`<br>- `enableQueryPerformanceInsight` remains `true` |
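
For illustration, the parameter change summarized above might look roughly like this in the `.bicepparam` file. This is a sketch, not the actual diff. The values come from the change summary, but the nesting and the key names `sku`, `storage`, and `storageSizeGB` are assumptions about the shape of the parameter object. `autoGrow`, `enableIndexTuning`, and `enableQueryPerformanceInsight` are the names used elsewhere in this thread.

```bicep
// Hypothetical excerpt of .azure/infrastructure/staging.bicepparam.
// Keys marked "assumed" are illustrative; only the values are taken from the PR summary.
param postgresConfiguration = {
  sku: {                                // assumed nesting
    name: 'Standard_D4ads_v5'           // was 'Standard_B1ms'
    tier: 'GeneralPurpose'              // was 'Burstable'
  }
  storage: {                            // assumed nesting
    storageSizeGB: 256                  // assumed key name; was 32
    autoGrow: 'Enabled'
  }
  enableIndexTuning: true               // was false
  enableQueryPerformanceInsight: true   // unchanged
}
```

Keeping SKU name and tier side by side makes it easier to spot a mismatch, since a `Standard_D*` name belongs with the `GeneralPurpose` tier and a `Standard_B*` name with `Burstable`.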

Possibly related PRs

feat(azure): enable query performance insights for postgres #1417: Introduces a new parameter enableQueryPerformanceInsight in PostgreSQL configuration, related to the main PR's changes.
feat(azure): enable index tuning for postgres in YT #1455: Adds the enableIndexTuning parameter to PostgreSQL configuration, aligning with the main PR's update to this setting.
feat(azure): Upgrade postgres SKU for prod/yt01 #1450: Updates the SKU name in the postgresConfiguration, relevant to the main PR's SKU name modification.
feat(azure): adjust SKU and storage for yt01 and prod #1508: Adjusts SKU and storage configurations for PostgreSQL, relevant to the main PR's focus on these changes.
fix(azure): ensure correct properties are used when adjusting SKU and storage for postgres #1514: Simplifies the storage configuration for PostgreSQL by removing the tier property, relating to adjustments in the main PR.

Suggested reviewers

oskogstad


coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

.azure/infrastructure/staging.bicepparam (2)

42-42: Enabling index tuning is a good performance optimization

Enabling enableIndexTuning is a positive change that will help:

- Automatically optimize query performance
- Reduce manual database maintenance
- Complement the upgraded hardware resources

This is particularly beneficial when supporting multiple test environments.

Consider implementing monitoring to track the effectiveness of automatic index tuning and query performance insights. This will help validate the infrastructure changes and identify any further optimizations needed.

34-42: Consider implementing auto-scaling policies

While the upgrade addresses immediate performance concerns, consider implementing auto-scaling policies for:

- Storage auto-scaling (already enabled with autoGrow: 'Enabled')
- Compute resource scaling based on usage patterns
- Cost optimization during low-usage periods

This would provide better resource utilization across multiple test environments.

Would you like assistance in implementing auto-scaling policies?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b39c376 and 8388b71.

📒 Files selected for processing (1)

.azure/infrastructure/staging.bicepparam (1 hunks)

🔇 Additional comments (2)

.azure/infrastructure/staging.bicepparam (2)

34-35: SKU upgrade aligns with objectives but verify cost implications

The change from Standard_B1ms (Burstable) to Standard_D4ads_v5 (GeneralPurpose) addresses the high CPU usage issue and provides:

- 4 vCPUs and 16 GiB of memory in a general-purpose configuration
- Consistent performance without CPU-credit throttling
- A better fit for multiple test environments

However, this is a significant upgrade that will increase costs.

Let's verify if this is the most cost-effective SKU for the requirements:

✅ Verification successful

SKU choice is appropriate based on environment alignment

The SKU upgrade from Burstable to Standard_D4ads_v5 is justified based on the following findings:

- Test environment uses Burstable B2s tier (lower resources)
- Production uses Standard_D8ads_v5 GeneralPurpose tier (higher resources)
- Staging's Standard_D4ads_v5 serves as a middle ground between test and production
- The upgrade aligns with the environment hierarchy and supports multiple test environments

The chosen SKU provides a balanced approach:

- More powerful than test (B2s)
- Less expensive than production (D8ads_v5)
- Consistent performance needed for staging environment

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for any monitoring or metrics data that might help justify the SKU choice

# Check if there are any monitoring configurations or metrics collection
rg "metrics|monitoring|performance" .azure/infrastructure/

Length of output: 60

Script:

#!/bin/bash
# Let's check the current and previous SKU configurations to understand the scale of change
rg -A 3 "name:|tier:" .azure/infrastructure/

# Also check for any comments or documentation about performance requirements
rg -i "cpu|memory|performance|load" .azure/infrastructure/

Length of output: 12144

38-38: Verify if 256GB storage increase is justified

The storage increase from 32GB to 256GB is substantial. While this provides headroom for growth, it's important to ensure this aligns with actual usage patterns.

Let's check for any data retention policies or database size configurations:

✅ Verification successful

Let me gather more context about storage usage patterns and configurations.

Storage increase to 256GB appears justified

The storage increase is aligned with production requirements based on the following evidence:

- Production (prod.bicepparam) and YT01 (yt01.bicepparam) environments are already configured with 256GB storage
- The change history in CHANGELOG.md shows deliberate SKU and storage adjustments for production environments
- There's a documented risk of storage pressure from WAL files in the CDC documentation, indicating the need for adequate storage capacity
- The environment uses a higher-tier SKU (Standard_D4ads_v5) suitable for production workloads

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for retention policies or database size configurations

# Search for retention or storage related configurations
rg "retention|storage|backup" .azure/infrastructure/

Length of output: 952

Script:

#!/bin/bash
# Search for any documentation, issues, or PRs related to storage requirements
rg -i "storage|disk|space" --type md

# Look for database configurations that might indicate storage needs
rg -i "database|postgres" .azure/infrastructure/ -A 5

Length of output: 7131

sonarqubecloud · 2024-12-13T09:16:16Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

🤖 I have created a release *beep* *boop*

---

## [1.41.3](v1.41.2...v1.41.3) (2024-12-13)

### Bug Fixes

* **azure:** adjust SKU and storage for staging ([#1601](#1601)) ([3fb9f95](3fb9f95))
* Collapse subject resource mappings before building sql query ([#1579](#1579)) ([b39c376](b39c376))
* **webapi:** Explicit null on non-nullable lists no longer causes 500 INTERNAL SERVER ERROR ([#1602](#1602)) ([2e8b3e6](2e8b3e6))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

fix(azure): adjust SKU and storage for staging

8388b71

arealmaas requested review from a team as code owners December 13, 2024 09:14

coderabbitai bot reviewed Dec 13, 2024

View reviewed changes

oskogstad approved these changes Dec 13, 2024

View reviewed changes

arealmaas merged commit 3fb9f95 into main Dec 13, 2024
19 checks passed

arealmaas deleted the fix/upgrade-postgres-sku-in-staging branch December 13, 2024 10:15

dialogporten-bot mentioned this pull request Dec 13, 2024

chore(main): release 1.41.3 #1600

Merged
