fix(azure): adjust SKU and storage for staging #1601

Merged
arealmaas merged 1 commit into main from fix/upgrade-postgres-sku-in-staging
Dec 13, 2024

Conversation

arealmaas
Collaborator

Description

Related to https://digdir.slack.com/archives/C079D6PAGDS/p1734018202302459

We are seeing heavy CPU usage on the PostgreSQL server in staging, and investigation has not pinned down the cause. The Burstable tier may be too fragile and may itself contribute to the high CPU usage, so this PR upgrades the server to a GeneralPurpose SKU. Staging will be used by many test environments going forward, so a more stable GeneralPurpose machine makes sense.

Related Issue(s)

  • #N/A

Verification

  • Your code builds clean without any errors or warnings
  • Manual testing done (required)
  • Relevant automated test added (if you find this hard, leave it and we'll help out)

Documentation

  • Documentation is updated (in the docs directory, Altinnpedia, or a separate linked PR in altinn-studio-docs, if applicable)

@arealmaas arealmaas requested review from a team as code owners December 13, 2024 09:14
Contributor

coderabbitai bot commented Dec 13, 2024

📝 Walkthrough

The pull request introduces several modifications to the .azure/infrastructure/staging.bicepparam file, specifically targeting the PostgreSQL database configuration. Key changes include updates to the SKU name, tier, storage size, and the enabling of index tuning. The overall structure of the parameter definitions remains unchanged, with the adjustments reflecting a transition to a more robust database setup.

Changes

File: .azure/infrastructure/staging.bicepparam
Change summary: updated postgresConfiguration:
  • SKU name: Standard_B1ms → Standard_D4ads_v5
  • SKU tier: Burstable → GeneralPurpose
  • Storage size: 32 GB → 256 GB
  • Enable index tuning: false → true
  • enableQueryPerformanceInsight remains true
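
For orientation, the summarized parameter changes might look roughly like the excerpt below. This is a sketch, not the actual diff: the values, `postgresConfiguration`, `enableIndexTuning`, `enableQueryPerformanceInsight`, and `autoGrow: 'Enabled'` come from this conversation, while the nesting under `sku` and `storage` and the property name `storageSizeGB` are assumptions that should be checked against the real file.

```bicep
// Excerpt only: the file's `using` declaration and its other parameters are omitted.
// ASSUMPTION: the sku/storage nesting and the name `storageSizeGB` are illustrative.
param postgresConfiguration = {
  sku: {
    name: 'Standard_D4ads_v5' // was 'Standard_B1ms'
    tier: 'GeneralPurpose' // was 'Burstable'
  }
  storage: {
    storageSizeGB: 256 // was 32
    autoGrow: 'Enabled' // unchanged, as noted in the review comments
  }
  enableIndexTuning: true // was false
  enableQueryPerformanceInsight: true // unchanged
}
```

Because only values change and no properties are added or removed, the templates that consume this parameter should not need any edits.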

Possibly related PRs

Suggested reviewers

  • oskogstad

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
.azure/infrastructure/staging.bicepparam (2)

42-42: Enabling index tuning is a good performance optimization

Enabling enableIndexTuning is a positive change that will help:

  • Automatically optimize query performance
  • Reduce manual database maintenance
  • Complement the upgraded hardware resources

This is particularly beneficial when supporting multiple test environments.

Consider implementing monitoring to track the effectiveness of automatic index tuning and query performance insights. This will help validate the infrastructure changes and identify any further optimizations needed.


34-42: Consider implementing auto-scaling policies

While the upgrade addresses immediate performance concerns, consider implementing auto-scaling policies for:

  • Storage auto-scaling (already enabled with autoGrow: 'Enabled')
  • Compute resource scaling based on usage patterns
  • Cost optimization during low-usage periods

This would provide better resource utilization across multiple test environments.

Would you like assistance in implementing auto-scaling policies?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b39c376 and 8388b71.

📒 Files selected for processing (1)
  • .azure/infrastructure/staging.bicepparam (1 hunks)
🔇 Additional comments (2)
.azure/infrastructure/staging.bicepparam (2)

34-35: SKU upgrade aligns with objectives but verify cost implications

The change from Standard_B1ms (Burstable) to Standard_D4ads_v5 (GeneralPurpose) addresses the high CPU usage issue and provides:

  • 4 vCPUs and 16 GiB of memory (a general-purpose size, not a memory-optimized one)
  • Consistent performance without throttling
  • Better suited for multiple test environments

However, this is a significant upgrade that will increase costs.

Let's verify if this is the most cost-effective SKU for the requirements:

✅ Verification successful

SKU choice is appropriate based on environment alignment

The SKU upgrade from Burstable to Standard_D4ads_v5 is justified based on the following findings:

  • Test environment uses Burstable B2s tier (lower resources)
  • Production uses Standard_D8ads_v5 GeneralPurpose tier (higher resources)
  • Staging's Standard_D4ads_v5 serves as a middle ground between test and production
  • The upgrade aligns with the environment hierarchy and supports multiple test environments

The chosen SKU provides a balanced approach:

  • More powerful than test (B2s)
  • Less expensive than production (D8ads_v5)
  • Consistent performance needed for staging environment
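
The environment hierarchy described above can be summarized in one place, as in the sketch below. The map is purely hypothetical: the repository keeps these values in separate per-environment .bicepparam files, the variable and output names are invented for illustration, and `Standard_B2s` is an assumed full SKU name for the "B2s" tier mentioned in the findings.

```bicep
// HYPOTHETICAL summary map for comparison only; not taken from the repository.
var postgresSkuByEnvironment = {
  test: {
    name: 'Standard_B2s' // ASSUMPTION: full SKU name for the "B2s" tier
    tier: 'Burstable'
  }
  staging: {
    name: 'Standard_D4ads_v5'
    tier: 'GeneralPurpose'
  }
  prod: {
    name: 'Standard_D8ads_v5'
    tier: 'GeneralPurpose'
  }
}

// Exposed as an output only so that the variable is referenced.
output stagingSku object = postgresSkuByEnvironment.staging
```

Seen side by side, staging now shares the GeneralPurpose tier with production while using a smaller size.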
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for any monitoring or metrics data that might help justify the SKU choice

# Check if there are any monitoring configurations or metrics collection
rg "metrics|monitoring|performance" .azure/infrastructure/

Length of output: 60


Script:

#!/bin/bash
# Let's check the current and previous SKU configurations to understand the scale of change
rg -A 3 "name:|tier:" .azure/infrastructure/

# Also check for any comments or documentation about performance requirements
rg -i "cpu|memory|performance|load" .azure/infrastructure/

Length of output: 12144


38-38: Verify if 256GB storage increase is justified

The storage increase from 32GB to 256GB is substantial. While this provides headroom for growth, it's important to ensure this aligns with actual usage patterns.

Let's check for any data retention policies or database size configurations:

✅ Verification successful

Let me gather more context about storage usage patterns and configurations.


Storage increase to 256GB appears justified

The storage increase is aligned with production requirements based on the following evidence:

  • Production (prod.bicepparam) and YT01 (yt01.bicepparam) environments are already configured with 256GB storage
  • The change history in CHANGELOG.md shows deliberate SKU and storage adjustments for production environments
  • There's a documented risk of storage pressure from WAL files in the CDC documentation, indicating the need for adequate storage capacity
  • The environment uses a higher-tier SKU (Standard_D4ads_v5) suitable for production workloads
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for retention policies or database size configurations

# Search for retention or storage related configurations
rg "retention|storage|backup" .azure/infrastructure/

Length of output: 952


Script:

#!/bin/bash
# Search for any documentation, issues, or PRs related to storage requirements
rg -i "storage|disk|space" --type md

# Look for database configurations that might indicate storage needs
rg -i "database|postgres" .azure/infrastructure/ -A 5

Length of output: 7131

@arealmaas arealmaas merged commit 3fb9f95 into main Dec 13, 2024
19 checks passed
@arealmaas arealmaas deleted the fix/upgrade-postgres-sku-in-staging branch December 13, 2024 10:15
arealmaas pushed a commit that referenced this pull request Dec 13, 2024
🤖 I have created a release *beep* *boop*
---

## [1.41.3](v1.41.2...v1.41.3) (2024-12-13)

### Bug Fixes

* **azure:** adjust SKU and storage for staging ([#1601](#1601)) ([3fb9f95](3fb9f95))
* Collapse subject resource mappings before building sql query ([#1579](#1579)) ([b39c376](b39c376))
* **webapi:** Explicit null on non-nullable lists no longer causes 500 INTERNAL SERVER ERROR ([#1602](#1602)) ([2e8b3e6](2e8b3e6))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).