-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add timeout for health checks #1388
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThe changes involve enhancements to the health check functionality for Redis and HTTP endpoints. The Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs (2)
20-21
: Consider externalizing the timeout valueThe 40-second timeout value should be moved to configuration to allow easier adjustment without code changes.
+ private const int HealthCheckTimeoutSeconds = 40; public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default) { using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken); - cts.CancelAfter(TimeSpan.FromSeconds(40)); + cts.CancelAfter(TimeSpan.FromSeconds(HealthCheckTimeoutSeconds));
29-32
: Enhance degraded state handlingTwo suggestions for improvement:
- Consider externalizing the 5-second degraded threshold to configuration
- Add logging when the connection is degraded for better monitoring capabilities
+ private const int DegradedThresholdSeconds = 5; + private readonly ILogger<RedisHealthCheck> _logger; + public RedisHealthCheck(IOptions<InfrastructureSettings> options, ILogger<RedisHealthCheck> logger) + { + _settings = options?.Value ?? throw new ArgumentNullException(nameof(options)); + _logger = logger ?? throw new ArgumentNullException(nameof(logger)); + } - if (sw.Elapsed > TimeSpan.FromSeconds(5)) + if (sw.Elapsed > TimeSpan.FromSeconds(DegradedThresholdSeconds)) { + _logger.LogWarning("Redis connection is experiencing slowness. Response time: {ResponseTime}s", sw.Elapsed.TotalSeconds); return HealthCheckResult.Degraded($"Redis connection is slow ({sw.Elapsed.TotalSeconds:N1}s)."); }src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs (1)
36-36
: Consider extracting timeout values into configuration or constantsThe timeout durations (5 seconds for slow responses and 40 seconds for request cancellation) are hardcoded. For better maintainability and flexibility, consider extracting these values into configurable settings or constants. This approach allows for easier adjustments in the future without modifying the codebase.
Also applies to: 42-42
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs
(1 hunks)src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs
(1 hunks)
🔇 Additional comments (2)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs (1)
20-39
: Implementation meets PR objectives with robust error handling
The changes successfully implement the required timeout functionality while also providing degraded state detection. The implementation is solid but could be enhanced with the suggested improvements for configuration and logging.
Let's verify if similar patterns are implemented consistently:
✅ Verification successful
Implementation follows consistent patterns across health checks
The Redis health check implementation aligns perfectly with the established patterns in the codebase, as evidenced by the EndpointsHealthCheck implementation. Both health checks consistently implement:
- 40-second timeout using CancellationTokenSource
- Performance degradation detection at 5-second threshold
- Proper error handling for timeouts
- Similar response patterns for health/degraded/unhealthy states
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for similar health check implementations
rg -l "IHealthCheck" | xargs rg "CancellationTokenSource|Stopwatch"
Length of output: 687
Script:
#!/bin/bash
# Let's check the full implementation of EndpointsHealthCheck to compare patterns
rg -A 20 "EndpointsHealthCheck.*IHealthCheck"
Length of output: 2391
Script:
#!/bin/bash
# Let's see the rest of the EndpointsHealthCheck implementation to compare timeout and degraded state handling
rg -A 30 "CreateClient\(\);" src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs
Length of output: 1590
src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs (1)
35-36
: Timeout handling is correctly implemented
The use of CancellationTokenSource
with a 40-second timeout effectively prevents the HTTP requests from hanging indefinitely. Proper disposal using using var cts
ensures resources are managed correctly. Catching OperationCanceledException
appropriately handles timeout scenarios.
Also applies to: 53-57
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs
Outdated
Show resolved
Hide resolved
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs
Outdated
Show resolved
Hide resolved
src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs (1)
31-40
: Improve resource management with using declaration.While the connection monitoring logic is sound, the
ConnectionMultiplexer
should use a using declaration to ensure proper disposal in all code paths.- using var redis = await ConnectionMultiplexer.ConnectAsync(options); + await using var redis = await ConnectionMultiplexer.ConnectAsync(options);src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs (2)
38-41
: Consider clarifying the response time measurement in logsThe stopwatch correctly measures the total request duration, but it includes both network latency and server processing time. Consider updating the log message to be more specific about what's being measured.
- _logger.LogWarning("Health check response was slow for endpoint: {Url}. Elapsed time: {Elapsed:N1}s", url, sw.Elapsed.TotalSeconds); + _logger.LogWarning("Health check total round-trip time was slow for endpoint: {Url}. Total time: {Elapsed:N1}s", url, sw.Elapsed.TotalSeconds);
53-56
: Consider distinguishing between timeout types in error handlingThe current implementation doesn't distinguish between cancellations due to the 40s timeout versus external cancellations. Consider checking the cancellation source.
- catch (OperationCanceledException) + catch (OperationCanceledException) when (cts.IsCancellationRequested && !cancellationToken.IsCancellationRequested) { _logger.LogWarning("Health check timed out for endpoint: {Url}", url); unhealthyEndpoints.Add($"{url} (Timeout after 40s)"); + } + catch (OperationCanceledException) + { + _logger.LogWarning("Health check was cancelled for endpoint: {Url}", url); + unhealthyEndpoints.Add($"{url} (Cancelled)"); }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs
(1 hunks)src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs
(1 hunks)
🔇 Additional comments (5)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs (3)
4-4
: LGTM! Proper dependency injection and null validation.
The logger integration follows best practices with proper null checks and immutable field declaration.
Also applies to: 10-10, 12-16
43-61
: LGTM! Comprehensive error handling with appropriate logging.
The error handling implementation:
- Catches specific Redis exceptions separately from general exceptions
- Uses appropriate logging levels (Warning for expected issues, Error for unexpected)
- Ensures the stopwatch is always stopped
- Provides detailed error messages
23-30
: 🛠️ Refactor suggestion
Review timeout configuration.
The current implementation has several concerns:
- The timeout of 15 seconds doesn't match the PR objective of 40 seconds for unhealthy state
- Using magic numbers (15000) reduces maintainability
- Using the same timeout for all operations (async, sync, connect) might not be optimal
Consider this refactoring:
- var timeout = 15000;
+ private const int UnhealthyTimeoutSeconds = 40;
+ private const int DegradedTimeoutSeconds = 5;
+ private const int RedisOperationTimeoutMs = UnhealthyTimeoutSeconds * 1000;
+
+ // In CheckHealthAsync:
+ var timeout = RedisOperationTimeoutMs;
src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs (2)
35-37
: LGTM: Timeout implementation is well-structured
The implementation correctly uses a linked cancellation token source with a 40-second timeout, ensuring proper cleanup with the using
statement.
47-51
: LGTM: Status and response time evaluation logic is correct
The implementation properly evaluates the status code first and then checks for slow responses, which is the correct order of precedence. The 5-second threshold for marking as degraded aligns with the PR objectives.
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs (1)
26-33
: Consider making timeout configurable via settings.The timeout value is currently hardcoded. Consider moving it to the configuration to allow for environment-specific adjustments.
+ private const int DefaultTimeoutMilliseconds = 15_000; - const int timeout = 15_000; + var timeout = _settings.Redis.Timeout ?? DefaultTimeoutMilliseconds;src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs (3)
14-15
: Consider making timeouts configurable via optionsWhile the hardcoded values align with requirements, consider moving these to the
AspNetUtilitiesSettings
to allow for environment-specific configuration.internal sealed class EndpointsHealthCheck : IHealthCheck { private readonly IHttpClientFactory _httpClientFactory; private readonly ILogger<EndpointsHealthCheck> _logger; private readonly List<string> _endpoints; - private const int DegradationThresholdInSeconds = 5; - private const int TimeoutInSeconds = 40; + private readonly int _degradationThresholdInSeconds; + private readonly int _timeoutInSeconds; public EndpointsHealthCheck( IHttpClientFactory httpClientFactory, ILogger<EndpointsHealthCheck> logger, IOptions<AspNetUtilitiesSettings> options) { _httpClientFactory = httpClientFactory ?? throw new ArgumentNullException(nameof(httpClientFactory)); _logger = logger ?? throw new ArgumentNullException(nameof(logger)); _endpoints = options.Value.HealthCheckSettings.HttpGetEndpointsToCheck; + _degradationThresholdInSeconds = options.Value.HealthCheckSettings.DegradationThresholdInSeconds ?? 5; + _timeoutInSeconds = options.Value.HealthCheckSettings.TimeoutInSeconds ?? 40; }
38-54
: Consider using async using for CancellationTokenSourceThe implementation looks good and correctly handles both status code and response time checks independently. Consider using
await using
for better resource cleanup in the async context.- using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken); + await using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken).ConfigureAsAsyncDisposable();
56-59
: Consider disposing of the request messageWhile the exception handling is good, consider using
using var request = new HttpRequestMessage(HttpMethod.Get, url)
to ensure proper resource cleanup, especially in timeout scenarios.try { await using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken).ConfigureAsAsyncDisposable(); cts.CancelAfter(TimeSpan.FromSeconds(TimeoutInSeconds)); + using var request = new HttpRequestMessage(HttpMethod.Get, url); var startTime = Stopwatch.GetTimestamp(); - var response = await client.GetAsync(url, cts.Token); + var response = await client.SendAsync(request, cts.Token); var responseTime = Stopwatch.GetElapsedTime(startTime);
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs
(1 hunks)src/Digdir.Library.Utils.AspNet/HealthChecks/EndpointsHealthCheck.cs
(3 hunks)
🔇 Additional comments (4)
src/Digdir.Domain.Dialogporten.Infrastructure/HealthChecks/RedisHealthCheck.cs (4)
1-18
: LGTM! Well-structured dependency injection setup.
The addition of logging capabilities and the degradation threshold constant is implemented correctly with proper null checks.
34-44
: LGTM! Robust implementation of performance monitoring.
The implementation effectively monitors connection performance and properly handles degraded states when responses are slow.
48-62
: LGTM! Comprehensive error handling with appropriate logging.
The error handling implementation:
- Properly distinguishes between different types of failures
- Includes response times in error messages
- Uses appropriate logging levels for different scenarios
15-18
: Verify health check registration.
The constructor signature has changed with the addition of the logger parameter. Ensure that the health check registration in the DI container has been updated accordingly.
✅ Verification successful
Health check registration is properly configured
The health check registration is correctly set up. The RedisHealthCheck
is registered as a singleton service and properly added to the health checks pipeline with appropriate tags. The dependency injection container will handle the constructor parameters automatically since both IOptions<InfrastructureSettings>
and ILogger<RedisHealthCheck>
are standard services that are typically available in the DI container.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for RedisHealthCheck registration in the codebase
rg -l "RedisHealthCheck" | xargs rg "AddHealthChecks|services\.Add.*RedisHealthCheck"
Length of output: 319
Script:
#!/bin/bash
# Let's check the health check registration context
rg -A 5 -B 5 "AddHealthChecks\(\)|services\.Add.*RedisHealthCheck" src/Digdir.Domain.Dialogporten.Infrastructure/InfrastructureExtensions.cs
Length of output: 722
🤖 I have created a release *beep* *boop* --- ## [1.31.0](v1.30.0...v1.31.0) (2024-11-08) ### Features * **azure:** enable query performance insights for postgres ([#1417](#1417)) ([bb832d8](bb832d8)) ### Bug Fixes * add timeout for health checks ([#1388](#1388)) ([d68cc65](d68cc65)) * **azure:** set diagnostic setting to allow query perf insights ([#1422](#1422)) ([5919258](5919258)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Description
Related Issue(s)
Verification
Documentation
docs
-directory, Altinnpedia or a separate linked PR in altinn-studio-docs., if applicable)Summary by CodeRabbit
New Features
Bug Fixes