-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: obsy tests #578
fix: obsy tests #578
Conversation
WalkthroughThe changes in this pull request involve modifications to the Changes
Sequence Diagram(s)sequenceDiagram
participant Test as TestObservabilityCollector
participant Prometheus as Prometheus Config
participant Service as Service
participant Telemetry as Telemetry Struct
participant Logging as LoggingExporter
Test->>Prometheus: Set prometheusArgs with dynamic config
Prometheus-->>Test: Return updated config
Test->>Service: Initialize Service with Telemetry
Service->>Telemetry: Set Address and Level
Telemetry-->>Service: Return initialized Telemetry
Service->>Test: Return Service with valid metrics and traces
Test->>Logging: Set LoggingExporter with log level
Logging-->>Test: Confirm LoggingExporter setup
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🔇 Additional comments (1)e2e/system/build_image_test.go (1)
Test might not effectively validate build arguments. The change to Let's verify the Dockerfile contents to understand how the MESSAGE build arg is used: Suggestion: Consider changing back to a more distinctive message or using a unique format like: - expectedData = "Hello, World!"
+ expectedData = "BuildArg_Test_Message_12345" This would make it clear that the value comes from the build argument rather than being hardcoded in the Dockerfile. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
e2e/basic/observability_test.go (2)
29-31
: LGTM! Consider making the sleep interval configurable.The grouped variable declarations and use of
fmt.Sprintf
improve code quality. However, the 5-second sleep interval in the command is hardcoded. Consider making it configurable to align with the test's configurable nature.const ( namePrefix = "observability" scrapeInterval = "2s" + tracePushInterval = "5s" ) var ( - targetStartCommand = fmt.Sprintf("while true; do curl -X POST http://localhost:%d/v1/traces; sleep 5; done", otlpPort) + targetStartCommand = fmt.Sprintf("while true; do curl -X POST http://localhost:%d/v1/traces; sleep %s; done", otlpPort, tracePushInterval) ctx = context.Background() )
84-85
: Consider making the wait time configurable or dynamic.While reducing the wait time from 60s to 20s aligns with the faster scrape interval, fixed wait times can make tests flaky. Consider:
- Making the wait time configurable like other parameters
- Implementing a polling mechanism instead of a fixed wait
const ( namePrefix = "observability" scrapeInterval = "2s" + dataWaitTime = 20 * time.Second ) -s.T().Log("Waiting 20 seconds for the target pod to push data to the otel collector") -time.Sleep(20 * time.Second) +s.T().Logf("Waiting %s for the target pod to push data to the otel collector", dataWaitTime) +time.Sleep(dataWaitTime)Or better, implement polling:
// Add this helper function func waitForMetrics(ctx context.Context, prometheusURL string, timeout time.Duration) error { ticker := time.NewTicker(time.Second) defer ticker.Stop() deadline := time.Now().Add(timeout) for time.Now().Before(deadline) { select { case <-ctx.Done(): return ctx.Err() case <-ticker.C: resp, err := http.Get(prometheusURL) if err == nil && resp.StatusCode == http.StatusOK { return nil } } } return fmt.Errorf("timeout waiting for metrics") }pkg/sidecars/observability/otel.go (2)
Line range hint
147-159
: Consider enhancing type safety for telemetry configuration.The
Level
field inMetricsTelemetry
allows arbitrary strings but comments indicate specific valid options ("basic", "normal", "detailed"). Consider:
- Using an enum-like type for better type safety
- Adding validation for the Address field format
+ type TelemetryLevel string + const ( + BasicTelemetryLevel TelemetryLevel = "basic" + NormalTelemetryLevel TelemetryLevel = "normal" + DetailedTelemetryLevel TelemetryLevel = "detailed" + ) type MetricsTelemetry struct { Address string `yaml:"address,omitempty"` - Level string `yaml:"level,omitempty"` + Level TelemetryLevel `yaml:"level,omitempty"` }
Line range hint
408-420
: Consider making telemetry defaults configurable.The telemetry configuration currently uses hardcoded constants (
internalTelemetryEndpoint
andbasicTelemetryLevel
). Consider making these configurable through theobsyConfig
struct to support different deployment scenarios.This would allow users to:
- Configure custom telemetry endpoints
- Set different default telemetry levels
- Adapt the configuration based on environment needs
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
e2e/basic/observability_test.go
(5 hunks)pkg/sidecars/observability/otel.go
(2 hunks)
🔇 Additional comments (7)
e2e/basic/observability_test.go (4)
50-51
: LGTM! Configuration is now properly templated.
The Prometheus configuration now correctly uses the configurable scrapeInterval
and properly formats the target configuration.
Also applies to: 55-55
65-65
: LGTM! Improved isolation with scoped naming.
The addition of the knuu scope to the Prometheus endpoint name improves test isolation and maintains consistency with the configurable scrape interval.
70-70
: LGTM! Consistent naming convention.
The hyphen separator in the instance name maintains consistency with other instance naming patterns.
17-17
: LGTM with a note about the scrape interval.
The constant refactoring improves configurability. However, the 2-second scrape interval is quite aggressive. While this may be fine for testing, please verify that this interval doesn't cause any performance issues during the test execution.
Also applies to: 26-28
✅ Verification successful
Let me analyze the test file further to understand the context of the scrape interval.
Let me check the test duration and execution context to make a final assessment.
2-second scrape interval is appropriate for this test context
The 2-second scrape interval is actually well-suited for this test because:
- This is a focused E2E test that verifies the observability collector setup
- The test sends metrics every 5 seconds (
sleep 5
intargetStartCommand
) - The test runs with a short context timeout (30 seconds, as seen in earlier results)
- No timeout or performance issues were found in the logs related to this interval
The aggressive scrape interval ensures we can collect multiple data points within the short test duration, making it appropriate for testing purposes rather than production use.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if there are any timeout or performance related issues in the test logs
# Test: Search for timeout or performance related errors in the test logs
rg -i "timeout|performance|too many requests|rate limit"
Length of output: 6778
Script:
#!/bin/bash
# Check the test implementation to understand the duration and usage of metrics collection
rg -A 10 "scrapeInterval" e2e/basic/observability_test.go
Length of output: 1440
Script:
#!/bin/bash
# Check the test duration and context
rg -B 5 -A 5 "func.*Test.*Observability" e2e/basic/observability_test.go
Length of output: 396
pkg/sidecars/observability/otel.go (3)
362-367
: LGTM! Good validation for incomplete metrics pipeline.
The added validation ensures that incomplete metrics pipeline configurations (those without both receivers and exporters) are properly handled by resetting to an empty pipeline.
387-391
: LGTM! Good validation for incomplete traces pipeline.
The added validation ensures that incomplete traces pipeline configurations (those without both receivers and exporters) are properly handled by resetting to an empty pipeline.
Line range hint 362-391
: Verify impact on celestia-node integration.
The new validation logic for metrics and traces pipelines could affect the celestia-node integration. Please ensure:
- The celestia-node's existing telemetry configuration remains compatible
- The pipeline validation doesn't unintentionally disable required telemetry features
✅ Verification successful
Let's search for actual usage patterns and test configurations to understand how the telemetry is configured and validate if the changes maintain compatibility.
Let's search for the configuration structure and its usage to understand how telemetry is configured across the codebase.
Pipeline validation changes are safe and maintain compatibility
Based on the code analysis:
- The validation logic in
prepareMetricsForServicePipeline
andprepareTracesForServicePipeline
correctly handles telemetry configuration:- Only creates pipelines when both receivers and exporters are configured
- Maintains existing behavior where partial configurations (only receivers or only exporters) are treated as invalid
- The e2e tests in
e2e/basic/observability_test.go
demonstrate proper integration with both OTLP and Jaeger configurations - The configuration structure and helper methods show clear separation between receiver and exporter settings, ensuring backward compatibility
The changes improve configuration validation without breaking existing celestia-node integration patterns.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for celestia-node telemetry configuration patterns
echo "Checking celestia-node telemetry configuration..."
rg -l "metrics.*pipeline|traces.*pipeline" | grep -i "celestia"
# Look for potential telemetry configuration in tests
echo "Checking test configurations..."
rg -l "metrics.*telemetry|traces.*telemetry" | grep "_test.go$"
Length of output: 311
Script:
#!/bin/bash
# Let's try a broader search for telemetry configuration patterns
echo "Checking telemetry configuration in tests..."
rg -l "NewObsy|otlpPort|jaegerEndpoint" pkg/sidecars/observability/otel_test.go
echo "Checking usage patterns..."
rg -A 5 "NewObsy"
echo "Checking test configurations..."
ast-grep --pattern 'func Test$_($_) {
$$$
obsyConfig := $_
$$$
}'
Length of output: 488
Script:
#!/bin/bash
# Let's check the entire codebase for telemetry configuration patterns
echo "Checking for telemetry configuration patterns..."
rg -A 5 "type ObsyConfig struct"
echo "Checking for telemetry usage..."
rg -A 5 "otlpPort|jaegerEndpoint|otlpEndpoint"
echo "Checking for test files..."
fd "_test\.go$"
Length of output: 12718
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
pkg/sidecars/observability/helpers.go (1)
89-100
: Consider adding log level validation.The implementation follows the established pattern and properly handles empty values. However, consider validating the provided logLevel against a set of allowed values to prevent configuration errors.
Example validation:
func (o *Obsy) SetLoggingExporter(logLevel string) error { if err := o.validateStateForObsy("Logging exporter"); err != nil { return err } if logLevel == "" { logLevel = defaultLoggingExporterLogLevel + } else { + validLevels := map[string]bool{"debug": true, "info": true, "warn": true, "error": true} + if !validLevels[strings.ToLower(logLevel)] { + return fmt.Errorf("invalid log level: %s", logLevel) + } } o.obsyConfig.loggingExporterLogLevel = logLevel return nil }pkg/sidecars/observability/obsy.go (3)
78-80
: Add documentation for the loggingExporterLogLevel field.Please document the valid values for this field to help users understand what log levels are supported.
prometheusRemoteWriteExporterEndpoint string + // loggingExporterLogLevel is the log level for the logging exporter + // Valid values are: "debug", "info", "warn", "error" loggingExporterLogLevel string
Line range hint
89-96
: Initialize loggingExporterLogLevel in New() constructor.The new field should be initialized with a default value in the constructor to ensure consistent behavior.
func New() *Obsy { return &Obsy{ obsyConfig: &ObsyConfig{ otelCollectorVersion: DefaultOtelCollectorVersion, + loggingExporterLogLevel: "info", // Set default log level }, image: fmt.Sprintf(DefaultImage, DefaultOtelCollectorVersion), } }
Line range hint
89-96
: Consider adding a constant for the default log level.To maintain consistency and make it easier to update the default value, consider defining it as a constant.
const ( DefaultOtelOtlpPort = 8888 DefaultOtelMetricsPort = 9090 DefaultImage = "otel/opentelemetry-collector-contrib:%s" DefaultOtelCollectorVersion = "0.83.0" + DefaultLoggingExporterLogLevel = "info" otelAgentName = "otel-agent" // %s will be replaced with the otelCollectorVersion
Then use it in the constructor:
func New() *Obsy { return &Obsy{ obsyConfig: &ObsyConfig{ otelCollectorVersion: DefaultOtelCollectorVersion, + loggingExporterLogLevel: DefaultLoggingExporterLogLevel, }, image: fmt.Sprintf(DefaultImage, DefaultOtelCollectorVersion), } }
e2e/basic/observability_test.go (3)
76-76
: Inconsistent naming pattern for target instances.There's an inconsistency in the naming pattern:
- Line 76:
namePrefix + "-target"
- Line 136:
namePrefix + "target"
(missing hyphen)This could cause confusion in logs and monitoring.
Apply this fix in the new test:
-target, err := s.Knuu.NewInstance(namePrefix + "target") +target, err := s.Knuu.NewInstance(namePrefix + "-target")Also applies to: 136-136
122-167
: LGTM: Well-structured test for logging functionality.The test provides good coverage for the new logging feature with robust verification using regex patterns.
Consider extracting the hardcoded port 8888 in
targetStartCommand
to a constant for better maintainability:const ( namePrefix = "observability" + defaultTracesPort = 8888 ) - targetStartCommand = "while true; do curl -X POST http://localhost:8888/v1/traces; sleep 2; done" + targetStartCommand = fmt.Sprintf("while true; do curl -X POST http://localhost:%d/v1/traces; sleep 2; done", defaultTracesPort)
Line range hint
29-146
: Consider reducing code duplication in test setup.Both test functions share similar setup code for creating and configuring instances. Consider extracting common setup logic into helper methods to improve maintainability.
Example structure:
func (s *Suite) setupTarget(ctx context.Context, namePrefix, startCommand string) (*knuu.Instance, error) { target, err := s.Knuu.NewInstance(namePrefix + "-target") if err != nil { return nil, err } if err := target.Build().SetImage(ctx, curlImage); err != nil { return nil, err } if err := target.Build().SetStartCommand("sh", "-c", startCommand); err != nil { return nil, err } return target, nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
e2e/basic/observability_test.go
(5 hunks)pkg/sidecars/observability/helpers.go
(1 hunks)pkg/sidecars/observability/obsy.go
(1 hunks)pkg/sidecars/observability/otel.go
(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/sidecars/observability/otel.go
🔇 Additional comments (4)
pkg/sidecars/observability/helpers.go (1)
95-97
: Verify the default log level constant.
The code references defaultLoggingExporterLogLevel
which should be defined elsewhere. Let's verify its existence and value.
✅ Verification successful
Default log level constant is correctly defined as "debug" in the codebase
The constant defaultLoggingExporterLogLevel
is properly defined in pkg/sidecars/observability/otel.go
with a value of "debug", which is referenced by the code under review.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the defaultLoggingExporterLogLevel constant definition
rg "defaultLoggingExporterLogLevel\s*=\s*" --type go
Length of output: 134
pkg/sidecars/observability/obsy.go (1)
78-80
: Verify the integration with other components.
The new field is part of a larger change set affecting multiple files. Let's verify its integration:
✅ Verification successful
Let me analyze the implementation details and integration points to provide a more comprehensive verification.
Integration verified and properly implemented
The loggingExporterLogLevel
field is well integrated across the codebase:
- Properly initialized through
SetLoggingExporter
helper with a default value of "debug" - Used in
createLoggingExporter
to configure the logging exporter - Correctly handled in the exporter creation flow
- Covered by e2e tests in the observability test suite
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check how the new field is used across the codebase
# Test 1: Find where the field is used
echo "Searching for direct usage of loggingExporterLogLevel..."
rg "loggingExporterLogLevel"
# Test 2: Find related logging exporter code
echo "Searching for related logging exporter code..."
rg "LoggingExporter"
# Test 3: Find test coverage
echo "Searching for related test cases..."
rg -t go "Test.*Logging.*Exporter"
Length of output: 1860
Script:
#!/bin/bash
# Let's verify the integration more thoroughly
# Check the full context of the logging exporter configuration and usage
echo "Checking the full context of logging exporter in otel.go..."
rg -A 10 -B 10 "createLoggingExporter"
# Check the full context of SetLoggingExporter helper
echo "Checking the SetLoggingExporter helper implementation..."
rg -A 10 -B 10 "SetLoggingExporter"
# Check the test coverage more broadly
echo "Checking broader test coverage..."
rg -t go "obsy.*logging" -i
Length of output: 6705
e2e/basic/observability_test.go (2)
8-9
: LGTM: Well-structured constant definitions.
The new constants improve test configurability and readability. The retry intervals and timeouts are reasonable for e2e tests.
Also applies to: 24-25, 31-34
90-119
: LGTM: Robust implementation of Eventually block.
The implementation addresses the previous review comment about flakiness. The error handling is comprehensive with proper logging and timeout management.
Signed-off-by: Jose Ramon Mañes <jose@celestia.org>
There was an issue with the obsy sidecar and the test which was blocking the celestia-node integration. This PR proposes a solution for it.
Thank you @smuu for helping in finding the bug.
Summary by CodeRabbit
New Features
Bug Fixes