-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: collect pod metrics of duration of multiple stage in pod lifetime #217
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes introduce a metrics collection system to the GreptimeDB operator. A new Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
7116264
to
8f585f4
Compare
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (6)
pkg/metrics/metrics_test.go (2)
8-33
: Consider enhancing test coverage and readabilityWhile the table-driven test pattern is well implemented, consider these improvements:
- Add test cases for edge cases and invalid inputs (e.g., malformed messages, missing duration)
- Include descriptive names for test cases to improve readability
- Make duration calculations more readable
Here's a suggested improvement:
func TestParseImagePullingDuration(t *testing.T) { tests := []struct { + name string message string duration time.Duration + wantErr bool }{ { + name: "valid duration in milliseconds", message: `Successfully pulled image "greptime/greptimedb:latest" in 314.950966ms (6.159733714s including waiting)`, - duration: time.Duration(314950966 * time.Nanosecond), + duration: 314*time.Millisecond + 950966*time.Nanosecond, }, { + name: "valid duration in seconds", message: `Successfully pulled image "greptime/greptimedb:latest" in 10s (6.159733714s including waiting)`, duration: time.Duration(10 * time.Second), }, + { + name: "invalid message format", + message: `Invalid message without duration`, + wantErr: true, + }, } for _, test := range tests { + t.Run(test.name, func(t *testing.T) { duration, err := parseImagePullingDuration(test.message) - if err != nil { + if (err != nil) != test.wantErr { t.Fatalf("parse image pulling duration from message %q: %v", test.message, err) } + if test.wantErr { + return + } + if duration != test.duration { t.Fatalf("expected duration: %v, got: %v", test.duration, duration) } + }) } }
35-60
: Enhance test coverage for image name parsingThe test structure is good, but consider these improvements:
- Add test cases for various image formats (e.g., with/without registry, different tag formats)
- Include negative test cases
- Add descriptive names for test cases
Here's a suggested improvement:
func TestParseImageName(t *testing.T) { tests := []struct { + name string message string - name string + imageName string + wantErr bool }{ { + name: "simple image with latest tag", message: `Successfully pulled image "greptime/greptimedb:latest" in 314.950966ms (6.159733714s including waiting)`, - name: "greptime/greptimedb:latest", + imageName: "greptime/greptimedb:latest", }, { + name: "image with version tag", message: `Successfully pulled image "greptime/greptimedb:v0.9.5" in 314.950966ms (6.159733714s including waiting)`, - name: "greptime/greptimedb:v0.9.5", + imageName: "greptime/greptimedb:v0.9.5", }, + { + name: "image with registry", + message: `Successfully pulled image "registry.example.com/greptime/greptimedb:latest" in 10s`, + imageName: "registry.example.com/greptime/greptimedb:latest", + }, + { + name: "invalid message format", + message: `Invalid message without image`, + wantErr: true, + }, } for _, test := range tests { + t.Run(test.name, func(t *testing.T) { - name, err := parseImageName(test.message) - if err != nil { + imageName, err := parseImageName(test.message) + if (err != nil) != test.wantErr { t.Fatalf("parse image name from message %q: %v", test.message, err) } - if name != test.name { - t.Fatalf("expected name: %q, got: %q", test.name, name) + if test.wantErr { + return + } + + if imageName != test.imageName { + t.Fatalf("expected name: %q, got: %q", test.imageName, imageName) } + }) } }controllers/greptimedbcluster/controller.go (2)
68-72
: Consider enhancing error handling with more contextWhile the error handling is present, it would be more helpful to wrap the error with additional context about the failure point.
Consider updating the error handling:
metricsCollector, err := metrics.NewMetricsCollector() if err != nil { - return err + return fmt.Errorf("failed to create metrics collector: %w", err) }
233-240
: Consider structured logging and simplified returnWhile the error handling approach is correct, there are two potential improvements:
- Use structured logging for better error tracking
- Simplify the return statement after error
Consider these improvements:
if cluster.Status.ClusterPhase == v1alpha1.PhaseRunning && r.MetricsCollector != nil { if err := r.MetricsCollector.CollectClusterPodMetrics(ctx, cluster); err != nil { - klog.Errorf("Failed to collect cluster pod metrics: '%v'", err) + klog.ErrorS(err, "Failed to collect cluster pod metrics", + "namespace", cluster.Namespace, + "name", cluster.Name) // We will not return error here because it is not a critical issue. - return ctrl.Result{}, nil + return ctrl.Result{RequeueAfter: defaultRequeueAfter}, nil } }The
RequeueAfter
ensures we retry the metrics collection after a delay, which is more robust than simply continuing without retry.pkg/metrics/metrics.go (2)
164-167
: Continue processing pods even if one failsIn
collectPodMetricsByRole
, returning an error immediately upon failure to collect metrics for a pod will skip metrics collection for the remaining pods. To improve resilience, consider logging the error and continuing with the next pod.Apply this diff to handle errors gracefully:
for _, pod := range pods { if err := c.collectPodMetrics(ctx, cluster.Name, &pod, role); err != nil { - return err + // Log the error and continue processing + log.Error(err, "Failed to collect metrics for pod", "pod", pod.Name) } }Ensure that a logger (
log
) is available in this context.
44-93
: Reduce label cardinality in Prometheus metricsUsing dynamic labels like
pod
andcontainer
names can lead to high cardinality in Prometheus metrics, impacting performance and storage. It's recommended to use more static labels.Apply this diff to remove high-cardinality labels:
- []string{"namespace", "resource", "pod", "node", "role"}, + []string{"namespace", "resource", "node", "role"},This change applies to all metric definitions where
pod
andcontainer
labels are used.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (3)
controllers/greptimedbcluster/controller.go
(4 hunks)pkg/metrics/metrics.go
(1 hunks)pkg/metrics/metrics_test.go
(1 hunks)
🔇 Additional comments (3)
pkg/metrics/metrics_test.go (1)
1-6
: LGTM! Clean package structure and imports
The package name matches the directory structure, and imports are minimal and appropriate for the testing requirements.
controllers/greptimedbcluster/controller.go (2)
40-40
: LGTM: Import statement is correctly placed
The metrics package import follows the project's import organization pattern.
55-58
: LGTM: Reconciler struct field addition is well-structured
The MetricsCollector field is properly added as a pointer type and follows the existing field organization pattern.
Summary by CodeRabbit
New Features
Bug Fixes
Tests