
refactor: add StartUpProbe and reset the default values of probe for stability #197

Merged
3 commits merged into GreptimeTeam:main on Oct 25, 2024

Conversation

zyy17 (Collaborator) commented Oct 23, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced StartupProbe to enhance pod initialization monitoring.
    • Added comprehensive health check configurations (livenessProbe, readinessProbe, startupProbe) for GreptimeDBCluster and GreptimeDBStandalone.
    • Updated HTTP port for health checks to improve consistency across configurations.
  • Documentation

    • Expanded API reference documentation to include details on new health check mechanisms and resource specifications.
  • Chores

    • Updated CRDs for GreptimeDBCluster and GreptimeDBStandalone to include new startupProbe configurations.

coderabbitai bot (Contributor) commented Oct 23, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes in this pull request introduce a new StartupProbe field to the MainContainerSpec struct and enhance the health probe configurations for GreptimeDBCluster and GreptimeDBStandalone. The new probes include StartupProbe, LivenessProbe, and ReadinessProbe, which are integrated into various configurations and templates. Additionally, updates are made to the related YAML files and CRDs to support these new probes, improving health monitoring capabilities during the application's lifecycle.
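
For orientation, the new field sits alongside the existing probe fields in MainContainerSpec roughly as sketched below (an abridged sketch; unrelated fields and the full doc comments in apis/v1alpha1/common.go are omitted):

package v1alpha1

import corev1 "k8s.io/api/core/v1"

// Abridged sketch of the health-check fields in MainContainerSpec.
type MainContainerSpec struct {
	// LivenessProbe indicates whether the container is still running.
	// +optional
	LivenessProbe *corev1.Probe `json:"livenessProbe,omitempty"`

	// ReadinessProbe indicates whether the container is ready to serve traffic.
	// +optional
	ReadinessProbe *corev1.Probe `json:"readinessProbe,omitempty"`

	// StartupProbe indicates that the Pod has successfully initialized;
	// liveness and readiness checks are held back until it succeeds.
	// +optional
	StartupProbe *corev1.Probe `json:"startupProbe,omitempty"`
}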

Changes

  • apis/v1alpha1/common.go: Added field StartupProbe *corev1.Probe in MainContainerSpec.
  • apis/v1alpha1/defaulting.go: Introduced defaultStartupProbe(), defaultLivenessProbe(), and defaultReadinessProbe() methods.
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml: Updated httpPort to 9000 and standardized probe configurations across sections.
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml: Added new field httpPort: 9000 in the frontend section.
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test00/expect.yaml: Added readinessProbe and startupProbe configurations, updated livenessProbe.
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml: Added readinessProbe and startupProbe, updated livenessProbe.
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test02/expect.yaml: Added livenessProbe, readinessProbe, and startupProbe configurations.
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test03/expect.yaml: Added probes under base and monitoring, updated storageSize.
  • apis/v1alpha1/testdata/defaulting/greptimedbstandalone/test00/expect.yaml: Added readinessProbe and startupProbe, updated livenessProbe.
  • apis/v1alpha1/zz_generated.deepcopy.go: Updated DeepCopyInto and DeepCopy methods to handle StartupProbe.
  • config/crd/resources/greptime.io_greptimedbclusters.yaml: Added startupProbe configurations to the GreptimeDBCluster CRD.
  • config/crd/resources/greptime.io_greptimedbstandalones.yaml: Added startupProbe configurations to the GreptimeDBStandalone CRD.
  • docs/api-references/docs.md: Added documentation for the startupProbe field in MainContainerSpec.
  • manifests/bundle.yaml: Added the GreptimeDBStandalone CRD and updated the GreptimeDBCluster CRD.
  • manifests/crds.yaml: Added startupProbe to the GreptimeDBCluster and GreptimeDBStandalone CRDs.

Possibly related PRs

Suggested reviewers

  • daviderli614

Poem

Hop, hop, a new probe's here,
To check the startup, never fear!
With health checks strong, our pods will thrive,
In GreptimeDB, we come alive!
So let’s rejoice and dance about,
For healthy apps, there’s no doubt! 🐇✨


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (11)
apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml (1)

Line range hint 1-38: Consider adding comments to clarify test scenarios.

Since this is a test input file for defaulting behavior, it would be helpful to add comments explaining:

  • The purpose of this test case
  • The expected defaulting behavior
  • Why certain values (like httpPorts) are set the way they are
apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test02/expect.yaml (1)

38-51: Consider using different endpoints for different probe types.

Currently, all probes are using the same /health endpoint. Consider implementing and using:

  • /livez for liveness (basic ping)
  • /readyz for readiness (checks if the database can accept connections)
  • /startupz for startup (checks if initial setup is complete)

This separation would:

  1. Reduce load on a single endpoint
  2. Allow for different health check depths
  3. Provide clearer metrics about specific probe failures
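
If such a split were adopted, the defaulting helpers could take the endpoint as a parameter, along the lines of the sketch below. The /livez, /readyz, and /startupz paths are the hypothetical endpoints named in this suggestion, not existing GreptimeDB routes; the defaults in this PR all use /health on the default HTTP port 4000.

package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// probeFor is an illustrative helper that builds an HTTP probe for a given
// path, reusing the period/threshold pairs from the current defaults.
func probeFor(path string, periodSeconds, failureThreshold int32) *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: path,
				Port: intstr.FromInt32(4000), // matches DefaultHTTPPort in this PR
			},
		},
		PeriodSeconds:    periodSeconds,
		FailureThreshold: failureThreshold,
	}
}

// Hypothetical wiring of the per-probe endpoints:
//   liveness  := probeFor("/livez", 5, 10)
//   readiness := probeFor("/readyz", 5, 10)
//   startup   := probeFor("/startupz", 10, 30)
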
apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml (1)

27-40: Consider adding timeoutSeconds to all probes.

The probe configurations are missing timeoutSeconds parameters. This is important to prevent probe operations from hanging.

Add timeoutSeconds to all probes. Example for livenessProbe:

 livenessProbe:
   httpGet:
     path: /health
     port: 4000
   periodSeconds: 5
   failureThreshold: 10
+  timeoutSeconds: 3
apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml (2)

155-168: LGTM! Consider documenting probe configurations.

The probe configurations are consistent with other components. Consider adding a comment in the documentation about the probe timing configurations and their rationale:

  • 5s/10 failures for liveness/readiness
  • 10s/30 failures for startup

Line range hint 1-168: Consider implementing graduated probe thresholds.

While the current probe configuration is solid, consider implementing graduated thresholds based on component complexity:

  • More complex components (like datanode) might benefit from longer startup periods
  • Components with heavy initialization (like meta) might need different readiness thresholds
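
As a rough illustration of graduated thresholds, a defaulting helper could pick the startup failure threshold per role, as sketched below; the role names come from this PR, but the per-role numbers are invented for the example (the PR itself uses a single default of 30 everywhere).

package v1alpha1

// startupFailureThresholdFor is a hypothetical helper that returns a
// per-component startup failure threshold; the values are illustrative only.
func startupFailureThresholdFor(role string) int32 {
	switch role {
	case "datanode":
		return 60 // heavier startup: data recovery may take longer
	case "meta":
		return 45 // heavier initialization than stateless roles
	default:
		return 30 // the default used for every component in this PR
	}
}
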
docs/api-references/docs.md (1)

566-566: Format the URL consistently with backticks.

For consistency with other URLs in the documentation, wrap the Kubernetes documentation URL in backticks.

-This cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
+This cannot be updated. More info: `https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes`
🧰 Tools
🪛 Markdownlint

566-566: null
Bare URL used

(MD034, no-bare-urls)

config/crd/resources/greptime.io_greptimedbstandalones.yaml (2)

2039-2116: LGTM! The startup probe schema is well-structured.

The implementation follows Kubernetes API conventions and properly defines all necessary fields for a startup probe. The schema matches the structure of other probe types (liveness, readiness) and includes proper validation.

Consider documenting the default values that will be used for the probe parameters (e.g., failureThreshold, periodSeconds, etc.) in the CRD description or in the operator's documentation to help users understand the default behavior.


2039-2116: Great addition for improving container lifecycle management!

The startup probe addition provides several architectural benefits:

  • Prevents premature container restarts by allowing proper initialization time
  • Clearly separates startup checks from runtime health checks
  • Improves stability by preventing liveness/readiness probes from running until the application is fully initialized

Consider documenting recommended startup probe configurations for different deployment scenarios (e.g., development vs. production, different storage backends) to help users optimize the stability of their deployments.

manifests/crds.yaml (1)

2050-2127: LGTM! The startupProbe implementation follows Kubernetes best practices.

The addition of startupProbe configurations across container specs is well-structured and consistent with Kubernetes probe standards. This will help better handle slow-starting containers by preventing the liveness probe from interfering with the startup process.

Consider adding:

  1. Documentation comments above the startupProbe sections to explain the default values that will be used for parameters like failureThreshold, periodSeconds etc.
  2. Example configurations in comments to help users understand common use cases

Also applies to: 4951-5028, 7826-7903, 10731-10808, 13659-13736, 16516-16593, 19882-19959

manifests/bundle.yaml (1)

2057-2134: LGTM! Comprehensive implementation of startupProbe across all components.

The startupProbe schema is consistently implemented across all components (base, datanode, flownode, frontend, meta, and monitoring) with complete configuration options including:

  • exec probes
  • HTTP/HTTPS probes
  • TCP socket probes
  • gRPC probes
  • All standard probe parameters (timeouts, thresholds, etc.)

Consider documenting recommended probe configurations for different components:

  • Initial delay based on typical startup time
  • Appropriate failure thresholds for different components
  • Suggested probe types (HTTP vs TCP vs gRPC) for each component

Also applies to: 4958-5035, 7833-7910, 10738-10815, 13666-13743, 16523-16600, 19889-19966

apis/v1alpha1/defaulting.go (1)

Line range hint 285-288: Address the TODO for Flownode Probe Configuration

The TODO comment indicates that the liveness probe for the flownode is currently not needed and will be added in the future. Consider implementing or updating the probe configurations for the flownode when appropriate.

Would you like assistance in implementing the liveness probe for the flownode or opening a GitHub issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits 46697f2 and 6d5a81f.

📒 Files selected for processing (16)
  • apis/v1alpha1/common.go (1 hunks)
  • apis/v1alpha1/constants.go (1 hunks)
  • apis/v1alpha1/defaulting.go (7 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml (4 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test00/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test02/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test03/expect.yaml (2 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbstandalone/test00/expect.yaml (1 hunks)
  • apis/v1alpha1/zz_generated.deepcopy.go (1 hunks)
  • config/crd/resources/greptime.io_greptimedbclusters.yaml (6 hunks)
  • config/crd/resources/greptime.io_greptimedbstandalones.yaml (1 hunks)
  • docs/api-references/docs.md (1 hunks)
  • manifests/bundle.yaml (7 hunks)
  • manifests/crds.yaml (7 hunks)
🧰 Additional context used
🪛 Markdownlint
docs/api-references/docs.md

566-566: null
Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (30)
apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml (1)

19-19: Verify the intention of multiple httpPort configurations.

The configuration has two httpPort definitions:

  • Frontend specific: 9000 (line 19)
  • Root level: 5000 (line 38)

While this might be intentional for testing defaulting behavior, please verify if this is the expected configuration.

Also applies to: 38-38

apis/v1alpha1/testdata/defaulting/greptimedbstandalone/test00/expect.yaml (1)

27-32: LGTM: Startup probe configuration follows Kubernetes best practices.

The startup probe configuration with periodSeconds: 10 and failureThreshold: 30 provides a generous 300-second window for application initialization, which is appropriate for database systems that might need extended startup time.

apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test00/expect.yaml (4)

27-28: LGTM: Reasonable liveness probe timing configuration.

The periodic check every 5 seconds with 10 retries provides a good balance between quick detection of issues and avoiding false positives.


35-40: LGTM: Well-configured startup probe for database initialization.

The startup probe configuration with a 10-second period and 30 failure threshold (allowing up to 5 minutes for startup) is appropriate for a database system that may need time for initial setup, data loading, or recovery operations.


27-40: Verify probe coordination during container lifecycle.

The probe configuration creates a good progression from startup through normal operation. However, we should verify that these probes work well together during container lifecycle events:

  1. Startup probe allows up to 5 minutes for initialization
  2. Once startup succeeds, readiness and liveness probes take over with faster checking
  3. All probes target the same endpoint

Let's check for any existing test coverage of these probe configurations:

#!/bin/bash
# Search for probe-related tests
rg -g '*_test.go' -i 'probe|health.*test'

29-34: 🛠️ Refactor suggestion

Consider using different endpoints for readiness and liveness.

While the timing configuration is appropriate, using the same /health endpoint for both readiness and liveness probes might not provide adequate differentiation between service readiness and process liveness. Consider implementing a more specific readiness check endpoint.

Let's verify if there are different health check endpoints available:

apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml (2)

35-40: Verify startupProbe timing for database initialization.

The startupProbe configuration needs careful consideration:

  • periodSeconds: 10 is reasonable for startup checks
  • failureThreshold: 30 allows 300 seconds (5 minutes) for startup, which might need adjustment based on:
    • Database size
    • Initial data loading requirements
    • System resources

Let's verify typical startup times:

#!/bin/bash
# Search for documentation or comments about startup time expectations
rg -l "startup.*time|init.*time|bootstrap.*time" --type md

27-28: Review the livenessProbe timing values.

The livenessProbe configuration looks reasonable, but consider these timing values:

  • periodSeconds: 5 might be too frequent for a database health check
  • failureThreshold: 10 (50 seconds total) is a good buffer to prevent premature restarts

Let's verify if these values align with other GreptimeDB configurations:

✅ Verification successful

Let me try a different search pattern to find probe configurations in YAML files.

This should help us find all probe-related configurations in YAML files and verify if the timing values are consistent across the codebase.


Based on the search results, I can now verify the probe configurations. The output shows that the livenessProbe timing values (periodSeconds: 5, failureThreshold: 10) are consistently used across multiple test files and configurations for GreptimeDB clusters and standalones. These values appear to be the standard default values for the health checks.

The livenessProbe configuration is appropriate

The timing values are well-balanced for a database service:

  • periodSeconds: 5 provides frequent enough checks without overwhelming the system
  • failureThreshold: 10 gives a total of 50 seconds (5s × 10) before considering the container unhealthy, which is reasonable for a database to recover from temporary issues
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other probe configurations in the codebase
rg -A 10 "livenessProbe:" --type yaml

Length of output: 88


Script:

#!/bin/bash
# Search for probe configurations in YAML files with more context
rg -U "livenessProbe:|readinessProbe:|startupProbe:" -A 10 --type yaml

Length of output: 188333

apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test03/expect.yaml (3)

14-27: Well-structured probe configuration!

The probe configuration follows Kubernetes best practices:

  • Appropriate timing values (5s for liveness/readiness, 10s for startup)
  • Reasonable failure thresholds (10 for liveness/readiness, 30 for startup)
  • Consistent health endpoint across all probes

112-112: Verify the increased storage requirements.

The storage size increase from 10Gi to 30Gi for monitoring data is significant. While this should provide better stability for long-term monitoring, please verify:

  • Historical data retention requirements
  • Expected monitoring data growth rate
  • Impact on resource costs

Line range hint 1-138: Verify impact of removed objectStorage configuration.

The AI summary indicates that the objectStorage configuration was removed from the monitoring section. Please verify:

  • Whether monitoring data needs persistent storage
  • Impact on monitoring data retention
  • Migration plan for existing monitoring data (if any)
apis/v1alpha1/constants.go (1)

46-48: Please clarify the rationale for 30Gi monitoring data size.

The constant addition follows the proper naming and documentation conventions. However, the choice of 30Gi (3x the DefaultDataSize) needs clarification:

  • What factors determined this specific size?
  • Are there any calculations or metrics that justify 30Gi as the optimal value?
  • Should this value be configurable based on cluster size or monitoring requirements?

Let's check if this value is referenced in monitoring configurations:

apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml (2)

34-47: LGTM! Well-structured probe configuration.

The probe configurations follow Kubernetes best practices with appropriate thresholds:

  • Liveness/Readiness probes: 5s period with 10 failures allows for temporary hiccups
  • Startup probe: 10s period with 30 failures provides adequate time for initialization

110-123: LGTM! Consistent probe configuration for meta component.

The probe configurations maintain consistency with other components while keeping the appropriate port (4000) for the meta service.

apis/v1alpha1/common.go (2)

246-254: LGTM! Well-documented Kubernetes-style field addition.

The StartupProbe field is correctly implemented with:

  • Proper type (*corev1.Probe) matching Kubernetes conventions
  • Comprehensive documentation explaining the probe's behavior
  • Correct placement alongside other probe types
  • Appropriate optional field marker

246-254: Verify integration with probe configuration system.

Since this adds a new probe type alongside existing LivenessProbe and ReadinessProbe, please ensure:

  1. Default values are properly configured in the defaulting webhook
  2. Documentation is updated to explain probe configuration best practices
  3. Existing deployments won't be affected by this addition
✅ Verification successful

Let me check the documentation content and webhook integration to make a complete assessment.

The first script will show us if the documentation has been properly updated with StartupProbe information. The second script will help verify if the defaulting webhook properly integrates all probe types including the new StartupProbe.


Let me check one final aspect about the webhook defaulting logic to ensure proper integration.

This will help us verify if the StartupProbe is properly integrated into the defaulting webhook and if there are any validation rules we need to consider.


Based on the results from all our verification attempts, I can now provide a final assessment:

StartupProbe integration appears to be properly implemented

The verification shows:

  1. Default values are properly configured with defaultStartupProbe() function in apis/v1alpha1/defaulting.go
  2. Documentation is updated in docs/api-references/docs.md with complete StartupProbe description and behavior
  3. The probe is properly integrated alongside existing LivenessProbe and ReadinessProbe
  4. Since StartupProbe is marked as optional (+optional) and uses omitempty in json tag, existing deployments won't be affected by this addition

The implementation follows Kubernetes probe configuration patterns and includes necessary documentation, defaulting logic, and backward compatibility considerations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for probe-related configuration in defaulting logic
echo "Checking for probe configuration in defaulting logic..."
rg -A 10 "func.*[Dd]efault.*[Pp]robe" 

# Check for probe-related documentation updates
echo "Checking for documentation updates..."
fd -e md | xargs rg -l "probe"

Length of output: 2285


Script:

#!/bin/bash
# Check documentation content for probe information
echo "Checking probe documentation content..."
cat docs/api-references/docs.md | grep -A 10 -B 10 "probe"

# Check webhook integration
echo "Checking webhook integration..."
rg -A 5 "Default.*Probe.*=" apis/v1alpha1/defaulting.go
rg "func.*[Ww]ebhook.*" apis/v1alpha1/

Length of output: 16791


Script:

#!/bin/bash
# Check how probes are set in defaulting webhook
echo "Checking probe defaulting in webhook..."
rg -B 5 -A 15 "SetDefault.*Pod" apis/v1alpha1/defaulting.go

# Check if there are any validation rules for probes
echo "Checking validation rules..."
rg -A 10 "validate.*[Pp]robe" apis/v1alpha1/

Length of output: 264

apis/v1alpha1/zz_generated.deepcopy.go (1)

689-693: LGTM! Auto-generated deepcopy implementation.

The StartupProbe field is correctly implemented in the MainContainerSpec's DeepCopyInto method, following the same pattern as other probe fields. Remember that this is an auto-generated file and any changes should be made to the source structs instead.
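
For readers unfamiliar with the generated pattern, the handling of a pointer probe field reduces to the standard nil-guarded deep copy, roughly equivalent to the sketch below (the helper name is illustrative; the real code lives inside MainContainerSpec.DeepCopyInto):

package v1alpha1

import corev1 "k8s.io/api/core/v1"

// deepCopyProbe mirrors what deepcopy-gen emits for a *corev1.Probe field:
// nil stays nil, otherwise the probe is copied into freshly allocated memory.
func deepCopyProbe(in *corev1.Probe) *corev1.Probe {
	if in == nil {
		return nil
	}
	out := new(corev1.Probe)
	in.DeepCopyInto(out)
	return out
}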

docs/api-references/docs.md (1)

566-566: LGTM! Well-documented startup probe field.

The documentation for the startupProbe field is comprehensive and follows the established format, providing clear explanations of its purpose and behavior in pod initialization.

🧰 Tools
🪛 Markdownlint

566-566: null
Bare URL used

(MD034, no-bare-urls)

config/crd/resources/greptime.io_greptimedbclusters.yaml (2)

2051-2128: LGTM! Well-structured startupProbe implementation.

The startupProbe schema has been consistently added across all container specs (base, datanode, flownode, frontend) with proper Kubernetes probe configuration including:

  • Multiple probe handlers (exec, httpGet, tcpSocket, grpc)
  • Standard timing configurations
  • Threshold settings for failure and success

This addition will help ensure proper component initialization before other probes begin checking, improving overall cluster stability.

Also applies to: 4952-5029, 7827-7904, 10732-10809


2051-2128: Verify probe handler configurations in defaulting code.

The startupProbe schema looks good, but we should verify that appropriate default values are being set in the defaulting code as mentioned in the PR objectives.

✅ Verification successful

Let me gather more context about where these default probes are being used and the constants they reference.


Let me check where these probes are being used in the defaulting code by looking at the context around lines 105-107 and 317-319.


I now have enough information to provide the final response based on the verification results.

Default probe configurations are properly implemented

The defaulting code correctly sets up all three types of probes (startup, liveness, and readiness) with appropriate configurations:

  • All probes use HTTP health checks on the /health endpoint (DefaultHealthEndpoint)
  • Default HTTP port 4000 is consistently used across the codebase
  • Probes are properly configured in both GreptimeDBCluster and GreptimeDBStandalone specs
  • Each probe type has specific timing configurations suitable for their purpose:
    • Startup probe: 10s period
    • Liveness probe: 5s period, 10 failure threshold
    • Readiness probe: 5s period, 10 failure threshold

The schema in the CRD matches these implementations, allowing users to override these defaults when needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for probe default configurations
rg "func.*defaultStartupProbe|func.*defaultLivenessProbe|func.*defaultReadinessProbe" -A 10

Length of output: 2074


Script:

#!/bin/bash
# Search for the constants used in probe configurations
rg "DefaultHealthEndpoint|DefaultHTTPPort" -B 2 -A 2

# Search for where these default probes are being used
ast-grep --pattern 'defaultStartupProbe()'
ast-grep --pattern 'defaultLivenessProbe()'
ast-grep --pattern 'defaultReadinessProbe()'

Length of output: 4698


Script:

#!/bin/bash
# Get context around the probe usage in defaulting code
rg "StartupProbe.*defaultStartupProbe|LivenessProbe.*defaultLivenessProbe|ReadinessProbe.*defaultReadinessProbe" -B 10 -A 2

Length of output: 1695

manifests/bundle.yaml (1)

2057-2134: Verify probe configuration completeness.

The startupProbe schema includes all necessary fields for Kubernetes probes:

  • Standard probe handlers (exec, http, tcp, grpc)
  • Timing configurations (initialDelaySeconds, timeoutSeconds, periodSeconds)
  • Threshold settings (failureThreshold, successThreshold)
✅ Verification successful

Since kubectl is not available in the sandbox environment, let me try a different approach to verify the probe configuration completeness by analyzing the schema directly from the codebase.


Based on the code search results, I can verify that the startupProbe schema in the manifest includes all the necessary fields for Kubernetes probes. Let me generate the final response:

✓ Probe configuration is complete and follows Kubernetes standards

The startupProbe schema includes all standard Kubernetes probe fields:

  • Handlers: exec (with command), httpGet (with path, port, headers), tcpSocket (with host, port), and grpc (with port, service)
  • Timing configurations: initialDelaySeconds, periodSeconds, timeoutSeconds, terminationGracePeriodSeconds
  • Threshold settings: failureThreshold, successThreshold
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that no probe fields are missing compared to standard Kubernetes probe schema
kubectl explain pod.spec.containers.startupProbe --recursive | grep -v "^  *$" > /tmp/k8s_probe
grep -r "startupProbe:" . -A 50 | grep "^ *[a-zA-Z]" | sort -u > /tmp/crd_probe
diff /tmp/k8s_probe /tmp/crd_probe

Length of output: 255


Script:

#!/bin/bash
# Search for other probe definitions in the codebase to compare with startupProbe
rg "Probe|probe" -A 5 -B 5 manifests/

# Also check for any Kubernetes API schema references or documentation
fd "README|SCHEMA|API" --type f -X cat {}

Length of output: 100934

apis/v1alpha1/defaulting.go (9)

105-107: Default Probes Added to Base Main Container

The addition of StartupProbe, LivenessProbe, and ReadinessProbe to the base main container enhances the stability and readiness checks of the cluster components.


223-231: Monitoring Standalone Uses File Storage by Default

Using file storage for the monitoring standalone simplifies configuration and may improve performance for monitoring purposes.


245-247: Probes Configured with Frontend HTTP Port

Updating the probe ports to match the Frontend HTTP port ensures proper health checking of the frontend component.


261-263: Probes Configured with Meta HTTP Port

Configuring the probes with the Meta HTTP port ensures accurate health checks for the meta component.


277-279: Probes Configured with Datanode HTTP Port

Setting the probe ports to the Datanode HTTP port correctly configures health checks for the datanode component.


317-319: Default Probes Added to Standalone Main Container

Adding the StartupProbe, LivenessProbe, and ReadinessProbe to the standalone's main container enhances its stability and readiness monitoring.


377-392: Review defaultStartupProbe Configuration

The defaultStartupProbe function is properly defined. The FailureThreshold of 30 and PeriodSeconds of 10 allow the container up to 300 seconds to start, which accommodates scenarios where the database may take longer to initialize.


394-406: Review defaultLivenessProbe Configuration

The defaultLivenessProbe function is appropriately defined. With a FailureThreshold of 10 and a PeriodSeconds of 5, the container will be restarted if it fails the liveness check for 50 seconds, which balances responsiveness and fault tolerance.


408-420: Review defaultReadinessProbe Configuration

The defaultReadinessProbe function is suitably configured. It checks the container's readiness every 5 seconds, and after 10 consecutive failures (50 seconds), the container is deemed not ready, ensuring accurate reflection of its state in service endpoints.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (8)
docs/api-references/docs.md (1)

566-566: LGTM with a minor formatting suggestion.

The documentation for the startupProbe field is comprehensive and well-structured. However, the bare URL should be formatted as a Markdown link for consistency with other documentation.

Apply this change:

-More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
+More info: `https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes`
🧰 Tools
🪛 Markdownlint

566-566: null
Bare URL used

(MD034, no-bare-urls)

config/crd/resources/greptime.io_greptimedbstandalones.yaml (1)

2039-2116: LGTM! StartupProbe schema follows Kubernetes conventions.

The startupProbe implementation correctly follows the standard Kubernetes probe structure with all necessary fields:

  • exec, httpGet, tcpSocket, and gRPC handlers
  • Common probe parameters (failureThreshold, periodSeconds, etc.)
  • Proper type definitions and validations

Consider documenting recommended probe configurations for different deployment scenarios, such as:

  • Memory-intensive deployments that need longer startup times
  • Deployments with heavy initialization processes
  • Different storage backend configurations
config/crd/resources/greptime.io_greptimedbclusters.yaml (1)

2051-2128: Consider documenting recommended startupProbe configurations.

While the probe structure is well-defined, it would be helpful to provide documentation or examples of recommended startupProbe configurations for different components (datanode, frontend, etc.) to guide users in setting appropriate startup checks.

manifests/crds.yaml (1)

2050-2127: LGTM! The addition of startupProbe enhances container lifecycle management.

The addition of the startupProbe field across all components is well-structured and follows Kubernetes best practices. This will help improve stability by allowing fine-grained control over container startup checks.

Consider adding documentation comments in the CRD to describe:

  • The purpose and recommended usage of startupProbe for GreptimeDB components
  • Example configurations for different probe types (exec, httpGet, tcpSocket)
  • Recommended values for timing parameters (initialDelaySeconds, periodSeconds, etc.)

Also applies to: 4951-5028, 7826-7903, 10731-10808, 13659-13736, 16516-16593, 19882-19959

manifests/bundle.yaml (1)

2057-2134: Comprehensive addition of startup probes across all components

The addition of startup probe configurations across all components (base, datanode, flownode, frontend, meta, and standalone) is well structured and follows Kubernetes best practices. The probe configuration supports all check types:

  • exec commands
  • HTTP GET requests
  • gRPC health checks
  • TCP socket checks

Consider the following recommendations for probe configuration:

  1. Set appropriate initialDelaySeconds and periodSeconds based on your application startup characteristics
  2. Configure failureThreshold to allow enough retries during startup
  3. Use startupProbe with higher thresholds for slow-starting containers while keeping livenessProbe aggressive

Also applies to: 4958-5035, 7833-7910, 10738-10815, 13666-13743, 19889-19966

apis/v1alpha1/defaulting.go (3)

223-223: Fix grammatical error in comment.

The comment should read: "the monitoring standalone uses file storage by default."

Apply this diff to correct the typo:

-// For better performance and easy management, the monitoring standalone use file storage by default.
+// For better performance and easy management, the monitoring standalone uses file storage by default.

293-295: Address the TODO comment for Flownode probes.

The code indicates that Flownode does not currently require probes, but there's a TODO to add them in the future. Implementing appropriate probes can enhance the health monitoring of the Flownode component.

Would you like assistance in implementing the probes for Flownode?


379-422: Consider setting TimeoutSeconds for probes to prevent hanging requests.

Currently, the probes do not specify TimeoutSeconds. Without timeouts, a hung request could delay failure detection. Consider adding TimeoutSeconds to the probes to prevent prolonged waits on unresponsive endpoints.

Apply this diff to set TimeoutSeconds to a reasonable value (e.g., 5 seconds):

 func defaultStartupProbe() *corev1.Probe {
 	// When StartupProbe is successful, the liveness probe and readiness probe will be enabled.
 	return &corev1.Probe{
 		ProbeHandler: corev1.ProbeHandler{
 			HTTPGet: &corev1.HTTPGetAction{
 				Path: DefaultHealthEndpoint,
 				Port: intstr.FromInt32(DefaultHTTPPort),
 			},
 		},
 		PeriodSeconds: 10,
+		TimeoutSeconds: 5,
 		// The StartupProbe can try up to 30 * 10 = 300 seconds to start the container.
 		// For some scenarios, the database may take a long time to start, so we set the failure threshold to 30.
 		FailureThreshold: 30,
 	}
 }

 func defaultLivenessProbe() *corev1.Probe {
 	// If the liveness probe fails, the container will be restarted.
 	return &corev1.Probe{
 		ProbeHandler: corev1.ProbeHandler{
 			HTTPGet: &corev1.HTTPGetAction{
 				Path: DefaultHealthEndpoint,
 				Port: intstr.FromInt32(DefaultHTTPPort),
 			},
 		},
 		PeriodSeconds:    5,
+		TimeoutSeconds: 5,
 		FailureThreshold: 10,
 	}
 }

 func defaultReadinessProbe() *corev1.Probe {
 	// If the readiness probe fails, the container will be removed from the service endpoints.
 	return &corev1.Probe{
 		ProbeHandler: corev1.ProbeHandler{
 			HTTPGet: &corev1.HTTPGetAction{
 				Path: DefaultHealthEndpoint,
 				Port: intstr.FromInt32(DefaultHTTPPort),
 			},
 		},
 		PeriodSeconds:    5,
+		TimeoutSeconds: 5,
 		FailureThreshold: 10,
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits 6d5a81f and fd1a016.

📒 Files selected for processing (15)
  • apis/v1alpha1/common.go (1 hunks)
  • apis/v1alpha1/defaulting.go (8 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml (4 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test00/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test02/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test03/expect.yaml (2 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbstandalone/test00/expect.yaml (1 hunks)
  • apis/v1alpha1/zz_generated.deepcopy.go (1 hunks)
  • config/crd/resources/greptime.io_greptimedbclusters.yaml (6 hunks)
  • config/crd/resources/greptime.io_greptimedbstandalones.yaml (1 hunks)
  • docs/api-references/docs.md (1 hunks)
  • manifests/bundle.yaml (7 hunks)
  • manifests/crds.yaml (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • apis/v1alpha1/common.go
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test00/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test02/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test03/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbstandalone/test00/expect.yaml
🧰 Additional context used
🪛 Markdownlint
docs/api-references/docs.md

566-566: null
Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (8)
apis/v1alpha1/zz_generated.deepcopy.go (1)

689-693: LGTM! Auto-generated deepcopy implementation.

The deepcopy implementation for the new StartupProbe field follows the correct pattern and matches the implementation of other probe fields.

Let's verify the field is properly defined in the source types:

✅ Verification successful

Let me try a different approach to verify the field definition.

These commands will help us:

  1. Find the complete MainContainerSpec struct definition
  2. Locate any StartupProbe field definitions
  3. Identify the source type file where MainContainerSpec should be defined

Let me check the complete MainContainerSpec definition and search for StartupProbe with a different pattern.


StartupProbe field is properly defined in the source types

The verification confirms that:

  1. The StartupProbe field is correctly defined in apis/v1alpha1/common.go as part of the MainContainerSpec struct
  2. The field has the correct type *corev1.Probe and proper JSON tags json:"startupProbe,omitempty"
  3. The field is well-documented with appropriate comments explaining its purpose and behavior

Therefore, the auto-generated deepcopy implementation in zz_generated.deepcopy.go is correctly reflecting the source type definition.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify StartupProbe field definition in MainContainerSpec
# Expect: Field definition with proper type and tags

ast-grep --pattern 'type MainContainerSpec struct {
  $$$
  StartupProbe *v1.Probe `json:"startupProbe,omitempty"`
  $$$
}'

Length of output: 125


Script:

#!/bin/bash
# Search for MainContainerSpec struct definition
rg "type MainContainerSpec struct" -A 20 --no-heading

Length of output: 1667


Script:

#!/bin/bash
# Search for StartupProbe field in all Go files
rg "StartupProbe.*\*v1\.Probe" --type go --no-heading

Length of output: 55


Script:

#!/bin/bash
# List all Go files in apis/v1alpha1 directory to find the source type definition
fd -e go . apis/v1alpha1

Length of output: 354


Script:

#!/bin/bash
# Search for MainContainerSpec struct definition with more context lines
rg "type MainContainerSpec struct" -A 50 --no-heading apis/v1alpha1/common.go

Length of output: 2734


Script:

#!/bin/bash
# Search for StartupProbe with a broader pattern
rg "StartupProbe.*omitempty" --type go --no-heading

Length of output: 137


Script:

#!/bin/bash
# Look for recent changes in common.go that might have added the StartupProbe
git log -p apis/v1alpha1/common.go | grep -A 5 "StartupProbe"

Length of output: 780

config/crd/resources/greptime.io_greptimedbstandalones.yaml (1)

2039-2116: Verify probe configuration consistency.

The startupProbe schema matches the structure of livenessProbe and readinessProbe, which is correct. However, we should verify that the default values for these probes are properly configured in the controller.

Also applies to: 1972-2038

✅ Verification successful

Let me generate another script to get more context about the probe configurations.

The previous results showed the default probe configurations, but we need to verify how they're being used and ensure the constants are properly defined. This will help us confirm the consistency of probe configurations across the codebase.


Based on the comprehensive results, I can now generate the final response:

Probe configurations are properly implemented and consistent

The probe configurations are well-defined and properly implemented in the controller:

  • Default probe configurations are defined in apis/v1alpha1/defaulting.go with appropriate settings for all three probe types
  • Constants for health endpoint (/health) and HTTP port (4000) are properly defined in apis/v1alpha1/constants.go
  • Probe configurations are consistently applied across frontend, meta, and datanode components
  • StartupProbe has an appropriate failure threshold of 30 with a 10-second period (allowing up to 300 seconds for startup)
  • The probes are properly initialized with default values and their ports are correctly updated based on the component's HTTP port
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for probe default configurations in the controller code
# Look for probe initialization code
rg -A 10 "func.*[Dd]efault.*[Pp]robe|initializeProbe" 

# Look for startup probe specific configurations
rg -A 5 "StartupProbe.*=.*&.*Probe{|startupProbe.*=.*&.*Probe{"

Length of output: 2264


Script:

#!/bin/bash
# Check for probe-related constants and configurations
rg -A 3 "Default.*Port|Default.*Endpoint"

# Look for probe initialization in the codebase
rg -A 5 "StartupProbe.*=|startupProbe.*=" 

# Check where these default probes are being used
rg -A 5 "defaultStartupProbe|defaultLivenessProbe|defaultReadinessProbe"

Length of output: 10058

config/crd/resources/greptime.io_greptimedbclusters.yaml (3)

2051-2128: Well-structured implementation of startupProbe across container specs.

The startupProbe configuration has been consistently implemented across all container specifications (base, main, meta, frontend, monitoring, and additional containers) with proper Kubernetes probe structure including:

  • Multiple probe handlers (exec, httpGet, tcpSocket, grpc)
  • Standard timing parameters (initialDelaySeconds, periodSeconds, etc.)
  • Proper validation schemas for all fields

This implementation will help improve container startup reliability by:

  1. Preventing premature liveness/readiness probes during slow-starting containers
  2. Providing better startup state management
  3. Allowing fine-grained control over startup behavior

Also applies to: 4952-5029, 7827-7904, 10732-10809, 13660-13737, 16517-16594


2070-2072: Consistent GRPC probe port requirement across all specs.

The GRPC probe configuration consistently requires the port field across all container specifications, which is the correct approach for GRPC health checks.

Also applies to: 4971-4973, 7846-7848, 10751-10753, 13679-13681, 16536-16538
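
For reference, a probe using the gRPC handler is built as in the sketch below; the port number is an assumption for illustration only, and the CRD merely requires that a port be set.

package v1alpha1

import corev1 "k8s.io/api/core/v1"

// exampleGRPCStartupProbe is an illustrative gRPC-based startup probe.
// GRPCAction requires a port; the optional Service field selects a named
// service in the gRPC health-checking protocol.
func exampleGRPCStartupProbe() *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			GRPC: &corev1.GRPCAction{
				Port: 4001, // assumed gRPC port, for illustration only
			},
		},
		PeriodSeconds:    10,
		FailureThreshold: 30,
	}
}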


2051-2128: Verify probe handler implementations in the controller.

Since this CRD adds startupProbe support, we should verify that the controller properly implements these probe handlers.

✅ Verification successful

Let me gather more information about how the probes are handled in the codebase.


Based on the gathered information, I can now provide a final response:

The startupProbe support is properly implemented in the codebase

The code shows that:

  1. The MainContainerSpec in apis/v1alpha1/common.go properly defines the StartupProbe field that maps directly to Kubernetes' corev1.Probe
  2. The CRD schema in config/crd/resources/greptime.io_greptimedbclusters.yaml correctly defines all the necessary probe properties
  3. The controller's common package in controllers/common/common.go handles probe configurations by mapping them from the template to the container spec
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for startupProbe handler implementations in the controller code
rg -l "startupProbe" | grep -v "test\|mock\|fake"

Length of output: 260


Script:

#!/bin/bash
# Check the implementation in common.go since it appeared in the results
cat apis/v1alpha1/common.go

# Also search for probe-related code in the controllers
rg -g "controllers/**" "Probe" -A 5 -B 5

Length of output: 29261

manifests/bundle.yaml (1)

2057-2134: Verify probe configuration values

The startupProbe configuration needs appropriate default values to be effective.

✅ Verification successful

Let me search for more specific information about the probe configurations to better understand the default values.


Let me search for actual probe configurations in use to understand the default values better.


Based on the search results, I can now provide a final response since I found the actual probe configurations in use that can guide the startup probe configuration.

Startup probe configuration should align with existing probe patterns

The startupProbe configuration should follow similar patterns to the existing probes in the manifest:

  • Liveness probe uses: initialDelaySeconds: 15, periodSeconds: 20
  • Readiness probe uses: initialDelaySeconds: 5, periodSeconds: 10

For startup probe, consider using:

  • initialDelaySeconds: 5 (similar to readiness probe since it's the initial check)
  • periodSeconds: 10 (balanced frequency like readiness probe)
  • failureThreshold: 30 (to allow sufficient time for startup)

The probe schema is correctly defined with all necessary fields (exec, httpGet, tcpSocket, grpc handlers and timing parameters).

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any existing probe configurations to compare with new startup probe settings
rg -A 5 "livenessProbe:|readinessProbe:" manifests/

Length of output: 32637


Script:

#!/bin/bash
# Search for probe configurations with their timing values
rg -A 10 "initialDelaySeconds:|periodSeconds:|timeoutSeconds:|failureThreshold:|successThreshold:" manifests/

Length of output: 329287


Script:

#!/bin/bash
# Search for actual probe configurations with their values
rg -A 10 "startupProbe:|livenessProbe:|readinessProbe:" manifests/ --no-filename

Length of output: 63060

apis/v1alpha1/defaulting.go (2)

105-107: LGTM: Default probes enhance container health monitoring.

The addition of StartupProbe, LivenessProbe, and ReadinessProbe with default configurations improves the reliability of the application.


319-321: LGTM: Default probes added to GreptimeDBStandalone.

The inclusion of default probes enhances the health monitoring of the standalone deployment.

Resolved review comments on apis/v1alpha1/defaulting.go (3)
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
apis/v1alpha1/defaulting.go (1)

293-295: Consider enhancing the TODO comment.

While setting probes to nil for flownode is valid, the TODO comment could be more descriptive about why liveness probe isn't needed and what future conditions would warrant adding it.

Consider updating the comment to:

-// TODO(zyy17): The flownode does not need liveness probe and will be added in the future.
+// TODO(zyy17): Flownode currently doesn't require health monitoring as it [explain why].
+// Health probes will be added when [specify conditions/requirements].
docs/api-references/docs.md (1)

566-566: Fix bare URL in documentation.

The URL in the documentation should be properly formatted as a Markdown link.

Apply this diff to fix the formatting:

-More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
+More info: `https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes`
🧰 Tools
🪛 Markdownlint

566-566: null
Bare URL used

(MD034, no-bare-urls)

config/crd/resources/greptime.io_greptimedbclusters.yaml (1)

2051-2128: Consider documenting recommended probe settings.

While the probe structure is well-defined, it would be helpful to add comments or documentation about recommended settings for the startup probe parameters (like failureThreshold, periodSeconds, etc.) specific to GreptimeDB components.

manifests/crds.yaml (1)

2050-2127: LGTM! The StartupProbe additions enhance container health monitoring.

The addition of startupProbe fields across all container specifications is well-structured and follows Kubernetes probe schema conventions. This enhancement will help better manage container startup health checks.

Consider adding documentation or comments about recommended probe settings for different components (e.g., suggested initialDelaySeconds, periodSeconds, failureThreshold values) to help users configure appropriate startup checks for their deployments.

Also applies to: 4951-5028, 7826-7903, 10731-10808, 13659-13736, 16516-16593, 19882-19959

manifests/bundle.yaml (2)

Line range hint 1-21966: Consider adding default probe values in the operator deployment

While the CRDs now support startupProbe configuration, the operator deployment itself doesn't specify any default values. Consider adding reasonable defaults for stability:

 spec:
   template:
     spec:
       containers:
       - name: manager
         livenessProbe:
           httpGet:
             path: /healthz
             port: 9494
           initialDelaySeconds: 15
           periodSeconds: 20
+        startupProbe:
+          httpGet:
+            path: /healthz
+            port: 9494
+          failureThreshold: 30
+          periodSeconds: 10
         readinessProbe:
           httpGet:
             path: /readyz
             port: 9494
           initialDelaySeconds: 5
           periodSeconds: 10

Line range hint 1-21966: Consider adding probe configuration documentation

The CRDs now support comprehensive probe configurations, but this flexibility might be overwhelming for users.

Consider adding documentation (e.g., in comments or separate docs) that explains:

  • The purpose and benefits of each probe type
  • Recommended probe configurations for different scenarios
  • Best practices for setting probe parameters
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR, between commits fd1a016 and f9c3fc7.

📒 Files selected for processing (15)
  • apis/v1alpha1/common.go (1 hunks)
  • apis/v1alpha1/defaulting.go (7 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml (4 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test00/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test02/expect.yaml (1 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test03/expect.yaml (2 hunks)
  • apis/v1alpha1/testdata/defaulting/greptimedbstandalone/test00/expect.yaml (1 hunks)
  • apis/v1alpha1/zz_generated.deepcopy.go (1 hunks)
  • config/crd/resources/greptime.io_greptimedbclusters.yaml (6 hunks)
  • config/crd/resources/greptime.io_greptimedbstandalones.yaml (1 hunks)
  • docs/api-references/docs.md (1 hunks)
  • manifests/bundle.yaml (7 hunks)
  • manifests/crds.yaml (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • apis/v1alpha1/common.go
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/input.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test00/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test01/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test02/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbcluster/setdefaults/test03/expect.yaml
  • apis/v1alpha1/testdata/defaulting/greptimedbstandalone/test00/expect.yaml
🧰 Additional context used
🪛 Markdownlint
docs/api-references/docs.md

566-566: null
Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (13)
apis/v1alpha1/testdata/defaulting/greptimedbcluster/merge/test00/expect.yaml (3)

69-82: Well-structured probe configurations

The probe configurations are consistently defined across all sections with appropriate thresholds:

  • Startup probe with higher thresholds (30 failures, 10s period) allows sufficient time for initialization
  • Liveness/readiness probes with lower thresholds (10 failures, 5s period) ensure quick detection of issues

Also applies to: 110-123, 155-168


34-47: ⚠️ Potential issue

Port mismatch in base section probes

The base section probes are configured to use port 4000, but the frontend service now uses port 9000. This inconsistency could cause probe failures.

Apply this diff to update the probe ports:

       httpGet:
         path: /health
-        port: 4000
+        port: 9000

Line range hint 8-11: Document port configuration strategy

The configuration uses different HTTP ports across components:

  • Cluster level: 5000
  • Frontend: 9000
  • Meta: 4000
  • Datanode: 4000

This variety of ports could lead to confusion. Consider:

  1. Documenting the purpose and usage of each port
  2. Standardizing ports where possible
  3. Adding comments explaining why different ports are needed

Let's verify if these ports are documented:

Also applies to: 50-53, 95-97, 136-137

apis/v1alpha1/defaulting.go (3)

105-107: LGTM: Base container probe configuration is well-structured.

The addition of StartupProbe, LivenessProbe, and ReadinessProbe follows Kubernetes best practices for container health monitoring.


245-247: Previous nil pointer dereference warning still applies.

The code directly accesses probe fields without nil checks, which could cause panics.

Also applies to: 261-263, 277-279
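
A defensive variant of that port override would guard each pointer before writing, along the lines of the sketch below (the helper name is illustrative; the actual merge logic lives in apis/v1alpha1/defaulting.go):

package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// setProbeHTTPPort updates a probe's HTTP port only when the probe and its
// HTTPGet handler are present, so a user who removed the probe or switched
// to a non-HTTP handler cannot trigger a nil pointer dereference.
func setProbeHTTPPort(probe *corev1.Probe, port int32) {
	if probe == nil || probe.HTTPGet == nil {
		return
	}
	probe.HTTPGet.Port = intstr.FromInt32(port)
}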


379-394: Verify if 300s startup timeout is necessary.

While the comment explains the rationale for a long startup time, it would be beneficial to verify this requirement with actual startup metrics.

apis/v1alpha1/zz_generated.deepcopy.go (1)

689-693: LGTM! The deepcopy implementation for StartupProbe is correct.

The auto-generated deepcopy code properly handles the new StartupProbe field, following the same pattern as other probe fields in the MainContainerSpec struct.

docs/api-references/docs.md (1)

566-566: LGTM: StartupProbe documentation is accurate and complete.

The documentation for the startupProbe field is well-written and accurately describes:

  • Its purpose in indicating Pod initialization
  • Its relationship with other probes
  • Its immutability
  • Reference to official Kubernetes documentation

The addition aligns perfectly with the PR's objective to add StartUpProbe functionality.

🧰 Tools
🪛 Markdownlint

566-566: null
Bare URL used

(MD034, no-bare-urls)

config/crd/resources/greptime.io_greptimedbstandalones.yaml (2)

2039-2116: LGTM: StartupProbe schema follows Kubernetes conventions.

The schema definition for startupProbe is well-structured and includes all necessary probe configurations (exec, httpGet, grpc, tcpSocket) with proper type definitions and required fields.


2039-2116: Implement probe configuration validation.

The previous review comment about implementing validation rules for probe configurations is still relevant. Consider implementing a validating webhook to prevent common misconfigurations:

  • Ensure startup probe timeout is less than period
  • Validate that failure threshold is appropriate for startup scenarios
  • Check for reasonable initial delay values
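
A minimal sketch of such a validation, assuming the three checks listed above (the function and the 60-second/300-second bounds are illustrative placeholders, not part of this PR; field names follow corev1.Probe):

package v1alpha1

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// validateStartupProbe is an illustrative validating-webhook check covering
// the points above; the concrete bounds are placeholders.
func validateStartupProbe(p *corev1.Probe) error {
	if p == nil {
		return nil // no probe configured, nothing to validate
	}
	if p.TimeoutSeconds > 0 && p.PeriodSeconds > 0 && p.TimeoutSeconds >= p.PeriodSeconds {
		return fmt.Errorf("startupProbe timeoutSeconds (%d) must be less than periodSeconds (%d)",
			p.TimeoutSeconds, p.PeriodSeconds)
	}
	if p.FailureThreshold > 0 && p.PeriodSeconds > 0 && p.FailureThreshold*p.PeriodSeconds < 60 {
		return fmt.Errorf("startup window of %ds is likely too short for the database to initialize",
			p.FailureThreshold*p.PeriodSeconds)
	}
	if p.InitialDelaySeconds > 300 {
		return fmt.Errorf("initialDelaySeconds of %d looks unreasonably large", p.InitialDelaySeconds)
	}
	return nil
}
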
config/crd/resources/greptime.io_greptimedbclusters.yaml (2)

2051-2128: Well-structured addition of startupProbe field across all components.

The implementation of the startupProbe field follows Kubernetes best practices by:

  • Using standard probe structure with all necessary fields
  • Maintaining consistency across all container specs (main, datanode, flownode, frontend, monitoring)
  • Including proper type definitions and validations

Also applies to: 4952-5029, 7827-7904, 10732-10809, 16517-16594


2051-2128: Verify probe configuration in related controllers.

The CRD changes look good, but we should ensure the controllers properly handle these new probe configurations.

Consider providing default probe configurations in the controller for each component type, as different components may have different startup characteristics.

Also applies to: 4952-5029, 7827-7904, 10732-10809, 16517-16594

✅ Verification successful

Let me check the content of the defaulting file to understand how probes are handled.


Let me check how these probe configurations are used in the controllers.


Based on the findings, I can now provide the final response:

Probe configurations are properly handled in the codebase

The probe configurations are well-managed through:

  • Default probe configurations defined in apis/v1alpha1/defaulting.go for startup, liveness, and readiness probes
  • Component-specific probe handling in merge functions for Frontend, Meta, and Datanode components
  • Proper propagation to container specs in controllers/common/common.go
  • Special case handling for Flownode where probes are explicitly disabled
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for controller code that handles probe configuration
rg -l "(?s)Probe.*container" --type go

Length of output: 68


Script:

#!/bin/bash
# Check the probe-related code in defaulting.go
rg -A 10 "Probe" apis/v1alpha1/defaulting.go

Length of output: 4270


Script:

#!/bin/bash
# Search for controller code that uses these probe configurations
rg -A 5 "MainContainer\.(Startup|Liveness|Readiness)Probe" --type go --glob "controllers/**"

Length of output: 608

manifests/bundle.yaml (1)

2057-2134: LGTM: StartupProbe fields are consistently added across components

The addition of startupProbe configuration fields follows Kubernetes best practices and provides all standard probe options (exec, grpc, httpGet, tcpSocket) with appropriate parameters.

Also applies to: 4958-5035, 7833-7910, 10738-10815, 13666-13743, 19889-19966

Resolved review comment on apis/v1alpha1/defaulting.go
zyy17 merged commit 0a1f28a into GreptimeTeam:main on Oct 25, 2024
4 checks passed
zyy17 deleted the refactor/add-startup-probe branch on October 25, 2024 at 06:44