Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: collect slow queries in monitoring mode #198

Merged
merged 2 commits into from
Oct 25, 2024

Conversation

zyy17
Copy link
Collaborator

@zyy17 zyy17 commented Oct 23, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a SlowQuery configuration for enhanced logging, allowing users to enable slow query logging with customizable thresholds and sampling ratios across various components.
    • Added documentation for the new SlowQuery settings in the API references.
  • Bug Fixes

    • Improved validation for the new logging fields to ensure correct configuration.
  • Documentation

    • Expanded API documentation to include details about the new SlowQuery logging capabilities.
  • Chores

    • Updated Kubernetes manifests and CRDs to include slowQuery properties for consistent logging configuration across components.

Copy link
Contributor

coderabbitai bot commented Oct 23, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes in this pull request introduce a new configuration for logging slow queries within the GreptimeDB system. A SlowQuery struct is added to the LoggingSpec, allowing for configuration options such as enabling slow query logging, setting a threshold, and defining a sampling ratio. This configuration is integrated into various components, including CRDs, documentation, and the logging configuration logic. Additionally, new constants and methods are introduced to support these changes, enhancing the overall logging capabilities of the system.

Changes

File Path Change Summary
apis/v1alpha1/common.go Added SlowQuery struct and integrated it into LoggingSpec.
apis/v1alpha1/defaulting.go Modified defaultSpec method to include default settings for slow query logging.
apis/v1alpha1/zz_generated.deepcopy.go Added DeepCopyInto and DeepCopy methods for SlowQuery struct and updated LoggingSpec.
config/crd/resources/greptime.io_greptimedbclusters.yaml Added slowQuery property under logging for datanode, flownode, and frontend.
config/crd/resources/greptime.io_greptimedbstandalones.yaml Added slowQuery property under spec.logging for GreptimeDBStandalone.
controllers/constant/constant.go Introduced constant SlowQueriesTableName.
controllers/greptimedbcluster/deployers/common.go Updated GenerateVectorConfigMap to include SlowQueriesTableName.
controllers/greptimedbcluster/deployers/config/pipeline.yaml Added new fields cost, threshold, promql, and sql to the transform section.
controllers/greptimedbcluster/deployers/config/vector-config-template.yaml Enhanced logging configuration with new slow_queries source, transform_slow_queries, and sink_greptimedb_slow_queries.
docs/api-references/docs.md Added documentation for SlowQuery configuration in GreptimeDB API.
manifests/bundle.yaml Added slowQuery configuration under datanode and flownode, and updated status section.
manifests/crds.yaml Enhanced CRDs for GreptimeDBCluster and GreptimeDBStandalone with slowQuery properties and updated status section.
pkg/dbconfig/common.go Enhanced LoggingConfig struct with fields for slow query logging configuration and updated ConfigureLogging method.

Possibly related PRs

Suggested reviewers

  • daviderli614

Poem

In the land of queries, slow and fast,
A new log configuration is here at last.
With thresholds and ratios, we’ll keep track,
Of every slow query, no detail we’ll lack.
So hop along, let’s monitor with glee,
For performance and speed, it’s the way to be! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (8)
controllers/constant/constant.go (1)

35-36: Add documentation and consider table name format consistency.

  1. For consistency with other constants like LogsTableName, please add a descriptive comment explaining the purpose of this table.
  2. Consider aligning the table name format with LogsTableName:
+// SlowQueriesTableName is the table name for storing slow query metrics.
-SlowQueriesTableName = "gt_slow_queries"
+SlowQueriesTableName = "gtslowqueries"
pkg/dbconfig/common.go (1)

187-191: Fix typo in variable name.

The variable name 'ration' should be 'ratio' for better clarity and consistency.

- ration, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64)
+ ratio, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64)
  if err != nil {
    klog.Warningf("Failed to parse slow query sample ratio '%s', use the default value 1.0", spec.SlowQuery.SampleRatio)
-   ration = 1.0
+   ratio = 1.0
  }
apis/v1alpha1/defaulting.go (1)

155-161: Consider using constants for default slow query configuration values.

The hardcoded values for slow query configuration could be better managed as package-level constants, similar to other defaults in the codebase. This would make it easier to maintain and modify these values in the future.

Apply this diff to introduce constants:

 // Add these constants with the other defaults
+const (
+    DefaultSlowQueryThreshold = "10s"
+    DefaultSlowQuerySampleRatio = "1.0"
+)

 // In the defaultSpec method
 defaultSpec.Logging.SlowQuery = &SlowQuery{
     Enabled:     true,
-    Threshold:   "10s",
-    SampleRatio: "1.0",
+    Threshold:   DefaultSlowQueryThreshold,
+    SampleRatio: DefaultSlowQuerySampleRatio,
 }
apis/v1alpha1/common.go (1)

469-471: Add validation for the SlowQuery field.

Consider adding the +kubebuilder:validation:Optional marker to explicitly document that this field is optional, matching the style used for other optional fields in this file.

 // SlowQuery is the slow query configuration.
 // +optional
+// +kubebuilder:validation:Optional
 SlowQuery *SlowQuery `json:"slowQuery,omitempty"`
config/crd/resources/greptime.io_greptimedbstandalones.yaml (1)

2857-2867: LGTM with suggestions for field validation.

The slowQuery configuration structure is well-defined with appropriate field types. However, consider adding validation patterns for ratio and threshold fields to ensure valid values are provided.

Consider adding these validations:

                  slowQuery:
                    properties:
                      enabled:
                        type: boolean
                      ratio:
                        type: string
+                       pattern: ^(0(\.[0-9]+)?|1(\.0+)?)$
                      threshold:
                        type: string
+                       pattern: ^([0-9]+(\.[0-9]*)?)(ms|s|m)$
                    required:
                    - enabled
                    type: object

This ensures:

  • ratio is a valid float between 0 and 1
  • threshold follows the time duration format (e.g., "100ms", "1s", "1m")
config/crd/resources/greptime.io_greptimedbclusters.yaml (1)

2844-2854: Consider adding examples in comments.

Add examples in comments to help users understand the expected values:

   slowQuery:
     properties:
       enabled:
         type: boolean
+        # example: true
       ratio:
         type: string
+        # example: "0.1"
       threshold:
         type: string
+        # example: "100ms"

Also applies to: 5674-5684, 8487-8497, 11382-11392, 17000-17010

manifests/crds.yaml (1)

2843-2853: Add field descriptions and validation patterns for slow query configuration.

The slow query configuration schema is well-structured and consistent across all components. However, consider the following improvements:

  1. Add descriptions for fields to improve usability:
    slowQuery:
      description: "Configuration for slow query logging"
      properties:
        enabled:
          description: "Enable/disable slow query logging"
          type: boolean
        ratio:
          description: "Sampling ratio for slow queries (e.g. '0.1' for 10%)"
          type: string
  •    pattern: "^(0(\\.\\d+)?|1(\\.0+)?)$"
     threshold:
       description: "Duration threshold for slow queries (e.g. '1s', '100ms')"
       type: string
    
  •    pattern: "^\\d+(\\.\\d+)?(ms|s|m)$"
    
    
    
  1. The validation patterns would:
    • Ensure ratio is a valid decimal between 0 and 1
    • Ensure threshold follows time duration format

Also applies to: 5673-5683, 8486-8496, 11381-11391, 20298-20308

manifests/bundle.yaml (1)

2850-2860: LGTM! Consistent implementation of slowQuery configuration across components.

The slowQuery configuration is well-structured and consistently implemented across all components (datanode, flownode, frontend, meta, and standalone).

Consider adding validation for ratio and threshold values.

While the structure is good, consider adding pattern validation for:

  • ratio: Should likely be a float between 0 and 1
  • threshold: Should be a duration string (e.g., "1s", "100ms")

Example schema addition:

ratio:
  type: string
  pattern: ^(0(\.\d+)?|1(\.0+)?)$
threshold:
  type: string
  pattern: ^([0-9]+(\.[0-9]*)?)(ns|us|µs|ms|s|m|h)$

Consider adding field descriptions.

Add descriptions to document the purpose and expected format of each field:

Example:

slowQuery:
  description: Configuration for slow query logging
  properties:
    enabled:
      description: Whether to enable slow query logging
      type: boolean
    ratio:
      description: Sampling ratio for slow queries (between 0 and 1)
      type: string
    threshold:
      description: Duration threshold above which queries are considered slow (e.g. "1s")
      type: string

Also applies to: 5680-5690, 8493-8503, 11339-11349, 20305-20315

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 46697f2 and 0cd24b8.

📒 Files selected for processing (13)
  • apis/v1alpha1/common.go (1 hunks)
  • apis/v1alpha1/defaulting.go (1 hunks)
  • apis/v1alpha1/zz_generated.deepcopy.go (2 hunks)
  • config/crd/resources/greptime.io_greptimedbclusters.yaml (6 hunks)
  • config/crd/resources/greptime.io_greptimedbstandalones.yaml (1 hunks)
  • controllers/constant/constant.go (1 hunks)
  • controllers/greptimedbcluster/deployers/common.go (1 hunks)
  • controllers/greptimedbcluster/deployers/config/pipeline.yaml (1 hunks)
  • controllers/greptimedbcluster/deployers/config/vector-config-template.yaml (3 hunks)
  • docs/api-references/docs.md (2 hunks)
  • manifests/bundle.yaml (7 hunks)
  • manifests/crds.yaml (7 hunks)
  • pkg/dbconfig/common.go (3 hunks)
🧰 Additional context used
🔇 Additional comments (16)
controllers/greptimedbcluster/deployers/config/pipeline.yaml (2)

21-22: Consider adding size limits for query fields.

For promql and sql fields, consider implementing size limits to prevent excessive storage usage from very long queries. This is particularly important as these fields are indexed.

Let's check if size limits are implemented elsewhere:

#!/bin/bash
# Search for any existing size limit implementations
rg -A 3 'MaxLength|max_length|size_limit'

19-22: Consider numeric types for cost and threshold fields.

While the configuration looks good overall, consider whether cost and threshold might be better represented as numeric types rather than strings, depending on their intended usage. This could enable better aggregation and comparison operations in monitoring scenarios.

Let's check if these fields are used numerically elsewhere:

controllers/greptimedbcluster/deployers/config/vector-config-template.yaml (3)

6-7: LGTM: Improved log pattern specificity.

The change from wildcard pattern to specific log file patterns provides better control and security.


10-14: LGTM: Well-structured slow queries source configuration.

The slow queries source configuration follows the same pattern as the logs source, maintaining consistency in the configuration structure.


95-102: Verify SlowQueriesTableName template variable.

The sink configuration looks good, but we need to ensure that SlowQueriesTableName is properly defined in the template context.

✅ Verification successful

The search results show that SlowQueriesTableName is defined as a constant in controllers/constant/constant.go with the value "gt_slow_queries". Let me verify how this constant is used in the context of the vector configuration template.


Based on the search results, I can now generate the final response since I have found that:

  1. The SlowQueriesTableName constant is properly defined in constant package
  2. The constant package is imported in the deployers
  3. The template variable is correctly assigned in controllers/greptimedbcluster/deployers/common.go

SlowQueriesTableName template variable is properly defined and used

The template variable SlowQueriesTableName is correctly defined as a constant "gt_slow_queries" and is properly assigned in the template context through controllers/greptimedbcluster/deployers/common.go. The configuration is valid and will work as expected.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if SlowQueriesTableName is defined in related Go files

# Search for SlowQueriesTableName definition in Go files
echo "Searching for SlowQueriesTableName definition:"
rg -t go "SlowQueriesTableName.*string" -A 2

# Search for template variable initialization
echo -e "\nSearching for template variable initialization:"
rg -t go "SlowQueriesTableName.*=" -A 2

Length of output: 510


Script:

#!/bin/bash
# Check how the SlowQueriesTableName constant is used in relation to the vector config template

# Search for imports of the constant package in files related to vector config
echo "Searching for constant package imports in vector config related files:"
rg -t go "\".*constant\"" controllers/greptimedbcluster/deployers/

# Search for the struct/type that holds template variables for vector config
echo -e "\nSearching for vector config template struct:"
rg -t go "type.*Config.*struct" controllers/greptimedbcluster/deployers/ -A 10

# Check the actual template variable assignment
echo -e "\nSearching for template variable assignment:"
rg -t go "SlowQueriesTableName.*constant" controllers/greptimedbcluster/deployers/

Length of output: 1292

pkg/dbconfig/common.go (1)

19-21: LGTM: Import additions are appropriate.

The new imports support the slow query configuration functionality - strconv for parsing the sample ratio and klog for logging warnings.

controllers/greptimedbcluster/deployers/common.go (1)

169-174: LGTM! Verify vector template integration.

The addition of SlowQueriesTableName to the vector configuration template variables looks good and follows the existing pattern.

Let's verify that the vector config template contains the corresponding placeholder:

✅ Verification successful

Let me verify the actual content of these files to ensure proper integration.


Integration verified successfully - template and constant are properly configured

The verification confirms that:

  • The vector config template correctly uses {{ .SlowQueriesTableName }} in the sink_greptimedb_slow_queries configuration
  • The constant is properly defined as SlowQueriesTableName = "gt_slow_queries"
  • The template structure follows the same pattern as other sinks in the configuration
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the vector config template contains the SlowQueriesTableName placeholder
# and if the constant is properly defined

echo "Checking for SlowQueriesTableName placeholder in vector template..."
rg -l "SlowQueriesTableName" "controllers/greptimedbcluster/deployers/config/vector-config-template.yaml"

echo "Checking constant definition..."
rg -l "SlowQueriesTableName.*=.*\"gt_slow_queries\"" "controllers/constant/constant.go"

Length of output: 515


Script:

#!/bin/bash
# Check the actual content of vector template and constant definition

echo "Vector template content around SlowQueriesTableName:"
rg -A 5 -B 5 "SlowQueriesTableName" "controllers/greptimedbcluster/deployers/config/vector-config-template.yaml"

echo -e "\nConstant definition:"
rg "SlowQueriesTableName.*=.*\"gt_slow_queries\"" "controllers/constant/constant.go" -A 1 -B 1

Length of output: 716

apis/v1alpha1/defaulting.go (1)

159-160: Verify the validation logic for Threshold and SampleRatio.

The Threshold and SampleRatio are stored as strings. Please ensure there is proper validation logic to:

  1. Parse and validate the duration format for Threshold
  2. Parse and validate the ratio range (0.0-1.0) for SampleRatio

Let's check for validation logic in the codebase:

apis/v1alpha1/zz_generated.deepcopy.go (2)

626-630: LGTM: Proper deep copy handling for SlowQuery field.

The implementation correctly handles the deep copy of the new SlowQuery pointer field in LoggingSpec, following the standard pattern used for other pointer fields in this auto-generated code.


1055-1068: LGTM: Correct deep copy implementation for SlowQuery struct.

The auto-generated deep copy implementation for the SlowQuery struct follows the standard pattern and correctly handles both value copying and nil checks.

docs/api-references/docs.md (1)

526-526: LGTM: Clear integration with LoggingSpec.

The new slowQuery field is properly documented and integrated into the LoggingSpec struct.

manifests/crds.yaml (5)

2843-2853: LGTM on datanode slow query configuration.

The schema structure for datanode's slow query configuration is correct and properly integrated into the logging spec.


5673-5683: LGTM on flownode slow query configuration.

The schema structure for flownode's slow query configuration is correct and properly integrated into the logging spec.


8486-8496: LGTM on frontend slow query configuration.

The schema structure for frontend's slow query configuration is correct and properly integrated into the logging spec.


11381-11391: LGTM on meta slow query configuration.

The schema structure for meta's slow query configuration is correct and properly integrated into the logging spec.


20298-20308: LGTM on standalone slow query configuration.

The schema structure for standalone's slow query configuration is correct and properly integrated into the logging spec.

pkg/dbconfig/common.go Show resolved Hide resolved
pkg/dbconfig/common.go Show resolved Hide resolved
pkg/dbconfig/common.go Show resolved Hide resolved
pkg/dbconfig/common.go Show resolved Hide resolved
apis/v1alpha1/common.go Show resolved Hide resolved
apis/v1alpha1/common.go Outdated Show resolved Hide resolved
docs/api-references/docs.md Outdated Show resolved Hide resolved
config/crd/resources/greptime.io_greptimedbclusters.yaml Outdated Show resolved Hide resolved
manifests/bundle.yaml Show resolved Hide resolved
@zyy17 zyy17 force-pushed the feat/collect-slow-query branch 2 times, most recently from 6a3438f to 1d11e4d Compare October 25, 2024 03:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (2)
controllers/greptimedbcluster/deployers/config/vector-config-template.yaml (1)

6-14: LGTM! Consider extracting the max_read_bytes value.

The log patterns are now more precise, and the slow_queries source is properly configured. Since both sources use the same max_read_bytes value, consider extracting it to a variable to maintain DRY principles.

+# At the top of the file
+defaults:
+  max_read_bytes: &default_max_read_bytes 536870912

 sources:
   logs:
     type: file
     data_dir: /logs
     include:
       - /logs/greptimedb-err.*
       - /logs/greptimedb.*
-    max_read_bytes: 536870912
+    max_read_bytes: *default_max_read_bytes

   slow_queries:
     type: file
     data_dir: /logs
     include:
       - /logs/greptimedb-slow-queries.*
-    max_read_bytes: 536870912
+    max_read_bytes: *default_max_read_bytes
apis/v1alpha1/common.go (1)

468-472: Add accessor method for SlowQuery field.

Following the pattern used for other fields in LoggingSpec, add an accessor method:

func (in *LoggingSpec) GetSlowQuery() *SlowQuery {
    if in != nil {
        return in.SlowQuery
    }
    return nil
}
🛑 Comments failed to post (5)
pkg/dbconfig/common.go (1)

186-193: ⚠️ Potential issue

Fix variable naming and improve error handling.

  1. The variable name "ration" is misspelled (should be "ratio")
  2. The error handling could provide more context about valid values

Apply this diff to fix the issues:

 // Turn string to float64
-ration, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64)
+ratio, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64)
 if err != nil {
-  klog.Warningf("Failed to parse slow query sample ratio '%s', use the default value 1.0", spec.SlowQuery.SampleRatio)
-  ration = 1.0
+  klog.Warningf("Failed to parse slow query sample ratio '%s' (expected format: '0.0' to '1.0'), using default value 1.0", spec.SlowQuery.SampleRatio)
+  ratio = 1.0
 }

-c.SlowQuerySampleRatio = pointer.Float64(ration)
+c.SlowQuerySampleRatio = pointer.Float64(ratio)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

		// Turn string to float64
		ratio, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64)
		if err != nil {
			klog.Warningf("Failed to parse slow query sample ratio '%s' (expected format: '0.0' to '1.0'), using default value 1.0", spec.SlowQuery.SampleRatio)
			ratio = 1.0
		}

		c.SlowQuerySampleRatio = pointer.Float64(ratio)
apis/v1alpha1/common.go (1)

474-489: ⚠️ Potential issue

Fix JSON tag and enhance field validations.

The struct has several issues that need to be addressed:

  1. The JSON tag for SampleRatio should match the field name
  2. Missing default value markers for fields with defaults
  3. Missing validation for SampleRatio range

Apply these changes:

 type SlowQuery struct {
     // Enabled indicates whether the slow query is enabled.
     // +required
     Enabled bool `json:"enabled"`
 
     // Threshold is the threshold of the slow query. Default to `10s`.
     // +optional
+    // +kubebuilder:default="10s"
     // +kubebuilder:validation:Pattern="^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$"
     Threshold string `json:"threshold,omitempty"`
 
     // SampleRatio is the sampling ratio of slow query log. The value should be in the range of (0, 1]. Default to `1.0`.
     // +optional
+    // +kubebuilder:default="1.0"
     // +kubebuilder:validation:Pattern="^0*\\.?[0-9]*[1-9][0-9]*$"
+    // +kubebuilder:validation:Minimum=0
+    // +kubebuilder:validation:Maximum=1
+    // +kubebuilder:validation:ExclusiveMinimum=true
     // +kubebuilder:validation:Type=string
-    SampleRatio string `json:"ratio,omitempty"`
+    SampleRatio string `json:"sampleRatio,omitempty"`
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

// SlowQuery defines the slow query configuration. It only works for the datanode component.
type SlowQuery struct {
    // Enabled indicates whether the slow query is enabled.
    // +required
    Enabled bool `json:"enabled"`

    // Threshold is the threshold of the slow query. Default to `10s`.
    // +optional
    // +kubebuilder:default="10s"
    // +kubebuilder:validation:Pattern="^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$"
    Threshold string `json:"threshold,omitempty"`

    // SampleRatio is the sampling ratio of slow query log. The value should be in the range of (0, 1]. Default to `1.0`.
    // +optional
    // +kubebuilder:default="1.0"
    // +kubebuilder:validation:Pattern="^0*\\.?[0-9]*[1-9][0-9]*$"
    // +kubebuilder:validation:Minimum=0
    // +kubebuilder:validation:Maximum=1
    // +kubebuilder:validation:ExclusiveMinimum=true
    // +kubebuilder:validation:Type=string
    SampleRatio string `json:"sampleRatio,omitempty"`
}
manifests/crds.yaml (1)

2843-2855: 🛠️ Refactor suggestion

Consider enhancing the slowQuery configuration schema.

While the slowQuery configuration is consistently defined across all components, there are a few improvements that could be made to enhance usability and prevent potential misconfigurations:

  1. Add descriptions and examples to the schema to improve user experience
  2. Add validation to ensure ratio is <= 1.0 (currently allows values > 100%)

Consider updating the schema like this:

 slowQuery:
   properties:
     enabled:
       type: boolean
+      description: "Enable/disable slow query logging"
+      example: true
     ratio:
       pattern: ^0*\.?[0-9]*[1-9][0-9]*$
+      pattern: ^0*\.?[0-9]*[1-9][0-9]*$|^1\.0*$
+      description: "Sampling ratio for slow queries (0.0-1.0)"
+      example: "0.1"
     threshold:
       pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
+      description: "Duration threshold to classify a query as slow"
+      example: "1s"
   required:
   - enabled
   type: object

Also applies to: 5675-5687, 8490-8502, 11338-11350, 11389-11401

manifests/bundle.yaml (2)

2850-2862: ⚠️ Potential issue

Consider making ratio and threshold required when enabled is true.

The slowQuery configuration is consistently defined across all components (datanode, flownode, frontend, etc.) but only marks the enabled field as required. This could lead to incomplete configurations where slow query logging is enabled but missing critical parameters.

Consider making ratio and threshold required fields when enabled is true to ensure proper configuration. This can be achieved through OpenAPI validation or through controller validation.

Example modification for the schema:

 properties:
   enabled:
     type: boolean
   ratio:
     pattern: ^0*\.?[0-9]*[1-9][0-9]*$
     type: string
   threshold:
     pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
     type: string
 required:
 - enabled
+- ratio    # When enabled is true
+- threshold # When enabled is true
 type: object

Also applies to: 5682-5694, 8497-8509, 11345-11357, 20317-20329


2854-2856: 💡 Codebase verification

Based on the search results and the test data showing a ratio value of "1.0", along with the regex pattern analysis:

The ratio pattern needs to be updated to enforce sampling ratio constraints

The current pattern ^0*\.?[0-9]*[1-9][0-9]*$ has these issues:

  • Allows values greater than 1 (e.g., 1.5, 2.0, etc.)
  • The test data shows "1.0" being used, which actually doesn't match this pattern
  • The pattern requires at least one non-zero digit, preventing valid values like "0.1"

Suggested fixes:

  • Use pattern ^0*\.[0-9]+$|^1\.0$ to enforce:
    • Values between 0 and 1 (exclusive) with decimal point
    • Special case for exactly "1.0"
  • Or consider using numeric type with minimum/maximum constraints instead of string pattern
🔗 Analysis chain

Verify ratio pattern allows valid sampling ratios.

The pattern ^0*\.?[0-9]*[1-9][0-9]*$ for the ratio field:

  • Allows leading zeros before decimal point
  • Requires at least one non-zero digit
  • Doesn't enforce a maximum value

Consider adding validation to ensure the ratio is between 0 and 1 (exclusive) since it represents a sampling ratio.

Also applies to: 5686-5688, 8501-8503, 11349-11351, 20321-20323


Let me analyze the validation patterns more thoroughly to understand how ratio is being used and validated in the codebase.


I want to check if there are any examples or documentation that indicate the expected range for the ratio value before making a final determination.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify if there are any ratio validations in the controller code
rg -A 5 "ratio.*validation|validate.*ratio" 

Length of output: 55


Script:

#!/bin/bash
# Search for any ratio-related code or validation patterns
rg -A 5 "ratio" --type yaml

# Also check for any validation functions that might be handling ratio
ast-grep --pattern 'function $NAME($_) { $$$ ratio $$$ }'

Length of output: 43938


Script:

#!/bin/bash
# Let's check for any default values or examples that might indicate the expected ratio range
rg -A 2 "ratio.*1\.0|ratio.*0\." --type yaml

# Also search for any test cases or documentation about ratio usage
fd -e yaml -e md -e txt . -x grep -l "ratio" {} \; -exec head -n 50 {} \;

Length of output: 658

@zyy17 zyy17 force-pushed the feat/collect-slow-query branch 2 times, most recently from e7ae002 to 37c70ab Compare October 25, 2024 03:46
@zyy17 zyy17 force-pushed the feat/collect-slow-query branch 2 times, most recently from 70c69e1 to d9feb0d Compare October 25, 2024 06:07
@zyy17 zyy17 merged commit 075b5fe into GreptimeTeam:main Oct 25, 2024
4 checks passed
@zyy17 zyy17 deleted the feat/collect-slow-query branch October 25, 2024 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants