-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: collect slow queries in monitoring mode #198
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes in this pull request introduce a new configuration for logging slow queries within the GreptimeDB system. A Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
🧹 Outside diff range and nitpick comments (8)
controllers/constant/constant.go (1)
35-36
: Add documentation and consider table name format consistency.
- For consistency with other constants like
LogsTableName
, please add a descriptive comment explaining the purpose of this table.- Consider aligning the table name format with
LogsTableName
:+// SlowQueriesTableName is the table name for storing slow query metrics. -SlowQueriesTableName = "gt_slow_queries" +SlowQueriesTableName = "gtslowqueries"pkg/dbconfig/common.go (1)
187-191
: Fix typo in variable name.The variable name 'ration' should be 'ratio' for better clarity and consistency.
- ration, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64) + ratio, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64) if err != nil { klog.Warningf("Failed to parse slow query sample ratio '%s', use the default value 1.0", spec.SlowQuery.SampleRatio) - ration = 1.0 + ratio = 1.0 }apis/v1alpha1/defaulting.go (1)
155-161
: Consider using constants for default slow query configuration values.The hardcoded values for slow query configuration could be better managed as package-level constants, similar to other defaults in the codebase. This would make it easier to maintain and modify these values in the future.
Apply this diff to introduce constants:
// Add these constants with the other defaults +const ( + DefaultSlowQueryThreshold = "10s" + DefaultSlowQuerySampleRatio = "1.0" +) // In the defaultSpec method defaultSpec.Logging.SlowQuery = &SlowQuery{ Enabled: true, - Threshold: "10s", - SampleRatio: "1.0", + Threshold: DefaultSlowQueryThreshold, + SampleRatio: DefaultSlowQuerySampleRatio, }apis/v1alpha1/common.go (1)
469-471
: Add validation for theSlowQuery
field.Consider adding the
+kubebuilder:validation:Optional
marker to explicitly document that this field is optional, matching the style used for other optional fields in this file.// SlowQuery is the slow query configuration. // +optional +// +kubebuilder:validation:Optional SlowQuery *SlowQuery `json:"slowQuery,omitempty"`
config/crd/resources/greptime.io_greptimedbstandalones.yaml (1)
2857-2867
: LGTM with suggestions for field validation.The
slowQuery
configuration structure is well-defined with appropriate field types. However, consider adding validation patterns forratio
andthreshold
fields to ensure valid values are provided.Consider adding these validations:
slowQuery: properties: enabled: type: boolean ratio: type: string + pattern: ^(0(\.[0-9]+)?|1(\.0+)?)$ threshold: type: string + pattern: ^([0-9]+(\.[0-9]*)?)(ms|s|m)$ required: - enabled type: objectThis ensures:
ratio
is a valid float between 0 and 1threshold
follows the time duration format (e.g., "100ms", "1s", "1m")config/crd/resources/greptime.io_greptimedbclusters.yaml (1)
2844-2854
: Consider adding examples in comments.Add examples in comments to help users understand the expected values:
slowQuery: properties: enabled: type: boolean + # example: true ratio: type: string + # example: "0.1" threshold: type: string + # example: "100ms"Also applies to: 5674-5684, 8487-8497, 11382-11392, 17000-17010
manifests/crds.yaml (1)
2843-2853
: Add field descriptions and validation patterns for slow query configuration.The slow query configuration schema is well-structured and consistent across all components. However, consider the following improvements:
- Add descriptions for fields to improve usability:
slowQuery: description: "Configuration for slow query logging" properties: enabled: description: "Enable/disable slow query logging" type: boolean ratio: description: "Sampling ratio for slow queries (e.g. '0.1' for 10%)" type: string
pattern: "^(0(\\.\\d+)?|1(\\.0+)?)$" threshold: description: "Duration threshold for slow queries (e.g. '1s', '100ms')" type: string
pattern: "^\\d+(\\.\\d+)?(ms|s|m)$"
- The validation patterns would:
- Ensure ratio is a valid decimal between 0 and 1
- Ensure threshold follows time duration format
Also applies to: 5673-5683, 8486-8496, 11381-11391, 20298-20308
manifests/bundle.yaml (1)
2850-2860
: LGTM! Consistent implementation of slowQuery configuration across components.The slowQuery configuration is well-structured and consistently implemented across all components (datanode, flownode, frontend, meta, and standalone).
Consider adding validation for ratio and threshold values.
While the structure is good, consider adding pattern validation for:
ratio
: Should likely be a float between 0 and 1threshold
: Should be a duration string (e.g., "1s", "100ms")Example schema addition:
ratio: type: string pattern: ^(0(\.\d+)?|1(\.0+)?)$ threshold: type: string pattern: ^([0-9]+(\.[0-9]*)?)(ns|us|µs|ms|s|m|h)$Consider adding field descriptions.
Add descriptions to document the purpose and expected format of each field:
Example:
slowQuery: description: Configuration for slow query logging properties: enabled: description: Whether to enable slow query logging type: boolean ratio: description: Sampling ratio for slow queries (between 0 and 1) type: string threshold: description: Duration threshold above which queries are considered slow (e.g. "1s") type: stringAlso applies to: 5680-5690, 8493-8503, 11339-11349, 20305-20315
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (13)
- apis/v1alpha1/common.go (1 hunks)
- apis/v1alpha1/defaulting.go (1 hunks)
- apis/v1alpha1/zz_generated.deepcopy.go (2 hunks)
- config/crd/resources/greptime.io_greptimedbclusters.yaml (6 hunks)
- config/crd/resources/greptime.io_greptimedbstandalones.yaml (1 hunks)
- controllers/constant/constant.go (1 hunks)
- controllers/greptimedbcluster/deployers/common.go (1 hunks)
- controllers/greptimedbcluster/deployers/config/pipeline.yaml (1 hunks)
- controllers/greptimedbcluster/deployers/config/vector-config-template.yaml (3 hunks)
- docs/api-references/docs.md (2 hunks)
- manifests/bundle.yaml (7 hunks)
- manifests/crds.yaml (7 hunks)
- pkg/dbconfig/common.go (3 hunks)
🧰 Additional context used
🔇 Additional comments (16)
controllers/greptimedbcluster/deployers/config/pipeline.yaml (2)
21-22
: Consider adding size limits for query fields.For
promql
andsql
fields, consider implementing size limits to prevent excessive storage usage from very long queries. This is particularly important as these fields are indexed.Let's check if size limits are implemented elsewhere:
#!/bin/bash # Search for any existing size limit implementations rg -A 3 'MaxLength|max_length|size_limit'
19-22
: Consider numeric types for cost and threshold fields.While the configuration looks good overall, consider whether
cost
andthreshold
might be better represented as numeric types rather than strings, depending on their intended usage. This could enable better aggregation and comparison operations in monitoring scenarios.Let's check if these fields are used numerically elsewhere:
controllers/greptimedbcluster/deployers/config/vector-config-template.yaml (3)
6-7
: LGTM: Improved log pattern specificity.The change from wildcard pattern to specific log file patterns provides better control and security.
10-14
: LGTM: Well-structured slow queries source configuration.The slow queries source configuration follows the same pattern as the logs source, maintaining consistency in the configuration structure.
95-102
: Verify SlowQueriesTableName template variable.The sink configuration looks good, but we need to ensure that
SlowQueriesTableName
is properly defined in the template context.✅ Verification successful
The search results show that
SlowQueriesTableName
is defined as a constant incontrollers/constant/constant.go
with the value"gt_slow_queries"
. Let me verify how this constant is used in the context of the vector configuration template.
Based on the search results, I can now generate the final response since I have found that:
- The
SlowQueriesTableName
constant is properly defined inconstant
package- The
constant
package is imported in the deployers- The template variable is correctly assigned in
controllers/greptimedbcluster/deployers/common.go
SlowQueriesTableName template variable is properly defined and used
The template variable
SlowQueriesTableName
is correctly defined as a constant"gt_slow_queries"
and is properly assigned in the template context throughcontrollers/greptimedbcluster/deployers/common.go
. The configuration is valid and will work as expected.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if SlowQueriesTableName is defined in related Go files # Search for SlowQueriesTableName definition in Go files echo "Searching for SlowQueriesTableName definition:" rg -t go "SlowQueriesTableName.*string" -A 2 # Search for template variable initialization echo -e "\nSearching for template variable initialization:" rg -t go "SlowQueriesTableName.*=" -A 2Length of output: 510
Script:
#!/bin/bash # Check how the SlowQueriesTableName constant is used in relation to the vector config template # Search for imports of the constant package in files related to vector config echo "Searching for constant package imports in vector config related files:" rg -t go "\".*constant\"" controllers/greptimedbcluster/deployers/ # Search for the struct/type that holds template variables for vector config echo -e "\nSearching for vector config template struct:" rg -t go "type.*Config.*struct" controllers/greptimedbcluster/deployers/ -A 10 # Check the actual template variable assignment echo -e "\nSearching for template variable assignment:" rg -t go "SlowQueriesTableName.*constant" controllers/greptimedbcluster/deployers/Length of output: 1292
pkg/dbconfig/common.go (1)
19-21
: LGTM: Import additions are appropriate.The new imports support the slow query configuration functionality -
strconv
for parsing the sample ratio andklog
for logging warnings.controllers/greptimedbcluster/deployers/common.go (1)
169-174
: LGTM! Verify vector template integration.The addition of
SlowQueriesTableName
to the vector configuration template variables looks good and follows the existing pattern.Let's verify that the vector config template contains the corresponding placeholder:
✅ Verification successful
Let me verify the actual content of these files to ensure proper integration.
Integration verified successfully - template and constant are properly configured
The verification confirms that:
- The vector config template correctly uses
{{ .SlowQueriesTableName }}
in thesink_greptimedb_slow_queries
configuration- The constant is properly defined as
SlowQueriesTableName = "gt_slow_queries"
- The template structure follows the same pattern as other sinks in the configuration
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if the vector config template contains the SlowQueriesTableName placeholder # and if the constant is properly defined echo "Checking for SlowQueriesTableName placeholder in vector template..." rg -l "SlowQueriesTableName" "controllers/greptimedbcluster/deployers/config/vector-config-template.yaml" echo "Checking constant definition..." rg -l "SlowQueriesTableName.*=.*\"gt_slow_queries\"" "controllers/constant/constant.go"Length of output: 515
Script:
#!/bin/bash # Check the actual content of vector template and constant definition echo "Vector template content around SlowQueriesTableName:" rg -A 5 -B 5 "SlowQueriesTableName" "controllers/greptimedbcluster/deployers/config/vector-config-template.yaml" echo -e "\nConstant definition:" rg "SlowQueriesTableName.*=.*\"gt_slow_queries\"" "controllers/constant/constant.go" -A 1 -B 1Length of output: 716
apis/v1alpha1/defaulting.go (1)
159-160
: Verify the validation logic for Threshold and SampleRatio.The
Threshold
andSampleRatio
are stored as strings. Please ensure there is proper validation logic to:
- Parse and validate the duration format for
Threshold
- Parse and validate the ratio range (0.0-1.0) for
SampleRatio
Let's check for validation logic in the codebase:
apis/v1alpha1/zz_generated.deepcopy.go (2)
626-630
: LGTM: Proper deep copy handling for SlowQuery field.The implementation correctly handles the deep copy of the new
SlowQuery
pointer field inLoggingSpec
, following the standard pattern used for other pointer fields in this auto-generated code.
1055-1068
: LGTM: Correct deep copy implementation for SlowQuery struct.The auto-generated deep copy implementation for the
SlowQuery
struct follows the standard pattern and correctly handles both value copying and nil checks.docs/api-references/docs.md (1)
526-526
: LGTM: Clear integration with LoggingSpec.The new
slowQuery
field is properly documented and integrated into theLoggingSpec
struct.manifests/crds.yaml (5)
2843-2853
: LGTM on datanode slow query configuration.The schema structure for datanode's slow query configuration is correct and properly integrated into the logging spec.
5673-5683
: LGTM on flownode slow query configuration.The schema structure for flownode's slow query configuration is correct and properly integrated into the logging spec.
8486-8496
: LGTM on frontend slow query configuration.The schema structure for frontend's slow query configuration is correct and properly integrated into the logging spec.
11381-11391
: LGTM on meta slow query configuration.The schema structure for meta's slow query configuration is correct and properly integrated into the logging spec.
20298-20308
: LGTM on standalone slow query configuration.The schema structure for standalone's slow query configuration is correct and properly integrated into the logging spec.
controllers/greptimedbcluster/deployers/config/vector-config-template.yaml
Outdated
Show resolved
Hide resolved
6a3438f
to
1d11e4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (2)
controllers/greptimedbcluster/deployers/config/vector-config-template.yaml (1)
6-14
: LGTM! Consider extracting the max_read_bytes value.The log patterns are now more precise, and the slow_queries source is properly configured. Since both sources use the same max_read_bytes value, consider extracting it to a variable to maintain DRY principles.
+# At the top of the file +defaults: + max_read_bytes: &default_max_read_bytes 536870912 sources: logs: type: file data_dir: /logs include: - /logs/greptimedb-err.* - /logs/greptimedb.* - max_read_bytes: 536870912 + max_read_bytes: *default_max_read_bytes slow_queries: type: file data_dir: /logs include: - /logs/greptimedb-slow-queries.* - max_read_bytes: 536870912 + max_read_bytes: *default_max_read_bytesapis/v1alpha1/common.go (1)
468-472
: Add accessor method for SlowQuery field.Following the pattern used for other fields in
LoggingSpec
, add an accessor method:func (in *LoggingSpec) GetSlowQuery() *SlowQuery { if in != nil { return in.SlowQuery } return nil }
🛑 Comments failed to post (5)
pkg/dbconfig/common.go (1)
186-193:
⚠️ Potential issueFix variable naming and improve error handling.
- The variable name "ration" is misspelled (should be "ratio")
- The error handling could provide more context about valid values
Apply this diff to fix the issues:
// Turn string to float64 -ration, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64) +ratio, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64) if err != nil { - klog.Warningf("Failed to parse slow query sample ratio '%s', use the default value 1.0", spec.SlowQuery.SampleRatio) - ration = 1.0 + klog.Warningf("Failed to parse slow query sample ratio '%s' (expected format: '0.0' to '1.0'), using default value 1.0", spec.SlowQuery.SampleRatio) + ratio = 1.0 } -c.SlowQuerySampleRatio = pointer.Float64(ration) +c.SlowQuerySampleRatio = pointer.Float64(ratio)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.// Turn string to float64 ratio, err := strconv.ParseFloat(spec.SlowQuery.SampleRatio, 64) if err != nil { klog.Warningf("Failed to parse slow query sample ratio '%s' (expected format: '0.0' to '1.0'), using default value 1.0", spec.SlowQuery.SampleRatio) ratio = 1.0 } c.SlowQuerySampleRatio = pointer.Float64(ratio)
apis/v1alpha1/common.go (1)
474-489:
⚠️ Potential issueFix JSON tag and enhance field validations.
The struct has several issues that need to be addressed:
- The JSON tag for
SampleRatio
should match the field name- Missing default value markers for fields with defaults
- Missing validation for SampleRatio range
Apply these changes:
type SlowQuery struct { // Enabled indicates whether the slow query is enabled. // +required Enabled bool `json:"enabled"` // Threshold is the threshold of the slow query. Default to `10s`. // +optional + // +kubebuilder:default="10s" // +kubebuilder:validation:Pattern="^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$" Threshold string `json:"threshold,omitempty"` // SampleRatio is the sampling ratio of slow query log. The value should be in the range of (0, 1]. Default to `1.0`. // +optional + // +kubebuilder:default="1.0" // +kubebuilder:validation:Pattern="^0*\\.?[0-9]*[1-9][0-9]*$" + // +kubebuilder:validation:Minimum=0 + // +kubebuilder:validation:Maximum=1 + // +kubebuilder:validation:ExclusiveMinimum=true // +kubebuilder:validation:Type=string - SampleRatio string `json:"ratio,omitempty"` + SampleRatio string `json:"sampleRatio,omitempty"` }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.// SlowQuery defines the slow query configuration. It only works for the datanode component. type SlowQuery struct { // Enabled indicates whether the slow query is enabled. // +required Enabled bool `json:"enabled"` // Threshold is the threshold of the slow query. Default to `10s`. // +optional // +kubebuilder:default="10s" // +kubebuilder:validation:Pattern="^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$" Threshold string `json:"threshold,omitempty"` // SampleRatio is the sampling ratio of slow query log. The value should be in the range of (0, 1]. Default to `1.0`. // +optional // +kubebuilder:default="1.0" // +kubebuilder:validation:Pattern="^0*\\.?[0-9]*[1-9][0-9]*$" // +kubebuilder:validation:Minimum=0 // +kubebuilder:validation:Maximum=1 // +kubebuilder:validation:ExclusiveMinimum=true // +kubebuilder:validation:Type=string SampleRatio string `json:"sampleRatio,omitempty"` }
manifests/crds.yaml (1)
2843-2855: 🛠️ Refactor suggestion
Consider enhancing the slowQuery configuration schema.
While the slowQuery configuration is consistently defined across all components, there are a few improvements that could be made to enhance usability and prevent potential misconfigurations:
- Add descriptions and examples to the schema to improve user experience
- Add validation to ensure ratio is <= 1.0 (currently allows values > 100%)
Consider updating the schema like this:
slowQuery: properties: enabled: type: boolean + description: "Enable/disable slow query logging" + example: true ratio: pattern: ^0*\.?[0-9]*[1-9][0-9]*$ + pattern: ^0*\.?[0-9]*[1-9][0-9]*$|^1\.0*$ + description: "Sampling ratio for slow queries (0.0-1.0)" + example: "0.1" threshold: pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$ + description: "Duration threshold to classify a query as slow" + example: "1s" required: - enabled type: objectAlso applies to: 5675-5687, 8490-8502, 11338-11350, 11389-11401
manifests/bundle.yaml (2)
2850-2862:
⚠️ Potential issueConsider making ratio and threshold required when enabled is true.
The
slowQuery
configuration is consistently defined across all components (datanode, flownode, frontend, etc.) but only marks theenabled
field as required. This could lead to incomplete configurations where slow query logging is enabled but missing critical parameters.Consider making
ratio
andthreshold
required fields whenenabled
is true to ensure proper configuration. This can be achieved through OpenAPI validation or through controller validation.Example modification for the schema:
properties: enabled: type: boolean ratio: pattern: ^0*\.?[0-9]*[1-9][0-9]*$ type: string threshold: pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$ type: string required: - enabled +- ratio # When enabled is true +- threshold # When enabled is true type: objectAlso applies to: 5682-5694, 8497-8509, 11345-11357, 20317-20329
2854-2856: 💡 Codebase verification
Based on the search results and the test data showing a ratio value of "1.0", along with the regex pattern analysis:
The ratio pattern needs to be updated to enforce sampling ratio constraints
The current pattern
^0*\.?[0-9]*[1-9][0-9]*$
has these issues:
- Allows values greater than 1 (e.g., 1.5, 2.0, etc.)
- The test data shows "1.0" being used, which actually doesn't match this pattern
- The pattern requires at least one non-zero digit, preventing valid values like "0.1"
Suggested fixes:
- Use pattern
^0*\.[0-9]+$|^1\.0$
to enforce:
- Values between 0 and 1 (exclusive) with decimal point
- Special case for exactly "1.0"
- Or consider using numeric type with minimum/maximum constraints instead of string pattern
🔗 Analysis chain
Verify ratio pattern allows valid sampling ratios.
The pattern
^0*\.?[0-9]*[1-9][0-9]*$
for the ratio field:
- Allows leading zeros before decimal point
- Requires at least one non-zero digit
- Doesn't enforce a maximum value
Consider adding validation to ensure the ratio is between 0 and 1 (exclusive) since it represents a sampling ratio.
Also applies to: 5686-5688, 8501-8503, 11349-11351, 20321-20323
Let me analyze the validation patterns more thoroughly to understand how ratio is being used and validated in the codebase.
I want to check if there are any examples or documentation that indicate the expected range for the ratio value before making a final determination.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify if there are any ratio validations in the controller code rg -A 5 "ratio.*validation|validate.*ratio"Length of output: 55
Script:
#!/bin/bash # Search for any ratio-related code or validation patterns rg -A 5 "ratio" --type yaml # Also check for any validation functions that might be handling ratio ast-grep --pattern 'function $NAME($_) { $$$ ratio $$$ }'Length of output: 43938
Script:
#!/bin/bash # Let's check for any default values or examples that might indicate the expected ratio range rg -A 2 "ratio.*1\.0|ratio.*0\." --type yaml # Also search for any test cases or documentation about ratio usage fd -e yaml -e md -e txt . -x grep -l "ratio" {} \; -exec head -n 50 {} \;Length of output: 658
e7ae002
to
37c70ab
Compare
70c69e1
to
d9feb0d
Compare
d9feb0d
to
234e536
Compare
Summary by CodeRabbit
Release Notes
New Features
SlowQuery
configuration for enhanced logging, allowing users to enable slow query logging with customizable thresholds and sampling ratios across various components.SlowQuery
settings in the API references.Bug Fixes
Documentation
SlowQuery
logging capabilities.Chores
slowQuery
properties for consistent logging configuration across components.