-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Experimental] Add QUIC support and use netpoll for more efficient network handling #1771
Conversation
54ea37b
to
7f84123
Compare
Deploying vald with Cloudflare Pages
|
3a2fbc8
to
8d39ede
Compare
[WARNING:INTCFG] Changes in |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1771 +/- ##
==========================================
- Coverage 23.94% 23.88% -0.06%
==========================================
Files 545 547 +2
Lines 54379 54660 +281
==========================================
+ Hits 13020 13055 +35
- Misses 40582 40819 +237
- Partials 777 786 +9 ☔ View full report in Codecov by Sentry. |
431c353
to
6faac03
Compare
9b3087a
to
4a659f2
Compare
6faac03
to
1b8eb97
Compare
865206c
to
6a6a2a5
Compare
📝 Walkthrough## Walkthrough
The pull request encompasses extensive modifications to the project's codebase, including the addition of new files and directories, particularly in the `internal/net/quic` directory for QUIC protocol functionalities. Numerous updates were made to configuration and service files across various packages, enhancing network capabilities. New YAML configuration files for Kubernetes resources were introduced, and the `tests` directory received updates to improve test coverage. Overall, the changes signify a substantial expansion of functionalities and improvements in the project's structure.
## Changes
| File | Change Summary |
|------|----------------|
| `.gitfiles` | Extensive modifications with new files and directories added, particularly in `internal/net/quic`, and updates to configuration and service files across various packages. |
| `charts/vald-helm-operator/crds/valdrelease.yaml` | Added `network` property to various components, enhancing network configurability. |
| `charts/vald/values.schema.json` | Updated gRPC client configuration properties and added a new `network` property for specifying network types. |
| `charts/vald/values.yaml` | Introduced `network` field in gRPC client configurations and updated comments for DNS settings. |
| `example/client/go.mod` | Updated Go version and several dependency versions. |
| `go.mod` | Updated Go version and multiple dependency versions, including Kubernetes-related packages. |
| `internal/client/v1/client/discoverer/discover.go` | Simplified error handling and refined connection logic in several methods. |
| `internal/config/grpc.go` | Updated `Opts` method to include network type in dialer configuration. |
| `internal/config/net.go` | Added `Network` field to `Net` struct for JSON and YAML serialization. |
| `internal/net/dialer.go` | Enhanced dialer functionality with QUIC support and improved error handling. |
| `internal/net/grpc/option.go` | Updated `WithDialer` function to accept network type as a parameter. |
| `internal/net/net.go` | Introduced type aliases and utility functions for network type checking. |
| `internal/net/quic/conn.go` | Implemented QUIC connection management system with connection caching. |
| `internal/net/quic/listener.go` | Implemented QUIC listener for managing network connections. |
| `internal/servers/server/server.go` | Enhanced error handling in `ListenAndServe` and `Shutdown` methods. |
| `k8s/discoverer/configmap.yaml` | Added `network: tcp` field to discoverer configuration. |
| `k8s/discoverer/deployment.yaml` | Updated checksum annotation for ConfigMap. |
| `k8s/gateway/gateway/lb/configmap.yaml` | Added `network: tcp` field to `dial_option` sections. |
| `k8s/gateway/gateway/lb/deployment.yaml` | Updated checksum annotation for ConfigMap. |
| `k8s/gateway/gateway/mirror/configmap.yaml` | Modified socket options and added `network` field under gateway section. |
| `k8s/gateway/gateway/mirror/deployment.yaml` | Updated checksum annotation for ConfigMap. |
| `k8s/index/job/correction/configmap.yaml` | Added `network: tcp` field to `dial_option` section for gateway and discoverer. |
| `k8s/index/job/creation/configmap.yaml` | Added `network: tcp` field to discoverer.client configuration. |
| `k8s/index/job/save/configmap.yaml` | Added `network: tcp` field to discoverer configuration in saver section. |
| `k8s/manager/index/configmap.yaml` | Added `network: tcp` field to discoverer.client configuration. |
| `k8s/manager/index/deployment.yaml` | Updated checksum annotation for ConfigMap. |
| `k8s/operator/helm/crds/valdrelease.yaml` | Added `network` property to various components in ValdRelease schema. |
| `pkg/discoverer/k8s/service/discover.go` | Changed internal state management to use pointers for maps, enhancing performance. |
| `hack/go.mod.default` | Updated Go version and several Kubernetes-related dependencies. |
| `internal/errors/net.go` | Introduced `ErrInvalidAddress` function for error handling. |
| `internal/net/dialer_test.go` | Enhanced test coverage and error handling for dialer functionality. |
| `example/client/go.mod.default` | Updated Go version and modified dependency replacements for latest versions. |
| `pkg/index/job/correction/usecase/corrector.go` | Removed inline function for address handling in the discoverer. |
| `tests/e2e/crud/crud_test.go` | Enhanced test for Kubernetes pod management with improved error handling. |
| `tests/e2e/operation/operation.go` | Added `IndexDetail` method to `Client` interface for retrieving index details. |
## Possibly related PRs
- **#2674**: The changes in `internal/net/grpc/client.go` involve enhancements to connection monitoring logic, which may relate to the new QUIC functionalities introduced in the main PR, as both involve network connection management.
- **#2693**: This PR adds new gRPC options and connection pool logic, which aligns with the main PR's focus on enhancing network capabilities through the introduction of QUIC, indicating a broader effort to improve network functionalities across the project.
## Suggested labels
`size/XXXL`
## Suggested reviewers
- vankichi Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 golangci-lintlevel=warning msg="[config_reader] The configuration option Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
887696a
to
485ab60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (7)
k8s/manager/index/pdb.yaml (1)
32-32
: Document the eviction policy choiceConsider adding a comment explaining why
AlwaysAllow
was chosen as the eviction policy for unhealthy pods in the manager-index component. This helps future maintainers understand the operational requirements.selector: matchLabels: app: vald-manager-index + # AlwaysAllow policy ensures quick removal of unhealthy pods + # to maintain service reliability unhealthyPodEvictionPolicy: AlwaysAllowcharts/vald/templates/manager/index/pdb.yaml (1)
38-38
: Document the eviction policy's behaviorConsider adding a comment or documentation about the chosen eviction policy's impact on pod disruptions and service availability.
Add a comment above the field explaining its purpose:
+ # Controls whether pods that have failed health checks can be evicted + # regardless of the PDB. This helps maintain service health by allowing + # quick replacement of unhealthy pods. unhealthyPodEvictionPolicy: {{ $index.unhealthyPodEvictionPolicy }}charts/vald/templates/gateway/lb/pdb.yaml (1)
Line range hint
18-24
: Consider adding minimum version check for policy/v1While the template correctly handles API version selection, it would be helpful to add a note about the minimum Kubernetes version required for
policy/v1
andunhealthyPodEvictionPolicy
.Add a comment before the conditional block:
{{- $gateway := .Values.gateway.lb -}} {{- if $gateway.enabled }} +{{- /* policy/v1 requires Kubernetes 1.21+, unhealthyPodEvictionPolicy requires 1.26+ */}} {{- if (.Capabilities.APIVersions.Has "policy/v1") }}
charts/vald/templates/gateway/filter/pdb.yaml (1)
38-38
: Add validation for unhealthyPodEvictionPolicy valueConsider adding validation to ensure the policy value is one of the allowed values:
AlwaysAllow
orIfHealthyBudget
.- unhealthyPodEvictionPolicy: {{ $gateway.unhealthyPodEvictionPolicy }} + {{- if and $gateway.unhealthyPodEvictionPolicy (or (eq $gateway.unhealthyPodEvictionPolicy "AlwaysAllow") (eq $gateway.unhealthyPodEvictionPolicy "IfHealthyBudget")) }} + unhealthyPodEvictionPolicy: {{ $gateway.unhealthyPodEvictionPolicy }} + {{- end }}charts/vald/templates/gateway/mirror/pdb.yaml (1)
38-38
: Add documentation for the new fieldConsider adding comments above the field to document:
- The purpose of
unhealthyPodEvictionPolicy
- Valid values and their implications
- Version requirements
Apply this diff to add documentation:
selector: matchLabels: app: {{ $gateway.name }} + # unhealthyPodEvictionPolicy controls whether unhealthy pods should be evicted during disruption events + # Requires Kubernetes 1.26+ (alpha) or 1.27+ (beta) + # Values: + # - AlwaysAllow: Unhealthy pods can be evicted + # - IfHealthyBudget: Unhealthy pods can only be evicted if enough healthy pods exist unhealthyPodEvictionPolicy: {{ $gateway.unhealthyPodEvictionPolicy }}internal/net/dialer.go (1)
339-347
: Enhance error handling for QUIC dial failuresWhile the QUIC dial implementation is functional, consider adding specific error type checks and handling for common QUIC connection failures (e.g., version negotiation failures, transport parameter errors).
Apply this diff to improve error handling:
if isQUICDial(network, addr) { conn, err = quic.DialContext(ctx, addr, d.tlsConfig) + if err != nil { + var qErr *quic.Error + if errors.As(err, &qErr) { + log.Warnf("QUIC dial failed with error code %d: %v", qErr.ErrorCode, err) + } + } } else {k8s/operator/helm/crds/valdrelease.yaml (1)
2161-2165
: New pod eviction policy enhances pod lifecycle management.The addition of
unhealthyPodEvictionPolicy
with optionsAlwaysAllow
andIfHealthyBudget
provides better control over pod eviction behavior. This is consistently implemented across agent, discoverer, gateway filter, gateway lb, mirror, and index manager components.Consider documenting the following aspects in the deployment guide:
- When to use each eviction policy option
- Impact on high availability and pod replacement behavior
- Best practices for different deployment scenarios
Also applies to: 4098-4102, 5627-5631, 7136-7140, 8334-8338, 14768-14772
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (55)
.gitfiles
(1 hunks)charts/vald-helm-operator/crds/valdrelease.yaml
(26 hunks)charts/vald/templates/agent/pdb.yaml
(1 hunks)charts/vald/templates/discoverer/pdb.yaml
(1 hunks)charts/vald/templates/gateway/filter/pdb.yaml
(1 hunks)charts/vald/templates/gateway/lb/pdb.yaml
(1 hunks)charts/vald/templates/gateway/mirror/pdb.yaml
(1 hunks)charts/vald/templates/manager/index/pdb.yaml
(1 hunks)charts/vald/values.schema.json
(26 hunks)charts/vald/values.yaml
(10 hunks)example/client/go.mod
(1 hunks)example/client/go.mod.default
(1 hunks)go.mod
(7 hunks)hack/go.mod.default
(1 hunks)internal/client/v1/client/discoverer/discover.go
(14 hunks)internal/config/grpc.go
(1 hunks)internal/config/net.go
(2 hunks)internal/errors/net.go
(1 hunks)internal/net/dialer.go
(5 hunks)internal/net/dialer_test.go
(3 hunks)internal/net/grpc/option.go
(1 hunks)internal/net/net.go
(4 hunks)internal/net/net_test.go
(1 hunks)internal/net/quic/conn.go
(1 hunks)internal/net/quic/listener.go
(1 hunks)internal/servers/server/server.go
(3 hunks)internal/servers/servers.go
(2 hunks)internal/servers/servers_test.go
(2 hunks)internal/sync/semaphore/semaphore_bench_test.go
(0 hunks)k8s/agent/pdb.yaml
(1 hunks)k8s/discoverer/configmap.yaml
(1 hunks)k8s/discoverer/deployment.yaml
(1 hunks)k8s/discoverer/pdb.yaml
(1 hunks)k8s/gateway/gateway/lb/configmap.yaml
(2 hunks)k8s/gateway/gateway/lb/deployment.yaml
(1 hunks)k8s/gateway/gateway/lb/pdb.yaml
(1 hunks)k8s/gateway/gateway/mirror/configmap.yaml
(1 hunks)k8s/gateway/gateway/mirror/deployment.yaml
(1 hunks)k8s/gateway/gateway/mirror/pdb.yaml
(1 hunks)k8s/index/job/correction/configmap.yaml
(2 hunks)k8s/index/job/creation/configmap.yaml
(1 hunks)k8s/index/job/save/configmap.yaml
(1 hunks)k8s/manager/index/configmap.yaml
(1 hunks)k8s/manager/index/deployment.yaml
(1 hunks)k8s/manager/index/pdb.yaml
(1 hunks)k8s/operator/helm/crds/valdrelease.yaml
(26 hunks)pkg/discoverer/k8s/service/discover.go
(13 hunks)pkg/index/job/correction/usecase/corrector.go
(0 hunks)pkg/manager/index/service/indexer.go
(2 hunks)pkg/manager/index/service/indexer_test.go
(9 hunks)tests/e2e/crud/crud_test.go
(3 hunks)tests/e2e/operation/operation.go
(2 hunks)versions/CMAKE_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
💤 Files with no reviewable changes (2)
- internal/sync/semaphore/semaphore_bench_test.go
- pkg/index/job/correction/usecase/corrector.go
✅ Files skipped from review due to trivial changes (3)
- versions/CMAKE_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🚧 Files skipped from review as they are similar to previous changes (33)
- charts/vald/values.yaml
- example/client/go.mod
- example/client/go.mod.default
- go.mod
- hack/go.mod.default
- internal/client/v1/client/discoverer/discover.go
- internal/config/grpc.go
- internal/config/net.go
- internal/errors/net.go
- internal/net/dialer_test.go
- internal/net/grpc/option.go
- internal/net/net.go
- internal/net/net_test.go
- internal/net/quic/conn.go
- internal/net/quic/listener.go
- internal/servers/server/server.go
- internal/servers/servers.go
- k8s/discoverer/configmap.yaml
- k8s/discoverer/deployment.yaml
- k8s/gateway/gateway/lb/configmap.yaml
- k8s/gateway/gateway/lb/deployment.yaml
- k8s/gateway/gateway/mirror/configmap.yaml
- k8s/gateway/gateway/mirror/deployment.yaml
- k8s/index/job/correction/configmap.yaml
- k8s/index/job/creation/configmap.yaml
- k8s/index/job/save/configmap.yaml
- k8s/manager/index/configmap.yaml
- k8s/manager/index/deployment.yaml
- pkg/discoverer/k8s/service/discover.go
- pkg/manager/index/service/indexer.go
- pkg/manager/index/service/indexer_test.go
- tests/e2e/crud/crud_test.go
- tests/e2e/operation/operation.go
🔇 Additional comments (22)
charts/vald/values.schema.json (2)
2000-2000
: LGTM: DNS cache descriptions updated for protocol agnosticism
The removal of TCP-specific terminology from DNS cache descriptions is appropriate as it aligns with the addition of QUIC support, making the configuration more protocol-agnostic.
Also applies to: 2004-2004, 2008-2008, 2477-2477, 2481-2481, 2485-2485, 7477-7477, 7481-7481, 7485-7485, 9706-9706, 9710-9710, 9714-9714, 10054-10054, 10058-10058, 10062-10062, 12286-12286, 12290-12290, 12294-12294, 14287-14287, 14291-14291, 14295-14295, 14635-14635, 14639-14639, 14643-14643, 16629-16629, 16633-16633, 16637-16637, 16977-16977, 16981-16981, 16985-16985, 18666-18666, 18670-18670, 18674-18674, 19014-19014, 19018-19018, 19022-19022, 22336-22336, 22340-22340, 22344-22344, 22684-22684, 22688-22688, 22692-22692
3547-3551
: Review pod eviction policy implementation
The addition of unhealthyPodEvictionPolicy
with "AlwaysAllow" and "IfHealthyBudget" options provides fine-grained control over pod eviction. This is crucial for maintaining service availability during network transitions.
Also applies to: 6807-6811, 9396-9400, 11957-11961, 13982-13986, 25213-25217
k8s/agent/pdb.yaml (1)
32-32
: LGTM! Good addition of unhealthy pod handling.
The addition of unhealthyPodEvictionPolicy: IfHealthyBudget
is a good practice. This allows the cluster to evict unhealthy pods while still maintaining the disruption budget, which is particularly important for experimental features like QUIC that might need more aggressive pod replacement during issues.
k8s/discoverer/pdb.yaml (1)
32-32
: LGTM! Good addition for pod health management.
Setting unhealthyPodEvictionPolicy: AlwaysAllow
is a good practice as it allows quick replacement of problematic pods, helping maintain service stability even if it temporarily violates the PDB's disruption constraints.
k8s/gateway/gateway/lb/pdb.yaml (2)
32-32
: Ensure consistent PDB configuration across components.
Consider applying the same unhealthyPodEvictionPolicy
configuration to other PDB files in the project for consistency.
#!/bin/bash
# Find all PDB files and check for unhealthyPodEvictionPolicy
fd -e yaml -e yml | xargs grep -l "kind: PodDisruptionBudget" | xargs grep -l "unhealthyPodEvictionPolicy" || echo "Other PDBs might need updates"
32-32
: LGTM! Verify Kubernetes version compatibility.
The addition of unhealthyPodEvictionPolicy: AlwaysAllow
is a good practice as it allows quick eviction of unhealthy pods. This field requires Kubernetes v1.26+ (beta) or v1.27+ (stable).
k8s/manager/index/pdb.yaml (1)
32-32
: Verify Kubernetes version compatibility
The unhealthyPodEvictionPolicy
field was introduced in Kubernetes 1.26. This is a good addition as it helps maintain cluster health by allowing quick eviction of problematic pods.
k8s/gateway/gateway/mirror/pdb.yaml (1)
32-32
: 🛠️ Refactor suggestion
Consider using IfHealthyBudget
for safer pod eviction
The current setting of unhealthyPodEvictionPolicy: AlwaysAllow
allows unhealthy pods to be evicted regardless of PDB constraints. For a mirror gateway service, this aggressive eviction policy combined with maxUnavailable: 50%
could potentially lead to service disruption if multiple pods become unhealthy simultaneously.
Consider using IfHealthyBudget
instead, which only allows eviction of unhealthy pods when PDB constraints can be maintained by the remaining healthy pods.
- unhealthyPodEvictionPolicy: AlwaysAllow
+ unhealthyPodEvictionPolicy: IfHealthyBudget
Run this script to check the Kubernetes version compatibility and current deployment status:
charts/vald/templates/agent/pdb.yaml (2)
38-38
: Verify Kubernetes version compatibility for unhealthyPodEvictionPolicy
The unhealthyPodEvictionPolicy
field was introduced in Kubernetes v1.26 (beta) and v1.27 (stable). While the file correctly handles API version differences between policy/v1 and policy/v1beta1, we should ensure this field is only set for compatible Kubernetes versions.
Consider wrapping this field in a capabilities check similar to the API version check:
- unhealthyPodEvictionPolicy: {{ $agent.unhealthyPodEvictionPolicy }}
+ {{- if semverCompare ">=1.26-0" .Capabilities.KubeVersion.GitVersion }}
+ unhealthyPodEvictionPolicy: {{ $agent.unhealthyPodEvictionPolicy }}
+ {{- end }}
38-38
: Verify Helm values validation for unhealthyPodEvictionPolicy
The unhealthyPodEvictionPolicy
field only accepts two values: "AlwaysAllow" or "IfHealthyBudget". We should ensure proper validation in the Helm values schema.
Run the following script to check if values validation exists:
✅ Verification successful
Validation for unhealthyPodEvictionPolicy is properly implemented
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if values.schema.json includes validation for unhealthyPodEvictionPolicy
# Check for the field in values.schema.json
echo "Checking values.schema.json for unhealthyPodEvictionPolicy validation..."
rg -A 10 "unhealthyPodEvictionPolicy" "values.schema.json"
# Check default value in values.yaml
echo -e "\nChecking values.yaml for default value..."
rg "unhealthyPodEvictionPolicy" "values.yaml"
Length of output: 465
Script:
#!/bin/bash
# Description: Locate values.yaml and values.schema.json files and check for unhealthyPodEvictionPolicy validation
echo "Searching for values.yaml files in the repository..."
fd --type f --name "values.yaml"
echo -e "\nSearching for values.schema.json files in the repository..."
fd --type f --name "values.schema.json"
echo -e "\nChecking for unhealthyPodEvictionPolicy validation in values.schema.json files..."
fd --type f --name "values.schema.json" | xargs grep -H "unhealthyPodEvictionPolicy"
echo -e "\nChecking for default values in values.yaml files..."
fd --type f --name "values.yaml" | xargs grep -H "unhealthyPodEvictionPolicy"
Length of output: 1783
Script:
#!/bin/bash
# Description: Locate values.yaml and values.schema.json files and check for unhealthyPodEvictionPolicy validation
echo "Searching for values.yaml files in the repository..."
fd "values.yaml"
echo -e "\nSearching for values.schema.json files in the repository..."
fd "values.schema.json"
echo -e "\nChecking for unhealthyPodEvictionPolicy validation in values.schema.json files..."
fd "values.schema.json" | xargs grep -H "unhealthyPodEvictionPolicy"
echo -e "\nChecking for default values in values.yaml files..."
fd "values.yaml" | xargs grep -H "unhealthyPodEvictionPolicy"
Length of output: 7853
charts/vald/templates/manager/index/pdb.yaml (1)
38-38
: Verify template variable and K8s version compatibility
The unhealthyPodEvictionPolicy
field is a newer addition to PDB spec. Let's verify the configuration and compatibility.
charts/vald/templates/gateway/filter/pdb.yaml (1)
38-38
: Verify Kubernetes version compatibility and values.yaml
The unhealthyPodEvictionPolicy
field was introduced in Kubernetes 1.26. This change requires:
- Verification of minimum supported Kubernetes version
- Updates to values.yaml to document this new configuration option
charts/vald/templates/gateway/mirror/pdb.yaml (1)
38-38
: Ensure consistent configuration across all PDB templates
The unhealthyPodEvictionPolicy
should be consistently configured across all PDB templates in the Helm charts.
#!/bin/bash
# Description: Verify consistent PDB configuration across templates
# Check all PDB templates for the new field
echo "Checking PDB templates for unhealthyPodEvictionPolicy..."
fd -e yaml -e tpl . charts/vald/templates | xargs rg "PodDisruptionBudget" -l | xargs rg "unhealthyPodEvictionPolicy"
# Verify the gateway mirror values
echo "Checking gateway mirror configuration..."
rg "mirror.*unhealthyPodEvictionPolicy" charts/vald/values.yaml
charts/vald/templates/discoverer/pdb.yaml (1)
38-38
: 🛠️ Refactor suggestion
Verify Kubernetes version compatibility and add validation
The unhealthyPodEvictionPolicy
field was introduced in Kubernetes 1.26 and is only supported with policy/v1
PDBs. A few suggestions to improve the implementation:
- Add validation for the field value (should be either "AlwaysAllow" or "IfHealthyBudget")
- Consider making this field conditional based on Kubernetes version
- Document the minimum Kubernetes version requirement
Run the following script to check if this field is used consistently across other PDB templates and verify the values:
Consider adding validation using a helper template. Here's a suggested implementation:
{{- define "validateUnhealthyPodEvictionPolicy" -}}
{{- $valid := list "AlwaysAllow" "IfHealthyBudget" -}}
{{- $policy := . -}}
{{- if not (has $policy $valid) -}}
{{- fail (printf "unhealthyPodEvictionPolicy must be one of: %s" ($valid | join ", ")) -}}
{{- end -}}
{{- end -}}
Then use it in the PDB template:
- unhealthyPodEvictionPolicy: {{ $discoverer.unhealthyPodEvictionPolicy }}
+ {{- if and ($.Capabilities.APIVersions.Has "policy/v1") $discoverer.unhealthyPodEvictionPolicy }}
+ {{- include "validateUnhealthyPodEvictionPolicy" $discoverer.unhealthyPodEvictionPolicy }}
+ unhealthyPodEvictionPolicy: {{ $discoverer.unhealthyPodEvictionPolicy }}
+ {{- end }}
internal/servers/servers_test.go (1)
182-182
: LGTM! Enhanced error reporting.
The improved error message now includes both expected and actual error values, which will make debugging test failures much easier.
internal/net/dialer.go (2)
314-323
: LGTM: QUIC detection logic is well-implemented
The function correctly identifies QUIC connections by checking:
- UDP network type
- Valid host and port
- Non-DNS traffic (port != 53)
332-337
: LGTM: Robust network type validation
The code properly handles:
- Unknown network types by defaulting to TCP
- Empty addresses with appropriate error messages
.gitfiles (2)
Line range hint 1-1254
: LGTM!
The file is well-organized with proper copyright headers and maintains alphabetical ordering of files.
1254-1255
: Add test files for the QUIC implementation
The new QUIC implementation files need corresponding test files to maintain test coverage:
internal/net/quic/conn_test.go
internal/net/quic/listener_test.go
Let me verify if any test files exist for the QUIC implementation:
#!/bin/bash
# Description: Check for test files in the QUIC implementation
echo "=== Searching for QUIC test files ==="
fd -t f -p "internal/net/quic/.*_test\.go$"
# Also check test coverage requirements in other networking components
echo -e "\n=== Test coverage in other net components ==="
fd -t f -p "internal/net/.*/.*_test\.go$" | grep -v "quic"
charts/vald-helm-operator/crds/valdrelease.yaml (1)
2161-2165
: LGTM: Standard Kubernetes pod eviction policy configuration.
The addition of unhealthyPodEvictionPolicy
with standard Kubernetes options (AlwaysAllow
and IfHealthyBudget
) is correctly implemented across all relevant sections.
Also applies to: 4098-4102, 5627-5631, 7136-7140, 8334-8338, 14768-14772
k8s/operator/helm/crds/valdrelease.yaml (2)
1276-1281
: Skip comment about network protocol changes.
These changes are already covered by a past review comment that suggested adding QUIC protocol support.
2309-2314
: Verify QUIC protocol support is complete.
The network protocol enum has been updated to include QUIC support consistently across all components. This aligns with the PR objective of implementing gRPC over QUIC.
Run the following script to verify all network protocol configurations are updated:
Also applies to: 3191-3196, 4309-4314, 4514-4519, 4714-4719, 5837-5842, 6034-6039, 6233-6238, 7355-7360, 7459-7464, 8538-8543, 8735-8740, 8941-8946, 9859-9864, 10056-10061, 11012-11017, 11209-11214, 13147-13152, 13344-13349
✅ Verification successful
All network protocol enums include QUIC.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify network protocol configurations are consistent
# Test: Search for network protocol enum definitions. Expect: All should include QUIC option
# Check for any network protocol enums missing QUIC
rg -U "network:.*type: string.*enum:" -A 10 | grep -B 5 -A 5 "tcp"
Length of output: 66
Script:
#!/bin/bash
# Description: Verify all network protocol enums include QUIC
# Test: Search for network protocol enum definitions and check for QUIC inclusion
# Find all network protocol enum definitions
rg -U "network:\s*type:\s*string\s*enum:" -A 10 | grep -B 5 -A 5 -i "tcp"
# Additionally, check specifically for the inclusion of QUIC
missing_quic=$(rg -U "network:\s*type:\s*string\s*enum:" -A 10 | grep -B 5 -A 5 -i "quic" | wc -l)
if [ "$missing_quic" -eq 0 ]; then
echo "All network protocol enums include QUIC."
else
echo "Some network protocol enums are missing QUIC."
fi
Length of output: 200606
485ab60
to
c69860c
Compare
Signed-off-by: kpango <kpango@vdaas.org>
c69860c
to
c8c568d
Compare
Profile Report
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
internal/net/dialer_test.go (1)
Line range hint
1428-1480
: Consider adding test cases for QUIC network type.Given that this PR adds QUIC support, consider adding test cases to verify error handling for QUIC-specific network types and addresses.
Example test case to add:
{ + name: "return error if invalid QUIC address", + args: args{ + network: "quic", + addr: "invalid_quic_address", + }, + fields: fields{ + tlsConfig: nil, + der: &net.Dialer{ + Timeout: time.Minute, + KeepAlive: time.Minute, + DualStack: true, + }, + }, + checkFunc: func(w want, got Conn, err error) error { + if !errors.Is(err, w.err) { + return errors.Errorf("got_error: \"%#v\",\n\t\t\t\twant: \"%#v\"", err, w.err) + } + if got != nil { + return errors.New("got is not nil") + } + return nil + }, + want: want{ + err: errors.ErrInvalidAddress("quic", "invalid_quic_address"), + }, +},charts/vald/values.yaml (1)
832-838
: Review DNS cache timing values for optimal performance.The DNS cache configuration uses relatively long durations:
- refresh_duration: 30m
- cache_expiration: 1h
These values are reasonable for stable environments but may need adjustment based on:
- DNS record TTL values in your environment
- Frequency of service updates/deployments
- Network stability requirements
Consider monitoring DNS resolution patterns in your environment to validate these cache durations are optimal for your use case.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (55)
.gitfiles
(1 hunks)charts/vald-helm-operator/crds/valdrelease.yaml
(26 hunks)charts/vald/templates/agent/pdb.yaml
(1 hunks)charts/vald/templates/discoverer/pdb.yaml
(1 hunks)charts/vald/templates/gateway/filter/pdb.yaml
(1 hunks)charts/vald/templates/gateway/lb/pdb.yaml
(1 hunks)charts/vald/templates/gateway/mirror/pdb.yaml
(1 hunks)charts/vald/templates/manager/index/pdb.yaml
(1 hunks)charts/vald/values.schema.json
(26 hunks)charts/vald/values.yaml
(10 hunks)example/client/go.mod
(1 hunks)example/client/go.mod.default
(1 hunks)go.mod
(7 hunks)hack/go.mod.default
(1 hunks)internal/client/v1/client/discoverer/discover.go
(14 hunks)internal/config/grpc.go
(1 hunks)internal/config/net.go
(2 hunks)internal/errors/net.go
(1 hunks)internal/net/dialer.go
(5 hunks)internal/net/dialer_test.go
(3 hunks)internal/net/grpc/option.go
(1 hunks)internal/net/net.go
(4 hunks)internal/net/net_test.go
(1 hunks)internal/net/quic/conn.go
(1 hunks)internal/net/quic/listener.go
(1 hunks)internal/servers/server/server.go
(3 hunks)internal/servers/servers.go
(2 hunks)internal/servers/servers_test.go
(2 hunks)internal/sync/semaphore/semaphore_bench_test.go
(0 hunks)k8s/agent/pdb.yaml
(1 hunks)k8s/discoverer/configmap.yaml
(1 hunks)k8s/discoverer/deployment.yaml
(1 hunks)k8s/discoverer/pdb.yaml
(1 hunks)k8s/gateway/gateway/lb/configmap.yaml
(2 hunks)k8s/gateway/gateway/lb/deployment.yaml
(1 hunks)k8s/gateway/gateway/lb/pdb.yaml
(1 hunks)k8s/gateway/gateway/mirror/configmap.yaml
(1 hunks)k8s/gateway/gateway/mirror/deployment.yaml
(1 hunks)k8s/gateway/gateway/mirror/pdb.yaml
(1 hunks)k8s/index/job/correction/configmap.yaml
(2 hunks)k8s/index/job/creation/configmap.yaml
(1 hunks)k8s/index/job/save/configmap.yaml
(1 hunks)k8s/manager/index/configmap.yaml
(1 hunks)k8s/manager/index/deployment.yaml
(1 hunks)k8s/manager/index/pdb.yaml
(1 hunks)k8s/operator/helm/crds/valdrelease.yaml
(26 hunks)pkg/discoverer/k8s/service/discover.go
(13 hunks)pkg/index/job/correction/usecase/corrector.go
(0 hunks)pkg/manager/index/service/indexer.go
(2 hunks)pkg/manager/index/service/indexer_test.go
(9 hunks)tests/e2e/crud/crud_test.go
(3 hunks)tests/e2e/operation/operation.go
(2 hunks)versions/CMAKE_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
💤 Files with no reviewable changes (2)
- internal/sync/semaphore/semaphore_bench_test.go
- pkg/index/job/correction/usecase/corrector.go
🚧 Files skipped from review as they are similar to previous changes (47)
- charts/vald/templates/agent/pdb.yaml
- charts/vald/templates/discoverer/pdb.yaml
- charts/vald/templates/gateway/filter/pdb.yaml
- charts/vald/templates/gateway/lb/pdb.yaml
- charts/vald/templates/gateway/mirror/pdb.yaml
- charts/vald/templates/manager/index/pdb.yaml
- charts/vald/values.schema.json
- example/client/go.mod
- example/client/go.mod.default
- go.mod
- hack/go.mod.default
- internal/client/v1/client/discoverer/discover.go
- internal/config/grpc.go
- internal/config/net.go
- internal/errors/net.go
- internal/net/grpc/option.go
- internal/net/net.go
- internal/net/net_test.go
- internal/net/quic/conn.go
- internal/net/quic/listener.go
- internal/servers/server/server.go
- internal/servers/servers.go
- internal/servers/servers_test.go
- k8s/agent/pdb.yaml
- k8s/discoverer/configmap.yaml
- k8s/discoverer/deployment.yaml
- k8s/discoverer/pdb.yaml
- k8s/gateway/gateway/lb/configmap.yaml
- k8s/gateway/gateway/lb/deployment.yaml
- k8s/gateway/gateway/lb/pdb.yaml
- k8s/gateway/gateway/mirror/configmap.yaml
- k8s/gateway/gateway/mirror/deployment.yaml
- k8s/gateway/gateway/mirror/pdb.yaml
- k8s/index/job/correction/configmap.yaml
- k8s/index/job/creation/configmap.yaml
- k8s/index/job/save/configmap.yaml
- k8s/manager/index/configmap.yaml
- k8s/manager/index/deployment.yaml
- k8s/manager/index/pdb.yaml
- pkg/discoverer/k8s/service/discover.go
- pkg/manager/index/service/indexer.go
- pkg/manager/index/service/indexer_test.go
- tests/e2e/crud/crud_test.go
- tests/e2e/operation/operation.go
- versions/CMAKE_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (15)
internal/net/dialer.go (6)
314-323
: LGTM! Clean implementation of QUIC detection.
The isQUICDial
function correctly identifies QUIC-eligible connections by checking:
- UDP network type
- Valid host and port
- Non-DNS traffic (port != 53)
332-337
: LGTM! Good defensive programming.
The code ensures valid network type and address before proceeding with the connection.
445-450
: LGTM! Resource leak fixed.
The error handling now properly closes connections and combines errors, addressing the resource leak issue from previous reviews.
473-482
: LGTM! Nil pointer dereference fixed.
The error logging now safely handles nil cache and provides more context when available.
339-342
: Add test coverage for QUIC functionality.
The QUIC integration needs test coverage for:
- QUIC connection attempts
- Fallback behavior
- Error scenarios
339-347
: Verify QUIC implementation across the codebase.
The QUIC connection logic looks correct, but we should verify:
- QUIC connection handling in upstream code
- Error handling in the QUIC package
- Proper fallback behavior
internal/net/dialer_test.go (3)
1428-1428
: LGTM: Standardized error handling for empty addresses.
The change standardizes the error handling by using errors.ErrInvalidAddress
for empty address cases, which improves consistency in error reporting.
1455-1455
: LGTM: Standardized error handling for invalid network types.
The change maintains consistency by using the same errors.ErrInvalidAddress
error type for invalid network scenarios.
1480-1480
: LGTM: Standardized error handling for empty network types.
The change completes the standardization of error handling by using errors.ErrInvalidAddress
for empty network cases as well.
charts/vald/values.yaml (1)
826-828
:
Add QUIC to supported network types to enable QUIC protocol support.
The network type enum should include QUIC to align with the PR's objective of adding QUIC support.
Apply this diff:
- # @schema {"name": "defaults.grpc.client.dial_option.net.network", "type": "string", "enum": ["tcp", "udp", "unix"]}
+ # @schema {"name": "defaults.grpc.client.dial_option.net.network", "type": "string", "enum": ["tcp", "udp", "unix", "quic"]}
Likely invalid or redundant comment.
charts/vald-helm-operator/crds/valdrelease.yaml (1)
2161-2165
: LGTM: UnhealthyPodEvictionPolicy configuration is properly defined.
The addition of UnhealthyPodEvictionPolicy with appropriate enum values (AlwaysAllow, IfHealthyBudget) across various components is correct and consistent with Kubernetes standards.
Also applies to: 4098-4102, 5627-5631, 7136-7140, 8334-8338, 14768-14772
k8s/operator/helm/crds/valdrelease.yaml (3)
2161-2165
: LGTM: Valid unhealthy pod eviction policy configuration.
The addition of unhealthyPodEvictionPolicy
with standard Kubernetes values AlwaysAllow
and IfHealthyBudget
is correct and follows Kubernetes conventions.
2309-2314
: Add QUIC protocol support consistently across all network configurations.
The same QUIC protocol addition is needed across all network configuration sections to maintain consistency.
Also applies to: 3191-3196, 4309-4314, 4514-4519, 5837-5842, 6034-6039, 6233-6238, 7355-7360, 7459-7464, 8538-8543, 8735-8740, 8941-8946, 9859-9864, 10056-10061, 11012-11017, 11209-11214, 13147-13152, 13344-13349
1276-1281
:
Add QUIC protocol support to network enums.
To fulfill the PR objective of adding QUIC support, the network protocol enums need to be updated to include quic
as a valid option.
Apply this diff:
network:
type: string
enum:
- tcp
- udp
- unix
+ - quic
Likely invalid or redundant comment.
.gitfiles (1)
1254-1255
: New QUIC files correctly added to '.gitfiles'
The inclusion of internal/net/quic/conn.go
and internal/net/quic/listener.go
in .gitfiles
is appropriate and ensures that these new QUIC implementation files are tracked correctly.
Signed-off-by: kpango <kpango@vdaas.org>
Signed-off-by: kpango <kpango@vdaas.org>
Signed-off-by: kpango kpango@vdaas.org
Description:
Vald has been working on minimizing the overhead of the network by enhancing the libraries around each network, including gRPC.
Recently, the final specification of QUIC has been fixed, and Vald would like to verify the approach of gRPC over QUIC, since it will be in practical use soon.
This PR contains the code to verify gRPC over QUIC using quic-go from netpoll and other Network I/O blocking solutions.
Related Issue:
How Has This Been Tested?:
Environment:
Types of changes:
Changes to Core Features:
Checklist:
Summary by CodeRabbit
Release Notes
New Features
network
property in multiple configurations, allowing users to specify network types (TCP, UDP, UNIX) for better flexibility.unhealthyPodEvictionPolicy
settings in various configurations to manage pod eviction based on health status.Bug Fixes
Documentation
Tests