feat: add central cluster collector for tail sampling (#443)
povilasv committed Aug 29, 2024
1 parent 3ae53fa commit aeca198
Showing 6 changed files with 386 additions and 2 deletions.
3 changes: 3 additions & 0 deletions otel-integration/CHANGELOG.md
@@ -2,6 +2,9 @@

## OpenTelemetry-Integration

### v0.0.98 / 2024-08-29
- [Feat] Add a way to deploy central collector cluster for tail sampling

### v0.0.97 / 2024-08-19
- [Fix] ignore process name not found errors for hostmetrics process preset

7 changes: 6 additions & 1 deletion otel-integration/k8s-helm/Chart.yaml
@@ -1,7 +1,7 @@
apiVersion: v2
name: otel-integration
description: OpenTelemetry Integration
-version: 0.0.97
+version: 0.0.98
keywords:
  - OpenTelemetry Collector
  - OpenTelemetry Agent
@@ -24,6 +24,11 @@ dependencies:
    version: "0.90.0"
    repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
    condition: opentelemetry-cluster-collector.enabled
  - name: opentelemetry-collector
    alias: opentelemetry-receiver
    version: "0.90.0"
    repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
    condition: opentelemetry-receiver.enabled
  - name: opentelemetry-collector
    alias: opentelemetry-gateway
    version: "0.90.0"
54 changes: 54 additions & 0 deletions otel-integration/k8s-helm/README.md
@@ -245,6 +245,60 @@ This change will configure otel-agent pods to send span data to coralogix-opente

When running in Openshift make sure to set `distribution: "openshift"` in your `values.yaml`. When running in Windows environments, please use `values-windows-tailsampling.yaml` values file.

### Deploying Central Collector Cluster for Tail Sampling

To run the OpenTelemetry Collector in a separate, "central" Kubernetes cluster that receives telemetry data via OTLP receivers and performs [Tail Sampling](https://opentelemetry.io/docs/concepts/sampling/#tail-sampling), install `otel-integration` with the `central-tail-sampling-values.yaml` values file. See that file for the available configuration.

This creates two Deployments:
- opentelemetry-receiver - receives OTLP data, pushes metrics and logs to Coralogix, and load-balances spans across the opentelemetry-gateway Deployment.
- opentelemetry-gateway - receives span data and makes the Tail Sampling decisions.
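
Tail sampling only works when every span of a trace reaches the same gateway pod, which is why the receiver routes spans by trace ID. As a rough illustration, the `loadBalancing` preset in `central-tail-sampling-values.yaml` corresponds to a `loadbalancing` exporter configured along these lines. The exact output rendered by the chart is an assumption here, so check the generated ConfigMap for the real values:

```yaml
# Sketch only: the chart's rendered configuration may differ.
exporters:
  loadbalancing:
    # All spans that share a trace ID are sent to the same backend.
    routing_key: traceID
    protocol:
      otlp:
        tls:
          # Assumes plaintext traffic between receiver and gateway inside the cluster.
          insecure: true
    resolver:
      dns:
        # DNS name resolved periodically to discover the gateway pods.
        hostname: coralogix-opentelemetry-gateway
```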

The opentelemetry-receiver must be reachable from your other Kubernetes clusters so that they can send data to it. You can expose it with a Service of type LoadBalancer, an Ingress object, or a manually configured load balancer. Make sure to configure enough replicas, resource requests, and limits to handle the load. Then replace the default policies of the [tail sampling processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor) with your own tail sampling policies.
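
As a minimal sketch of these two steps, an override file (hypothetically named `central-overrides.yaml`) could expose the receiver through a LoadBalancer Service and define custom sampling policies. The keys mirror those shown in `central-tail-sampling-values.yaml`; the source range and the latency threshold are placeholder assumptions, not recommendations:

```yaml
# central-overrides.yaml (hypothetical file name)
opentelemetry-receiver:
  service:
    enabled: true
    type: LoadBalancer
    # Placeholder range: restrict access to the addresses of your other clusters.
    loadBalancerSourceRanges:
      - 203.0.113.0/24

opentelemetry-gateway:
  config:
    processors:
      tail_sampling:
        # A trace is kept if any one of these policies decides to sample it.
        policies:
          # Keep traces that contain a span with an error status.
          - name: errors-policy
            type: status_code
            status_code: {status_codes: [ERROR]}
          # Keep slow traces; the threshold is an illustrative value.
          - name: latency-policy
            type: latency
            latency: {threshold_ms: 1000}
          # Keep roughly 10% of traces regardless of content.
          - name: randomized-policy
            type: probabilistic
            probabilistic: {sampling_percentage: 10}
```

Helm replaces lists rather than merging them, so the `policies` list above would take the place of the default one. The file can be passed as an additional `-f central-overrides.yaml` after the base values file, since later files take precedence.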

```bash
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm upgrade --install otel-coralogix-central-collector coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-tail-sampling-values.yaml
```

After deploying, you can validate the setup by sending some OTLP data to the opentelemetry-receiver Service and checking Coralogix for spans. One way to do this is with telemetrygen:

```bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: telemetrygen-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: telemetrygen
  template:
    metadata:
      labels:
        app: telemetrygen
    spec:
      containers:
        - name: telemetrygen
          image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest
          args:
            - "traces"
            - "--otlp-endpoint=coralogix-opentelemetry-receiver:4317"
            - "--otlp-insecure"
            - "--rate=10"
            - "--duration=120s"
EOF
```

Next, configure the regular `otel-integration` deployment to send data to the Central Collector Cluster:

```bash
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-agent-values.yaml
```

#### Why am I getting ResourceExhausted errors when using Tail Sampling?

Typically, the errors look like this:
38 changes: 38 additions & 0 deletions otel-integration/k8s-helm/central-agent-values.yaml
@@ -0,0 +1,38 @@
global:
  domain: ""
  clusterName: ""
  defaultApplicationName: "otel"
  defaultSubsystemName: "integration"
  logLevel: "debug"
  collectionInterval: "30s"
  version: "0.0.97"

  extensions:
    kubernetesDashboard:
      enabled: true

# set distribution to openshift for openshift clusters
distribution: ""
opentelemetry-agent:
  enabled: true
  config:
    exporters:
      otlp:
        # configure the public endpoint here
        endpoint: coralogix-opentelemetry-receiver:4317
        # this is not needed if you have valid tls certificate fronting receivers
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          exporters:
            - otlp

opentelemetry-cluster-collector:
  enabled: true
opentelemetry-agent-windows:
  enabled: false
opentelemetry-gateway:
  enabled: false

100 changes: 100 additions & 0 deletions otel-integration/k8s-helm/central-tail-sampling-values.yaml
@@ -0,0 +1,100 @@
global:
  domain: ""
  clusterName: ""
  defaultApplicationName: "otel"
  defaultSubsystemName: "integration"
  logLevel: "warn"
  collectionInterval: "30s"

opentelemetry-receiver:
  enabled: true
  # Receiver needs to be exposed either via Service of type LoadBalancer or Ingress
  service:
    enabled: true
    type: ClusterIP
    # type: LoadBalancer
    # loadBalancerIP: 1.2.3.4
    # loadBalancerSourceRanges: []

    # By default, Service of type 'LoadBalancer' will be created setting 'externalTrafficPolicy: Cluster'
    # unless other value is explicitly set.
    # Possible values are Cluster or Local (https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip)
    # externalTrafficPolicy: Cluster
  ingress:
    enabled: false
    # annotations: {}
    # ingressClassName: nginx
    # hosts:
    #   - host: collector.example.com
    #     paths:
    #       - path: /
    #         pathType: Prefix
    #         port: 4318
    # tls:
    #   - secretName: collector-tls
    #     hosts:
    #       - collector.example.com
  # For production use-cases please increase replicas
  # and resource requests and limits
  replicaCount: 3
  # resources:
  #   requests:
  #     cpu: 0.5
  #     memory: 256Mi
  #   limits:
  #     cpu: 2
  #     memory: 2G

  presets:
    loadBalancing:
      enabled: true
      routingKey: "traceID"
      hostname: coralogix-opentelemetry-gateway
      # dnsResolverInterval: 20s
      # dnsResolverTimeout: 5s

  config:
    service:
      pipelines:
        traces:
          exporters:
            - loadbalancing

opentelemetry-gateway:
  enabled: true
  # For production use-cases please increase replicas
  # and resource requests and limits
  replicaCount: 3
  # resources:
  #   requests:
  #     cpu: 0.5
  #     memory: 256Mi
  #   limits:
  #     cpu: 2
  #     memory: 2G

  config:
    processors:
      tail_sampling:
        # Update configuration here, with your settings and tail sampling policies
        # Docs: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor
        policies:
          [
            #{
            #  name: errors-policy,
            #  type: status_code,
            #  status_code: {status_codes: [ERROR]}
            #},
            {
              name: randomized-policy,
              type: probabilistic,
              probabilistic: {sampling_percentage: 10}
            },
          ]

opentelemetry-agent:
  enabled: false
opentelemetry-cluster-collector:
  enabled: false
opentelemetry-agent-windows:
  enabled: false