feat: add central cluster collector for tail sampling #443

Merged 1 commit on Aug 29, 2024
3 changes: 3 additions & 0 deletions otel-integration/CHANGELOG.md
@@ -2,6 +2,9 @@

## OpenTelemetry-Integration

### v0.0.98 / 2024-08-29
- [Feat] Add a way to deploy central collector cluster for tail sampling

### v0.0.97 / 2024-08-19
- [Fix] ignore process name not found errors for hostmetrics process preset

7 changes: 6 additions & 1 deletion otel-integration/k8s-helm/Chart.yaml
@@ -1,7 +1,7 @@
apiVersion: v2
name: otel-integration
description: OpenTelemetry Integration
-version: 0.0.97
+version: 0.0.98
keywords:
  - OpenTelemetry Collector
  - OpenTelemetry Agent
@@ -24,6 +24,11 @@ dependencies:
    version: "0.90.0"
    repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
    condition: opentelemetry-cluster-collector.enabled
  - name: opentelemetry-collector
    alias: opentelemetry-receiver
    version: "0.90.0"
    repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
    condition: opentelemetry-receiver.enabled
  - name: opentelemetry-collector
    alias: opentelemetry-gateway
    version: "0.90.0"
54 changes: 54 additions & 0 deletions otel-integration/k8s-helm/README.md
@@ -245,6 +245,60 @@ This change will configure otel-agent pods to send span data to coralogix-opente

When running on OpenShift, make sure to set `distribution: "openshift"` in your `values.yaml`. When running in Windows environments, use the `values-windows-tailsampling.yaml` values file.

### Deploying Central Collector Cluster for Tail Sampling

If you want to deploy the OpenTelemetry Collector in a separate "central" Kubernetes cluster that receives telemetry data via OTLP receivers and performs [Tail Sampling](https://opentelemetry.io/docs/concepts/sampling/#tail-sampling), install `otel-integration` using the `central-tail-sampling-values.yaml` values file. Check the values file for the available configuration.

This creates two deployments:
- opentelemetry-receiver - receives OTLP data, pushes metrics and logs to Coralogix, and load-balances spans to the opentelemetry-gateway deployment.
- opentelemetry-gateway - receives span data and makes the tail sampling decisions.
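
The split between the two deployments can be sketched with a trimmed excerpt of `central-tail-sampling-values.yaml`. Routing spans by trace ID means every span of a given trace reaches the same opentelemetry-gateway replica, which a tail sampling decision needs in order to see the whole trace. This is an abridged illustration, not a complete configuration; the comments are added here for explanation.

```yaml
opentelemetry-receiver:
  enabled: true
  presets:
    loadBalancing:
      enabled: true
      # Route by trace ID so all spans of one trace land on the same gateway replica.
      routingKey: "traceID"
      # Name the receiver uses to find gateway replicas
      # (see the commented dnsResolver* options in the values file).
      hostname: coralogix-opentelemetry-gateway
  config:
    service:
      pipelines:
        traces:
          exporters:
            - loadbalancing

# The gateway receives the load-balanced spans and runs the tail_sampling processor.
opentelemetry-gateway:
  enabled: true
```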

The opentelemetry-receiver must be exposed to the other Kubernetes clusters so that they can send data to it. You can do that with a Service of type LoadBalancer, an Ingress object, or a manually configured load balancer. Also, make sure to configure enough replicas, along with resource requests and limits, to handle the load. Next, configure the [tail sampling processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor) with your own tail sampling policies.
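
As a rough sketch of the settings described above, the override file below exposes the receiver through a Service of type LoadBalancer and replaces the default policy list. The file name `my-central-values.yaml`, the CIDR range, and the 2000 ms latency threshold are all hypothetical illustrations; the `status_code`, `latency`, and `probabilistic` policy types are described in the tail sampling processor documentation linked above.

```yaml
# my-central-values.yaml (hypothetical file name)
opentelemetry-receiver:
  service:
    enabled: true
    type: LoadBalancer
    # Hypothetical CIDR: restrict which networks may reach the receiver.
    loadBalancerSourceRanges:
      - 10.0.0.0/8

opentelemetry-gateway:
  config:
    processors:
      tail_sampling:
        policies:
          # Keep traces that contain a span with ERROR status.
          - name: errors-policy
            type: status_code
            status_code: {status_codes: [ERROR]}
          # Keep traces slower than an illustrative 2 second threshold.
          - name: slow-traces-policy
            type: latency
            latency: {threshold_ms: 2000}
          # Sample roughly 10% of traces regardless of outcome.
          - name: randomized-policy
            type: probabilistic
            probabilistic: {sampling_percentage: 10}
```

Such a file would be passed as an additional `-f` argument after `central-tail-sampling-values.yaml`. Helm replaces lists wholesale when merging values files, so the policy list in the override must be complete.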

```bash
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual

helm upgrade --install otel-coralogix-central-collector coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-tail-sampling-values.yaml
```

Once it is deployed, you can validate the setup by sending some OTLP data to the opentelemetry-receiver Service and checking Coralogix for spans. One way to do this is with telemetrygen:

```bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: telemetrygen-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: telemetrygen
  template:
    metadata:
      labels:
        app: telemetrygen
    spec:
      containers:
        - name: telemetrygen
          image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest
          args:
            - "traces"
            - "--otlp-endpoint=coralogix-opentelemetry-receiver:4317"
            - "--otlp-insecure"
            - "--rate=10"
            - "--duration=120s"
EOF
```

Next, configure the regular `otel-integration` deployment to send data to the central collector cluster:

```bash
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-agent-values.yaml
```
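
The part of `central-agent-values.yaml` that matters here is the agent's OTLP exporter, which must point at whatever address fronts the opentelemetry-receiver. A minimal sketch, assuming the receiver is reachable at the placeholder address `collector.example.com:4317` (not a real endpoint):

```yaml
opentelemetry-agent:
  enabled: true
  config:
    exporters:
      otlp:
        # Placeholder: the public address that fronts opentelemetry-receiver.
        endpoint: collector.example.com:4317
        # Only needed when no valid TLS certificate fronts the receiver.
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          exporters:
            - otlp

# The gateway runs in the central cluster, so it stays disabled here.
opentelemetry-gateway:
  enabled: false
```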

#### Why am I getting ResourceExhausted errors when using Tail Sampling?

Typically, the errors look like this:
38 changes: 38 additions & 0 deletions otel-integration/k8s-helm/central-agent-values.yaml
@@ -0,0 +1,38 @@
global:
  domain: ""
  clusterName: ""
  defaultApplicationName: "otel"
  defaultSubsystemName: "integration"
  logLevel: "debug"
  collectionInterval: "30s"
  version: "0.0.97"

  extensions:
    kubernetesDashboard:
      enabled: true

# set distribution to openshift for openshift clusters
distribution: ""
opentelemetry-agent:
  enabled: true
  config:
    exporters:
      otlp:
        # configure the public endpoint here
        endpoint: coralogix-opentelemetry-receiver:4317
        # this is not needed if you have valid tls certificate fronting receivers
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          exporters:
            - otlp

opentelemetry-cluster-collector:
  enabled: true
opentelemetry-agent-windows:
  enabled: false
opentelemetry-gateway:
  enabled: false

100 changes: 100 additions & 0 deletions otel-integration/k8s-helm/central-tail-sampling-values.yaml
@@ -0,0 +1,100 @@
global:
  domain: ""
  clusterName: ""
  defaultApplicationName: "otel"
  defaultSubsystemName: "integration"
  logLevel: "warn"
  collectionInterval: "30s"

opentelemetry-receiver:
  enabled: true
  # Receiver needs to be exposed either via Service of type LoadBalancer or Ingress
  service:
    enabled: true
    type: ClusterIP
    # type: LoadBalancer
    # loadBalancerIP: 1.2.3.4
    # loadBalancerSourceRanges: []

    # By default, Service of type 'LoadBalancer' will be created setting 'externalTrafficPolicy: Cluster'
    # unless other value is explicitly set.
    # Possible values are Cluster or Local (https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip)
    # externalTrafficPolicy: Cluster
  ingress:
    enabled: false
    # annotations: {}
    # ingressClassName: nginx
    # hosts:
    #   - host: collector.example.com
    #     paths:
    #       - path: /
    #         pathType: Prefix
    #         port: 4318
    # tls:
    #   - secretName: collector-tls
    #     hosts:
    #       - collector.example.com
  # For production use-cases please increase replicas
  # and resource requests and limits
  replicaCount: 3
  # resources:
  #   requests:
  #     cpu: 0.5
  #     memory: 256Mi
  #   limits:
  #     cpu: 2
  #     memory: 2G

  presets:
    loadBalancing:
      enabled: true
      routingKey: "traceID"
      hostname: coralogix-opentelemetry-gateway
      # dnsResolverInterval: 20s
      # dnsResolverTimeout: 5s

  config:
    service:
      pipelines:
        traces:
          exporters:
            - loadbalancing

opentelemetry-gateway:
  enabled: true
  # For production use-cases please increase replicas
  # and resource requests and limits
  replicaCount: 3
  # resources:
  #   requests:
  #     cpu: 0.5
  #     memory: 256Mi
  #   limits:
  #     cpu: 2
  #     memory: 2G

  config:
    processors:
      tail_sampling:
        # Update configuration here, with your settings and tail sampling policies
        # Docs: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor
        policies:
          [
            #{
            #  name: errors-policy,
            #  type: status_code,
            #  status_code: {status_codes: [ERROR]}
            #},
            {
              name: randomized-policy,
              type: probabilistic,
              probabilistic: {sampling_percentage: 10}
            },
          ]

opentelemetry-agent:
  enabled: false
opentelemetry-cluster-collector:
  enabled: false
opentelemetry-agent-windows:
  enabled: false