feat: Instrument Feast using Prometheus and OpenTelemetry #4366

Merged (1 commit) on Aug 6, 2024
3 changes: 1 addition & 2 deletions README.md
@@ -17,7 +17,6 @@
[![GitHub Release](https://img.shields.io/github/v/release/feast-dev/feast.svg?style=flat&sort=semver&color=blue)](https://github.com/feast-dev/feast/releases)

## Join us on Slack!

👋👋👋 [Come say hi on Slack!](https://join.slack.com/t/feastopensource/signup)

## Overview
@@ -231,4 +230,4 @@ Thanks goes to these incredible people:

<a href="https://github.com/feast-dev/feast/graphs/contributors">
<img src="https://contrib.rocks/image?repo=feast-dev/feast" />
</a>
</a>
4 changes: 4 additions & 0 deletions infra/charts/feast-feature-server/README.md
@@ -44,8 +44,12 @@ See [here](https://github.com/feast-dev/feast/tree/master/examples/python-helm-d
| imagePullSecrets | list | `[]` | |
| livenessProbe.initialDelaySeconds | int | `30` | |
| livenessProbe.periodSeconds | int | `30` | |
| metrics.enabled | bool | `false` | |
| metrics.otelCollector.endpoint | string | `""` | |
| metrics.otelCollector.port | int | `4317` | |
| nameOverride | string | `""` | |
| nodeSelector | object | `{}` | |
| otel_service.name | string | `"otelcol"` | |
| podAnnotations | object | `{}` | |
| podSecurityContext | object | `{}` | |
| readinessProbe.initialDelaySeconds | int | `20` | |
108 changes: 108 additions & 0 deletions infra/charts/feast-feature-server/opentelemetry.md
@@ -0,0 +1,108 @@
## Adding Monitoring
To add monitoring to the Feast Feature Server, follow these steps:

### Workflow

Feast instrumentation using OpenTelemetry and Prometheus:
![Workflow](samples/workflow.png)

### Deploy Prometheus Operator
Follow the Prometheus Operator documentation to install the operator:
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md

### Deploy OpenTelemetry Operator
Before installing the OpenTelemetry Operator, install `cert-manager` and verify that its pods are running. Then install the operator:
```
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
```
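To confirm the pods are up before moving on, one option is to inspect the output of `kubectl get pods -n cert-manager -o json`. The Python sketch below is hypothetical (`pods_ready` is not part of any tool) and assumes cert-manager was installed into its default `cert-manager` namespace; it treats the namespace as ready only when every pod is `Running` and all of its containers report ready.

```python
import json


def pods_ready(pod_list_json: str) -> bool:
    """Return True when every pod in a `kubectl get pods -o json` document is
    Running and all of its containers report ready."""
    pods = json.loads(pod_list_json).get("items", [])
    if not pods:
        return False  # nothing scheduled yet, so nothing is ready
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            return False
        container_statuses = status.get("containerStatuses", [])
        # A Running pod whose containers have not all passed readiness is not ready yet.
        if not container_statuses or not all(
            cs.get("ready", False) for cs in container_statuses
        ):
            return False
    return True
```

For example, save the output with `kubectl get pods -n cert-manager -o json > pods.json` and call `pods_ready(open("pods.json").read())`. The same check can be rerun against the operator's namespace after applying the manifest.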

Follow the documentation for further installation steps:
https://github.com/open-telemetry/opentelemetry-operator

### Configure OpenTelemetry Collector
Add the OpenTelemetry Collector configuration under the `metrics` section of your `values.yaml` file.

Example values.yaml:

```
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317" # sample
    headers:
      api-key: "your-api-key"
```

### Add the instrumentation annotation and environment variables to `deployment.yaml`

```
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
        instrumentation.opentelemetry.io/inject-python: "true"
      {{- end }}
```

```
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.otelCollector.port }}
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "true"
```
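The endpoint above is plain string assembly from chart values. A minimal Python sketch of the same composition, assuming the collector is created by the OpenTelemetry Operator (which exposes a collector named `<name>` through a Service called `<name>-collector`, matching the `otelcol-collector-0` pod used later) and using a hypothetical `feast` namespace:

```python
def otlp_endpoint(collector_name: str, namespace: str, port: int, scheme: str = "http") -> str:
    """Compose the in-cluster OTLP endpoint used for OTEL_EXPORTER_OTLP_ENDPOINT.

    Assumes the OpenTelemetry Operator exposes a collector named `collector_name`
    through a Service called `<collector_name>-collector`.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    service = f"{collector_name}-collector"
    return f"{scheme}://{service}.{namespace}.svc.cluster.local:{port}"


# 4317 matches metrics.otelCollector.port in values.yaml (OTLP over gRPC);
# "feast" is a hypothetical release namespace.
endpoint = otlp_endpoint("otelcol", "feast", 4317)
print(endpoint)  # http://otelcol-collector.feast.svc.cluster.local:4317
```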

### Add checks
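Wrap every monitoring manifest, and the metrics-specific parts of the deployment file, in a `metrics.enabled` check so they are rendered only when metrics are enabled: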

```
{{ if .Values.metrics.enabled }}
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    endpoint: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318 # 4318 is the Collector's default OTLP/HTTP port
  env:
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
{{ end }}
```
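The same `metrics.enabled` flag also decides how the feature server is launched. As a sketch only, the hypothetical Python function below mirrors the default (non-registry) branch of the chart's `templates/deployment.yaml`, where `--metrics` is added to `feast serve` when metrics are enabled; it models just that branch, not the whole template.

```python
def feature_server_args(values: dict) -> list:
    """Mirror the default (non-registry) branch of templates/deployment.yaml:
    `--metrics` is added only when metrics.enabled is true."""
    metrics_enabled = bool(values.get("metrics", {}).get("enabled", False))
    args = ["feast", "serve"]
    if metrics_enabled:
        args.append("--metrics")  # start the feature server with metrics enabled
    args += ["-h", "0.0.0.0"]  # listen on all interfaces inside the pod
    return args


print(feature_server_args({"metrics": {"enabled": True}}))
print(feature_server_args({"metrics": {"enabled": False}}))
```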

### Add manifests to the chart
Add Instrumentation, OpenTelemetryCollector, ServiceMonitors, Prometheus Instance and RBAC rules as provided in the [samples/](https://github.com/feast-dev/feast/tree/91540703c483f1cd03b534a1a45bc4ccdcf79f81/infra/charts/feast-feature-server/samples) directory.

For the latest updates, refer to the official repository: https://github.com/open-telemetry/opentelemetry-operator
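These manifests are wired together through label selectors: the Prometheus instance picks up ServiceMonitors labelled `app: feast`, and each ServiceMonitor selects the collector Service by its operator-assigned labels. A rough Python model of Kubernetes `matchLabels` semantics (every selector pair must be present, extra labels are ignored, an empty selector matches everything) can help sanity-check these selectors before deploying. It ignores `matchExpressions`, and the collector Service labels below are assumptions.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """Kubernetes `matchLabels` semantics: every key/value pair in the selector
    must appear on the object; extra labels are ignored, and an empty selector
    matches everything. `matchExpressions` are not modelled."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Selectors copied from samples/prometheus.yaml and samples/otel-sm.yaml.
PROMETHEUS_SERVICE_MONITOR_SELECTOR = {"app": "feast"}
SERVICE_MONITOR_SELECTOR = {
    "app.kubernetes.io/component": "opentelemetry-collector",
    "app.kubernetes.io/managed-by": "opentelemetry-operator",
}

# ServiceMonitor labels come from samples/otel-sm.yaml; the labels on the
# otelcol-collector Service are assumed, not taken from the chart.
SERVICE_MONITOR_LABELS = {"app": "feast"}
COLLECTOR_SERVICE_LABELS = {
    "app.kubernetes.io/component": "opentelemetry-collector",
    "app.kubernetes.io/managed-by": "opentelemetry-operator",
    "app.kubernetes.io/name": "otelcol-collector",
}

print(matches(PROMETHEUS_SERVICE_MONITOR_SELECTOR, SERVICE_MONITOR_LABELS))
print(matches(SERVICE_MONITOR_SELECTOR, COLLECTOR_SERVICE_LABELS))
```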

### Deploy Feast
Deploy Feast with `metrics.enabled` set to `true`.

Example:
```
helm install feast-release infra/charts/feast-feature-server --set metrics.enabled=true --set feature_store_yaml_base64=""
```
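Nested chart values map onto dotted `--set` keys, which is where mistakes such as `--set metric=true` creep in. The hypothetical Python helpers below flatten a values dictionary into `--set` arguments and base64-encode a feature store configuration for `feature_store_yaml_base64`; the `feature_store.yaml` contents are placeholders. Values containing commas would need escaping for `--set`, so for anything complex a `-f values.yaml` file is simpler.

```python
import base64


def flatten(values: dict, prefix: str = "") -> dict:
    """Flatten nested chart values into the dotted keys `helm --set` expects."""
    flat = {}
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def helm_set_args(values: dict) -> list:
    """Turn a values dictionary into repeated `--set key=value` arguments."""
    args = []
    for key, value in flatten(values).items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        args += ["--set", f"{key}={value}"]
    return args


# Placeholder contents; a real feature_store.yaml would be read from disk.
feature_store_yaml = b"project: my_project\nprovider: local\n"
values = {
    "metrics": {"enabled": True},
    "feature_store_yaml_base64": base64.b64encode(feature_store_yaml).decode("ascii"),
}
print(helm_set_args(values))
```

The resulting list can be appended to `["helm", "install", "feast-release", "infra/charts/feast-feature-server"]` and passed to `subprocess.run`.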

## See logs
Once the OpenTelemetry Collector is deployed, search its logs for the Feast metrics:

```
oc logs otelcol-collector-0 | grep "Name: feast_feature_server_memory_usage\|Value: 0.*"
oc logs otelcol-collector-0 | grep "Name: feast_feature_server_cpu_usage\|Value: 0.*"
```
```
-> Name: feast_feature_server_memory_usage
Value: 0.579426
```
```
-> Name: feast_feature_server_cpu_usage
Value: 0.000000
```
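To extract these values programmatically rather than with `grep`, a small parser can walk the collector output and pair each `Name:` line with the next `Value:` line. This is a sketch that assumes the layout of the `logging` exporter at `verbosity: detailed`, as in the sample output above; the exact format may differ between collector versions.

```python
import re

# Patterns for the "Name:" and "Value:" lines printed by the collector's
# logging exporter at verbosity: detailed.
NAME_RE = re.compile(r"Name:\s*(\S+)")
VALUE_RE = re.compile(r"Value:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def latest_values(log_text: str, prefix: str = "feast_feature_server_") -> dict:
    """Map each metric whose name starts with `prefix` to the last value logged for it."""
    values = {}
    current = None  # metric whose value we are waiting for
    for line in log_text.splitlines():
        name_match = NAME_RE.search(line)
        if name_match:
            name = name_match.group(1)
            current = name if name.startswith(prefix) else None
            continue
        value_match = VALUE_RE.search(line)
        if value_match and current is not None:
            values[current] = float(value_match.group(1))
            current = None  # pair each Name line with its first Value line only
    return values


sample = """-> Name: feast_feature_server_memory_usage
Value: 0.579426
-> Name: feast_feature_server_cpu_usage
Value: 0.000000
"""
print(latest_values(sample))
```

The log text could come from `subprocess.run(["oc", "logs", "otelcol-collector-0"], capture_output=True, text=True).stdout`.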
19 changes: 19 additions & 0 deletions infra/charts/feast-feature-server/samples/instrumentation.yaml
@@ -0,0 +1,19 @@
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    endpoint: <endpoint> # eg: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318
  env:
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
53 changes: 53 additions & 0 deletions infra/charts/feast-feature-server/samples/otel-collector.yaml
@@ -0,0 +1,53 @@
# API reference https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md
# Refs for v1beta1 config: https://github.com/open-telemetry/opentelemetry-operator/issues/3011#issuecomment-2154118998
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: statefulset
  image: otel/opentelemetry-collector-contrib:0.102.1
  targetAllocator:
    enabled: true
    serviceAccount: opentelemetry-targetallocator-sa
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      serviceMonitorSelector: {}
        ## If uncommented, only service monitors with this label will get picked up
        # app: feast
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otelcol-collector'
              scrape_interval: 10s
              static_configs:
                - targets: [ '0.0.0.0:8888' ]

    processors:
      batch: {}

    exporters:
      logging:
        verbosity: detailed

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
        metrics:
          receivers: [otlp, prometheus]
          processors: []
          exporters: [logging]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
16 changes: 16 additions & 0 deletions infra/charts/feast-feature-server/samples/otel-sm.yaml
@@ -0,0 +1,16 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: feast
  name: otel-sm-1
spec:
  endpoints:
    - port: metrics
  namespaceSelector:
    matchNames:
      - <namespace> # helm value - {{ .Release.Namespace }}
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
      app.kubernetes.io/managed-by: opentelemetry-operator
15 changes: 15 additions & 0 deletions infra/charts/feast-feature-server/samples/prometheus.yaml
@@ -0,0 +1,15 @@
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  evaluationInterval: 30s
  podMonitorSelector:
    matchLabels:
      app: feast
  portName: web
  replicas: 1
  scrapeInterval: 30s
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    matchLabels:
      app: feast
68 changes: 68 additions & 0 deletions infra/charts/feast-feature-server/samples/rbac.yaml
@@ -0,0 +1,68 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opentelemetry-targetallocator-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opentelemetry-targetallocator-role-1
  annotations:
    meta.helm.sh/release-name: "feast-release"
    meta.helm.sh/release-namespace: "feast-val"
  labels:
    app.kubernetes.io/managed-by: "Helm"
rules:
  - apiGroups:
      - monitoring.coreos.com
    resources:
      - servicemonitors
      - podmonitors
    verbs:
      - '*'
  - apiGroups: [""]
    resources:
      - namespaces
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - configmaps
    verbs: ["get"]
  - apiGroups:
      - discovery.k8s.io
    resources:
      - endpointslices
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: opentelemetry-targetallocator-rb-1
  annotations:
    meta.helm.sh/release-name: "feast-release"
    meta.helm.sh/release-namespace: "feast-val"
  labels:
    app.kubernetes.io/managed-by: "Helm"
subjects:
  - kind: ServiceAccount
    name: opentelemetry-targetallocator-sa
    namespace: <namespace> # helm value - {{ .Release.Namespace }}
roleRef:
  kind: ClusterRole
  name: opentelemetry-targetallocator-role-1
  apiGroup: rbac.authorization.k8s.io
16 changes: 16 additions & 0 deletions infra/charts/feast-feature-server/samples/service-monitor.yaml
@@ -0,0 +1,16 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: feast
  name: otel-sm
spec:
  endpoints:
    - port: metrics
  namespaceSelector:
    matchNames:
      - <namespace> # helm value - {{ .Release.Namespace }}
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
      app.kubernetes.io/managed-by: opentelemetry-operator
13 changes: 12 additions & 1 deletion infra/charts/feast-feature-server/templates/deployment.yaml
@@ -14,6 +14,9 @@ spec:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
        {{- if $.Values.metrics.enabled }}
        instrumentation.opentelemetry.io/inject-python: "true"
        {{- end }}
      {{- end }}
      labels:
        {{- include "feast-feature-server.selectorLabels" . | nindent 8 }}
@@ -48,10 +51,18 @@ spec:
            - "feast"
            - "serve_registry"
            {{- else }}
            {{- if .Values.metrics.enabled }}
            - "feast"
            - "serve"
            - "--metrics"
            - "-h"
            - "0.0.0.0"
            {{- else }}
            - "feast"
            - "serve"
            - "-h"
            - "0.0.0.0"
            {{- end }}
            {{- end }}
          ports:
            - name: {{ .Values.feast_mode }}
@@ -88,4 +99,4 @@ spec:
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- end }}
6 changes: 6 additions & 0 deletions infra/charts/feast-feature-server/templates/service.yaml
@@ -11,5 +11,11 @@ spec:
      targetPort: {{ .Values.feast_mode }}
      protocol: TCP
      name: http
    {{- if .Values.metrics.enabled }}
    - name: metrics
      port: 8000
      protocol: TCP
      targetPort: 8000 # metrics port
    {{- end }}
  selector:
    {{- include "feast-feature-server.selectorLabels" . | nindent 4 }}
6 changes: 6 additions & 0 deletions infra/charts/feast-feature-server/values.yaml
@@ -15,6 +15,12 @@ imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

metrics:
  enabled: false
  otelCollector:
    endpoint: "" # sample endpoint: "otel-collector.default.svc.cluster.local:4317"
    port: 4317

# feature_store_yaml_base64 -- [required] a base64 encoded version of feature_store.yaml
feature_store_yaml_base64: ""
