Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added OpenTelemetry instrumentation #738

Merged
merged 8 commits into from
Nov 22, 2019

Conversation

jpkrohling
Copy link
Contributor

@jpkrohling jpkrohling commented Oct 29, 2019

This PR adds OpenTelemetry instrumentation to the Jaeger Operator. It also allows the operator to auto-provision a Jaeger instance. A new sidecar was added to the deploy/operator.yaml, pointing to the Jaeger instance that will be provisioned by the operator. As the agent is resilient enough, all spans that are generated via the bootstrap are collected:
image

And this is how the reconciliation loop looks like for the auto-provisioned CR:
image

Closes #737.

Signed-off-by: Juraci Paixão Kröhling juraci@kroehling.de

@jpkrohling jpkrohling requested a review from objectiser October 29, 2019 14:59
@jpkrohling
Copy link
Contributor Author

I built an image based on this branch: quay.io/jpkroehling/jaeger-operator:737-Instrument-Operator. To test this PR, just replace the image in the operator.yaml and deploy the operator as usual.

@jpkrohling jpkrohling changed the title Added OpenTelemetry instrumentation WIP - Added OpenTelemetry instrumentation Oct 31, 2019
@jpkrohling jpkrohling force-pushed the 737-Instrument-Operator branch from 82d1fd2 to 4695441 Compare November 20, 2019 11:32
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling jpkrohling force-pushed the 737-Instrument-Operator branch from 4695441 to 785c3e6 Compare November 22, 2019 08:50
@jpkrohling jpkrohling changed the title WIP - Added OpenTelemetry instrumentation Added OpenTelemetry instrumentation Nov 22, 2019
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling jpkrohling force-pushed the 737-Instrument-Operator branch from b4eba41 to aaaf9cc Compare November 22, 2019 08:55
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling
Copy link
Contributor Author

I just ran this with OLM and could confirm that it works fine. During my first test, I didn't get the bootstrap trace, but if I kill the pod, the new pod does report a bootstrap trace. Here are the YAML files used to create the operator source/group/subscription:

---
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: jpkroehling-operators
  namespace: marketplace
spec:
  type: appregistry
  endpoint: https://quay.io/cnr
  registryNamespace: jpkroehling
---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: jpkroehling-operatorgroup
  namespace: marketplace
spec:
  targetNamespaces:
  - marketplace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: jpkroehling-subsription
  namespace: marketplace
spec:
  channel: stable
  name: jaeger
  source: jpkroehling-operators
  sourceNamespace: marketplace

And here's the CSV: https://paste.centos.org/view/c7b0cfba

Copy link
Contributor

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, will continue later.

deploy/operator.yaml Outdated Show resolved Hide resolved
cfg, err := config.GetConfig()
if err != nil {
span.SetStatus(codes.Internal)
span.SetAttribute(key.String("error", err.Error()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick scan of the otel docs, couldn't find a standard definition for error attribute, so unclear whether should be a boolean type as in opentracing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something for @pavolloffay perhaps.

pkg/cmd/start/provision.go Outdated Show resolved Hide resolved
pkg/cmd/start/provision.go Outdated Show resolved Hide resolved
pkg/cmd/start/provision.go Outdated Show resolved Hide resolved
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Copy link
Contributor

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End of first pass. Looks good - couple of comments:

  1. Wondering whether some/all of the operation names may need more scope - to make it clear what part of the code they relate to, e.g. upgrade.ManagedInstances rather than just ManagedInstances.
  2. I think only the dependencies processing actually added extra attributes - because it was processing of a loop with multiple dependencies. Possibly not for this initial PR, but it might be useful having additional information about the data being passed to the functions, as attributes.

pkg/controller/jaeger/elasticsearch.go Show resolved Hide resolved
pkg/controller/jaeger/jaeger_controller.go Show resolved Hide resolved
pkg/controller/jaeger/jaeger_controller.go Outdated Show resolved Hide resolved
pkg/strategy/controller.go Outdated Show resolved Hide resolved
@jpkrohling
Copy link
Contributor Author

Wondering whether some/all of the operation names may need more scope - to make it clear what part of the code they relate to, e.g. upgrade.ManagedInstances rather than just ManagedInstances.

Good point. Looking at the screenshots, I have the feeling that the context is indeed there, given that the parent spans do have upgrade in the name. On the other hand, the operation name is already big, and we can probably use slashes / to add more context. Like: upgrade/ManagedInstances, which yields an operation name like operator/bootstrap/upgrade/ManagedInstances.

I think only the dependencies processing actually added extra attributes - because it was processing of a loop with multiple dependencies. Possibly not for this initial PR, but it might be useful having additional information about the data being passed to the functions, as attributes.

+1, but I'd do that on an individual basis

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
test/operator.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one minor change required on test operator.yaml.

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling
Copy link
Contributor Author

Merging, e2e passed locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instrument Jaeger Operator
2 participants