Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom Probes #1105

Open
MrAlias opened this issue Sep 16, 2024 · 24 comments
Open

Custom Probes #1105

MrAlias opened this issue Sep 16, 2024 · 24 comments
Labels
enhancement New feature or request

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Sep 16, 2024

There are many packages that are not instrumented by this project. Some open-source and others closed-source. There should be a way to configure a custom Probe definition that users can provide and would provide instrumentation for these unsupported packages.

@MrAlias MrAlias added the enhancement New feature or request label Sep 16, 2024
@RonFed
Copy link
Contributor

RonFed commented Sep 16, 2024

That's a great idea.
I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.

@damemi
Copy link
Contributor

damemi commented Sep 16, 2024

Yeah I think a refined, exposed Probe interface inherently provides the mechanism for custom probes. But calling out that that is intentional is good to do

@MrAlias
Copy link
Contributor Author

MrAlias commented Sep 16, 2024

That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.

I'm happy to have this issue be the place we discuss the needed probe changes. 👍

@RonFed
Copy link
Contributor

RonFed commented Sep 16, 2024

I'll try to list the main points I was thinking about:

Changes to the current functions

Load(*link.Executable, *process.TargetDetails, config.Sampler) error

Currently, this function has 2 main purposes: 1) loading the eBPF programs and maps into the kernel. 2) attaching the programs to the relevant symbols via uprobes.

  • Splitting this function to 2 might be better: Load and Attach. This separation might help make the code clearer, better error handling, and some added functionality (loading some program once and attaching it to different places in runtime).
  • In the future this function will also be responsible for setting the initial configuration (for example sampling, attributes collection, size of perf buffer etc'). If we decide to split the function, this part is probably more fitting in the Attach function. In addition, the general configuration struct will need to be decided instead of passing the sampling config alone.
  • Passing context here.

Run(eventsChan chan<- *Event)

  • The event type will probably change to something from pdata as was suggested.
  • This function might not be relevant for all probes. I'll write more on that in the next section.
  • Do we want to pass context here? currently the cancelation of this function is done by closing the perf buffer which causes the reading loop to exit.

New functions suggestions

Status() Status

This will return the status of the probe (Running, closed, error) It will be useful for constructing a higher API which will report a general status of all the probes.

SupportedConfiguration() Config

Each probe can have a different set of configurations it supports (for example: collecting HTTP headers, including the url.path in span name, collecting db.query.text, sampling by kafka topic, ...). This function will alow each probe to declare which configuration it supports.

ApplyConfig(Config)

Changing the configuration of the probe, by applying the supplied one.

Different types of probes

Our current interface assumes each probe will report events. This is not necessarily the case. For example, in #954 we want to have a probe which will write some value to a boolean variable. This kind of probe acts as a Job - it has a dedicated task and it should not be running in the background. Maybe we want to have a base Probe interface, and 2 additional interfaces which will embed it: ReportingProbe and JobProbe?

@damemi
Copy link
Contributor

damemi commented Sep 16, 2024

@RonFed this is a great overview, thanks for writing that up

  • +1 to the idea of splitting Load and Attach. I think having atomic operations is much clearer and allows for more extension points. We can always have a LoadAndAttach later for syntactic sugar if it's that convenient for users
  • +1 to passing context in Run() rather than implicitly exiting when the perf buffer is closed. But, could this lead to the perf buffer never being cleaned up?
  • What are you thinking for some use cases for the Status() function?
  • What's the purpose of SupportedConfiguration()? Would I call that before instantiating a Probe, then cast it to the type of config for that Probe? Couldn't each Probe just export a Config struct that you can use directly?

Different types of probes makes sense too. Could you generalize it even further to something like SyncProbe (does an action) and AsyncProbe (listens for events)?

@damemi
Copy link
Contributor

damemi commented Sep 17, 2024

From the sig meeting, my thinking of basing Probe config off the k8s scheduler framework (specifically the Plugin API)

If we have 2 types of probe ProbeA and ProbeB we could (roughly) have the following api:

So our root API would provide (something like)

// go.otel.io/auto/api
// Base types for probe and config
type Config interface {
  Package() string
}

type Probe interface {
  ApplyConfig(Config)
}

And we also require a standard New() function that's used by our auto-inst SDK to create the generic probe interface and call functions like Load() and Attach()

Then a user (or us) could import it and implement their custom probe

// github.com/me/foo-auto
// ProbeA implementations
type ProbeA interface {
  Probe
  DoProbeAThing()
}

type ProbeAConfig struct {
  CustomA string
  CustomSetting int
}

func (a *ProbeAConfig) Package() string {
  return a.CustomA
}

func New(otelauto.Config probeConfig) otelauto.Probe {
  probeAconfig := probeConfig.(*ProbeAConfig)
  doThing(probeAconfig.CustomSetting)
  return ProbeA{...}
}

There's some examples of this in place in this scheduler repo. I've just always thought it was a cool approach and think it fits here too

@RonFed
Copy link
Contributor

RonFed commented Sep 18, 2024

@damemi I'm not sure I understood your suggestion. Assuming our current probe for simplicity, can you give an example of how configuration of HTTP or DB will look?

@damemi
Copy link
Contributor

damemi commented Sep 21, 2024

@RonFed I'll try to put together a more concrete prototype, I was looking into it and got kind of sidetracked just thinking about the overall structure.

On that note, I have some ideas of what the final layout could look like for the Probe api that might help as an end goal:

  • Move /internal/pkg/instrumentation/probe to a root /pkg/probe
    • ie go.opentelemetry.io/auto/pkg/probe
    • It makes sense to keep the Probe API decoupled from the entire Instrumentation root package so that custom probes can be defined in isolation and imported without depending on the full auto-instrumentation SDK. In other words, users (and us) should be able to write a Probe that is entirely its own module (similar to collector-contrib)
  • As proof, move our default probes from /internal/pkg/instrumentation/bpf to a root /probes
    • for example go.opentelemetry.io/auto/probes/net/http
    • If we define a modular Probe api that's done right then our default probes should also be built with it. This detangles our codebase and serves as documentation for writing custom probes
    • This also provides them externally as a "probe library". We don't even need to worry about an ownership model for custom probes if users can define them as their own modules in their own repos (unless we want to offer a "contrib build" which I don't think is on the table right now)
  • Finally, wire probes through NewInstrumentation()

Something like:

import (
  "go.opentelemetry.io/auto"
  "go.opentelemetry.io/auto/probes/net/http"
  "github.com/damemi/customprobe"
)

func main() {
  inst, err := auto.NewInstrumentation(ctx,
    auto.WithProbes(
      http.New(),
      customprobe.New())
  )
...
}

Is this kind of what people were thinking for a public Probe api?

@RonFed
Copy link
Contributor

RonFed commented Sep 21, 2024

@damemi Yes, that is what I was thinking about as well in terms of our end goal in this issue.

@damemi
Copy link
Contributor

damemi commented Sep 23, 2024

@RonFed sounds good, so to show what I'm saying I did an example with db/http here: #1123

Could you show what you mean for a SupportedConfig() method like you proposed? I might just be missing the use case for it, or maybe we're talking about the same thing

@damemi
Copy link
Contributor

damemi commented Sep 23, 2024

as part of this (and shifting Probes to NewInstrumentation()) we would also define a method signature for a probe's New() func that enforces passing probe.Config as an argument.

@RonFed
Copy link
Contributor

RonFed commented Sep 24, 2024

@damemi The configuration example looks great to me.
Is the package function in the config interface just an example? I agree with the need for having a config interface, just not sure about its functions.
My initial idea was to have a common struct to all the probes (including some generic maps) - in that scenario, the SupportedConfig() function will return the keys of the map - letting the user know which configurations are available.
However, the approach in your PR is better so that function is not necessary if we'll have all the configuration structs exported.

@damemi
Copy link
Contributor

damemi commented Sep 24, 2024

@RonFed yeah I just used Package() as an example of something that all Probes might need to expose to the Instrumentation machinery (in this case I used it just for logging).

Overall this is really pretty much exactly how the Collector does config for different components. It's kind of similar to the k8s scheduler like I mentioned.

But yes I think it's better to have the configuration exposed in a concrete type that's owned by the Probe itself, so that's the main idea with this approach

@MrAlias
Copy link
Contributor Author

MrAlias commented Sep 24, 2024

SIG meeting notes:

  • We need to decouple the structfield.ID in the Manifest.
    • This needs to look at offset generation
    • Possibly move offsets into a per-probe generation which would require a big overhaul of the offset generator.
  • The Run method needs to be updated to transport ptrace.Traces

@MrAlias
Copy link
Contributor Author

MrAlias commented Sep 24, 2024

An interesting thought: with custom probes, this opens up the possibility to start instrumenting binaries other than Go (i.e. Rust, C++). Especially if the offsets are self contained in the probe.

@RonFed
Copy link
Contributor

RonFed commented Oct 19, 2024

@damemi I think we can try and add the configuration layer similar to your example as a first step.
This will need to be integrated with the definitions at provider.go
WDYT?

@MrAlias
Copy link
Contributor Author

MrAlias commented Nov 5, 2024

The current Run and Close method is defined as follow:

type Probe interface {
	// ...
	Run(tracesChan chan<- ptrace.ScopeSpans)
	Close() error
}

Issues

The channel passed to Run is closed by another process after it calls Close

When a process wants to close probes it first calls Close and then when that call has returned it assumes Run is complete so it can close chan. The telemetryChan is not guaranteed to be unused though. There may be incorrect implementations or just concurrency patterns that mean something is still trying to be send on the telemetryChan.

Sending on a closed channel will cause a panic. Therefore, this design should be changed.

Instead of having Run receive a write-only channel, have it return a read-only channel it is writing to in a spawned goroutine. When Close is called, and the probe is sure it is done writing to the returned channel, it can close that channel and ensure no panic is raised.

The channel currently only supports trace telemetry

If the Probe produces log, metric, or other telemetry in the future it will need a forward compatible way to send this data.

Instead of having Run use a channel defined on ptrace.ScopeSpans, introduce a new probe.Telemetry type:

// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans
}

This new type can have fields added for additional signals when they are supported.

Proposal

The Run method would become:

type Probe interface {
	// ...
	Run() <-chan Telemetry
	Close() error
}

@MrAlias
Copy link
Contributor Author

MrAlias commented Nov 5, 2024

// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans
}

Having this include an error may also help communicate errors from the probe. For example:

// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans

	// Error is the error encountered when reading the telemetry data.
	Error error
}

@MrAlias
Copy link
Contributor Author

MrAlias commented Nov 6, 2024

Proposal

The Run method would become:

type Probe interface {
	// ...
	Run() <-chan Telemetry
	Close() error
}

Based on #1248, this proposal should be revised to:

type Probe interface {
	// ...
	Run(func(Telemetry))
	Close() error
}

@damemi
Copy link
Contributor

damemi commented Nov 18, 2024

@MrAlias and I were talking about this last week, and I think the points we came up with for the design were (let me know if I missed anything):

  • It should be rich enough to cover different Probes and bpf program types
  • It should be extensible and generic enough to cover use cases we haven't thought of yet
  • Probes are the first API we should work out in our broader library refactor, because they unlock design in other areas (like the Manager and OTel Pipeline)

Tyler suggested a factory pattern for this, and I agree that would probably check all the boxes. I can work on a POC for that this week. I'd like to implement this in a way that first wraps the current Probes so that we can transition one by one in smaller PRs.

@MrAlias
Copy link
Contributor Author

MrAlias commented Nov 18, 2024

The factory pattern will be good to start internally. To fully support this from external users we will need to support multi-target running (#197).

@RonFed
Copy link
Contributor

RonFed commented Nov 19, 2024

I think the Probe interface should have a way of stating which versions it supports -
This will help with cases like #1302

@MrAlias
Copy link
Contributor Author

MrAlias commented Nov 19, 2024

I think the Probe interface should have a way of stating which versions it supports - This will help with cases like #1302

Is this something you see the Manifest returned from a probe handling?

@RonFed
Copy link
Contributor

RonFed commented Nov 19, 2024

Is this something you see the Manifest returned from a probe handling?

Yea, that makes sense. We can add library version support and Go version support as part of the Manifest.
That way the manager can figure out what can be loaded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: No status
Development

No branches or pull requests

3 participants