Don't execute otel collector if configuration is "noop" #33680

Closed
cforce opened this issue Jun 20, 2024 · 13 comments · Fixed by #35430

@cforce

cforce commented Jun 20, 2024

Component(s)

cmd/opampsupervisor

Is your feature request related to a problem? Please describe.

Reduce the overall runtime footprint in large fleets where the default state is "wait and listen for commands" and the collector is not operationally sending any telemetry: in that state, do not execute the collector at all.

Describe the solution you'd like

Imagine a scenario where the supervisor is installed as a basic part of a host (container or device), broadcasting DNS and searching for an OpAMP backend until it is connected.
There is no local "non-default" config for the collector, just the default "noop" config, which would not send any telemetry except the collector's health.

The supervisor just waits to get connected to the OpAMP backend and afterwards waits for a remote configuration update for the collector.

To reduce overhead until the collector receives a "job", the supervisor should not execute the collector at all. Only once a config is sent that overrides the noop default should the collector (daemon) be started.

Describe alternatives you've considered

No response

Additional context

No response

@cforce cforce added the enhancement and needs triage labels Jun 20, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@BinaryFissionGames
Contributor

BinaryFissionGames commented Jun 20, 2024

This would be a nice improvement in more resource-restricted environments.

I also think that, beyond this being an initial state, it would be nice if the OpAMP server could also send an empty config (e.g. an empty config map) to stop running the collector until it gets another config.
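A minimal sketch of the start/stop behaviour suggested here, assuming a hypothetical `collectorRunner` type that only records state instead of spawning a real process (the names are illustrative, not the Supervisor's API). An empty config keeps the Collector stopped (or stops it), while a non-empty one starts or restarts it.

```go
package main

import "fmt"

// collectorRunner is a hypothetical stand-in for the Supervisor's process
// management; it only records state instead of spawning a real Collector.
type collectorRunner struct {
	running bool
	config  string
}

// applyConfig starts, restarts, or stops the Collector depending on whether
// the received config is empty, and returns the action it took.
func (r *collectorRunner) applyConfig(cfg string) string {
	switch {
	case cfg == "" && r.running:
		// Empty config while running: stop until another config arrives.
		r.running = false
		r.config = ""
		return "stop"
	case cfg == "":
		// Nothing to run and nothing running: stay idle.
		return "idle"
	case r.running:
		r.config = cfg
		return "restart"
	default:
		r.running = true
		r.config = cfg
		return "start"
	}
}

func main() {
	r := &collectorRunner{}
	for _, cfg := range []string{"", "receivers: ...", "receivers: ... v2", ""} {
		fmt.Println(r.applyConfig(cfg))
	}
}
```

Running it prints `idle`, `start`, `restart`, and `stop`, one per line.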

@cforce
Author

cforce commented Jun 30, 2024

Currently, bootstrapping operates as described below (excerpt from the documentation):

Bootstrapping

To obtain the remote configuration from the OpAMP Backend, the Supervisor must send an AgentDescription to the Backend. Initially, the Supervisor doesn't have this information because the AgentDescription becomes available only after the Collector process is started and the AgentDescription is sent from the opamp extension to the Supervisor. However, it's impossible to start the Collector without a configuration.

To address this issue, the Supervisor starts the Collector with a "noop" configuration that doesn't collect any data but allows the opamp extension to start. The "noop" configuration consists of a single pipeline with an OTLP receiver listening on a random port, a debug exporter, and the opamp extension. The purpose of this "noop" configuration is to ensure that the Collector starts and the opamp extension communicates with the Supervisor.
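For illustration, a Go sketch of rendering such a bootstrap config, assuming the random port is obtained by asking the OS to listen on port 0. The YAML keys for the opamp extension and the `/v1/opamp` endpoint path are assumptions for this sketch; the Supervisor's actual bootstrap template may differ.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the OS for an unused TCP port by listening on port 0.
// The port is released before returning, so another process could grab it
// in the meantime; that race is acceptable for a sketch.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// noopConfig renders a bootstrap config: one pipeline with an OTLP receiver
// on otlpPort, a debug exporter, and the opamp extension pointed at the
// Supervisor. The extension's field names are assumptions for illustration.
func noopConfig(otlpPort, supervisorPort int) string {
	return fmt.Sprintf(`receivers:
  otlp:
    protocols:
      http:
        endpoint: localhost:%d
exporters:
  debug:
extensions:
  opamp:
    server:
      ws:
        endpoint: ws://localhost:%d/v1/opamp
service:
  extensions: [opamp]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
`, otlpPort, supervisorPort)
}

func main() {
	otlpPort, err := freePort()
	if err != nil {
		panic(err)
	}
	// In the real flow this would be the port the Supervisor's internal
	// OpAMP server listens on; here it is just another free port.
	supervisorPort, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Print(noopConfig(otlpPort, supervisorPort))
}
```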

Once the initial Collector launch is successful and the Supervisor receives the remote configuration, the Supervisor restarts the Collector with the new configuration. The new configuration is also cached by the Supervisor in a local file. This caching means subsequent restarts no longer need to use the "noop" configuration. It also allows the Supervisor to start the Collector without waiting for the OpAMP Backend to provide the remote configuration, mitigating any OpAMP Backend unavailability.


I don't understand why the AgentDescription needs to be managed specifically by the Collector, which requires the Collector to be started in order to connect the Supervisor to an OpAMP Backend. This dependency seems to have only disadvantages, and it will become a particular limitation if the Supervisor has to manage many Collectors simultaneously (see issue #33682).

Why is the Collector considered the agent in a setup where the Supervisor is used? It would make more sense for the Supervisor to be the agent of OpAMP, with any connected Collector being transparent to the OpAMP Backend. Collectors should represent "any" host system as part of a subsystem registered via the Supervisor. In IoT systems, the Supervisor would act as a hub, which serves as a gateway for all connected Collectors to connect to an OpAMP Backend that they cannot connect to directly for various reasons.

@tigrannajaryan
Member

> Why is the Collector considered the agent in a setup where the Supervisor is used? It would make more sense for the Supervisor to be the agent of OpAMP, with any connected Collector being transparent to the OpAMP Backend. Collectors should represent "any" host system as part of a subsystem registered via the Supervisor. In IoT systems, the Supervisor would act as a hub, which serves as a gateway for all connected Collectors to connect to an OpAMP Backend that they cannot connect to directly for various reasons.

The assumption is that users want to manage their Collector, not their Supervisor. The Supervisor is just the means to do it. The OpAMP server needs to know which Collector it is managing so that it can, for example, supply the right configuration. Knowing which Collector it is requires receiving an AgentDescription that correctly describes the Collector (e.g. the Collector's version number). The Supervisor does not have this knowledge and uses the bootstrapping process to get that information from the Collector.

@cforce
Author

cforce commented Jul 3, 2024

The supervisor must always be aware of the presence of a collector. However, certain registration details that are set initially and then persisted, like the agent ID, should not change, since the supervisor manages the collector.

Bootstrapping can be done without any prerequisites besides the supervisor. This means the collector is downloaded and installed the first time, and the supervisor has information about its capabilities (processors, extensions, receivers, exporters, etc.) through descriptive metadata (e.g., ocb build.yaml). Thus, the supervisor doesn't need to execute the collector to understand its characteristics.

Alternatively, if the collector is already installed (managed by a third-party update mechanism) and the "opamp update feature" is off, descriptive metadata is used that is available without execution but requires maintenance or a persisted state file. This metadata is established after the initial setup, similar to the agent ID. Someone could even create the file by hand and thereby skip running the collector at least once to have it describe itself. The metadata could also be delivered by the OpAMP backend along with the executable download.

The supervisor should cache this metadata for each agent. Permanent execution of collectors is not mandatory; instead, the supervisor initializes essential groundwork and can start the collector when necessary, optionally based on configuration changes (e.g., when cfg!=noop).

This approach also supports future scenarios where one supervisor may manage multiple collectors, e.g. the new eBPF profiling client donated by Elastic.

@tigrannajaryan
Member

The implementation of the Supervisor currently follows this design.

What you are describing appears to be a different design. If you would like to propose an alternate design please post a complete design document so that it can be considered by Supervisor maintainers. (Please note: I do not know if the alternate design will be considered and whether it will be accepted, it may be worth attending a Collector SIG to gauge the interest first).

@cforce
Author

cforce commented Jul 4, 2024

The idea about "design changes" just came up because I have no idea how else to implement "do not run the collector until a config change arrives that is !=noop". Do you?

@BinaryFissionGames
Contributor

BinaryFissionGames commented Jul 17, 2024

> The idea about "design changes" just came up because I have no idea how else to implement "do not run the collector until a config change arrives that is !=noop". Do you?

I think the idea is that we keep the bootstrapping logic to get the agent description (a very quick run of the collector, under a second, when the supervisor starts), and then simply don't start the long-running collector process if we don't have a config.
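A sketch of that startup decision, assuming a hypothetical `planStartup` helper with the bootstrap step injected as a function so it runs without a Collector binary. `AgentDescription` here is a simplified stand-in for the OpAMP message, not its real definition.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentDescription is a hypothetical, simplified stand-in for the OpAMP
// AgentDescription message: just identifying key/value attributes.
type AgentDescription map[string]string

// startupPlan records what the Supervisor would do after bootstrapping.
type startupPlan struct {
	Description      AgentDescription
	StartLongRunning bool
}

// planStartup runs the short bootstrap step and decides whether the
// long-running Collector should be launched. bootstrap and cachedConfig are
// injected so the sketch runs without a real Collector binary.
func planStartup(bootstrap func() (AgentDescription, error), cachedConfig []byte) (startupPlan, error) {
	desc, err := bootstrap()
	if err != nil {
		return startupPlan{}, fmt.Errorf("bootstrap: %w", err)
	}
	// The description can still be reported to the backend; the Collector
	// itself only runs once there is a non-empty config.
	return startupPlan{Description: desc, StartLongRunning: len(cachedConfig) > 0}, nil
}

func main() {
	ok := func() (AgentDescription, error) {
		return AgentDescription{"service.name": "otelcol"}, nil
	}
	p, _ := planStartup(ok, nil)
	fmt.Println("start without config:", p.StartLongRunning)
	p, _ = planStartup(ok, []byte("receivers: {}"))
	fmt.Println("start with cached config:", p.StartLongRunning)

	failing := func() (AgentDescription, error) { return nil, errors.New("collector exited early") }
	_, err := planStartup(failing, nil)
	fmt.Println(err)
}
```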

Does that make sense?

@cforce
Author

cforce commented Jul 31, 2024

According to @evan-bradley
"The Supervisor will only restart the Collector when it receives new configuration from the OpAMP server; changes to files on disk will not restart the Collector."
"#32959 (comment)"
Restarts are handled in the stateful "bootstrapped" state.

Bootstrapping:

  • The Supervisor will start the Collector when there is no persisted agent ID.
  • The Supervisor will start the Collector when there is a persisted agent ID (even if there might be no subscription for a config change; similar to a restart, but with different behaviour) and it receives a non-noop configuration:
    -> This won't work without an agent ID; bootstrapping means the initial ID creation.
    -> Agent ID bootstrapping seems to require connecting to the OpAMP server in real time and subscribing to config changes, at least for a second. If this is not successful, reconnecting will fail because the Supervisor does not retry connecting to the OpAMP server forever (#33408).
    -> Currently only the Collector is capable of creating the UUID used to register with the OpAMP backend through the Supervisor. Remark: if the Supervisor could do that itself, it would not need to start the Collector at all (not even once per lifetime to create this ID).
    -> What happens if the OpAMP backend is not reachable during bootstrapping? There is a runtime dependency on being able to bootstrap through the Supervisor, relayed to the OpAMP backend.
  • The Supervisor will not stop the Collector when it receives a noop configuration -> this means two processes need to run continuously even when there is no need (noop config).

related #32554

@BinaryFissionGames
Contributor

BinaryFissionGames commented Jul 31, 2024

Bootstrapping does not require any connection to an outside OpAMP server. It connects to an OpAMP server that is internal to the supervisor, the communication during bootstrapping is only between the collector and the supervisor.

Bootstrapping also is not to generate an agent ID (the supervisor actually generates the UUID), but rather the AgentDescription message, which contains metadata about the agent (e.g. the "name" of the agent, the version of the agent) that the supervisor doesn't necessarily know without somehow executing the collector.

Bootstrapping is only concerned with getting this AgentDescription message, so once the message is received, the supervisor can (and currently does) stop the collector.

Edit to add:
Bootstrapping like this is useful because it allows the collector, which will be easily updatable through remote updates, to control the AgentDescription message. That means if there's a useful piece of metadata added to the AgentDescription later, it won't require having to re-install a new supervisor everywhere, but just to push a remote update to the collector.

@cforce
Author

cforce commented Jul 31, 2024

Thanks for the clarification.

@cforce
Author

cforce commented Sep 11, 2024

Has a decision already been made on the criteria for whether the collector is kept running or terminated until another config is sent that is !=noop or non-empty?

Alternative A:
An "empty configmap" is sent (this is also the default startup state).

Alternative B:

All of the following conditions are true:

receivers:
  nop:

AND

exporters:
  nop:  

AND

extensions: []
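Alternative B could be checked on an already-parsed config. A minimal sketch, assuming the YAML has already been decoded into a simplified, hypothetical `collectorConfig` struct (a real implementation would use the Collector's own config types or a YAML decoder, neither of which is shown). A component ID like `nop/2` is treated as type `nop`, following the Collector's `type/name` ID convention. Processors are deliberately not considered, matching the open question below.

```go
package main

import (
	"fmt"
	"strings"
)

// collectorConfig is a hypothetical, simplified view of a parsed config:
// just the component IDs in use.
type collectorConfig struct {
	Receivers  []string // e.g. "nop" or "otlp/2"
	Exporters  []string
	Extensions []string
}

// componentType strips the optional "/name" suffix from a component ID.
func componentType(id string) string {
	t, _, _ := strings.Cut(id, "/")
	return t
}

// onlyNop reports whether every ID in ids has type "nop".
func onlyNop(ids []string) bool {
	for _, id := range ids {
		if componentType(id) != "nop" {
			return false
		}
	}
	return true
}

// isNoopConfig applies Alternative B: only nop receivers, only nop
// exporters, and no extensions. An entirely empty config also counts.
func isNoopConfig(c collectorConfig) bool {
	return onlyNop(c.Receivers) && onlyNop(c.Exporters) && len(c.Extensions) == 0
}

func main() {
	fmt.Println(isNoopConfig(collectorConfig{Receivers: []string{"nop"}, Exporters: []string{"nop"}}))
	fmt.Println(isNoopConfig(collectorConfig{Receivers: []string{"otlp"}, Exporters: []string{"nop"}}))
}
```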

--

  • I am not sure if running a collector with only extensions is a valid use case
  • I am not sure if processors without a receiver (AND even without an exporter) are a valid use case

@BinaryFissionGames
Copy link
Contributor

I personally like the empty config map solution. To me it seems natural to expect that having no config to run implies not running anything.
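Alternative A reduces to an emptiness check on the received config map. A minimal sketch: the types below loosely mirror the shape of the OpAMP `AgentConfigMap` message (named config files, each with a body) but are local, hypothetical definitions, and treating a map whose files all have empty bodies as "empty" is an assumption of this sketch, not necessarily what #35430 does.

```go
package main

import "fmt"

// agentConfigFile and agentConfigMap are local, hypothetical mirrors of the
// OpAMP config-map shape; they are not the opamp-go types.
type agentConfigFile struct {
	Body        []byte
	ContentType string
}

type agentConfigMap struct {
	ConfigMap map[string]*agentConfigFile
}

// isEmptyConfigMap reports whether there is nothing to run: no map, no
// entries, or only entries with empty bodies.
func isEmptyConfigMap(m *agentConfigMap) bool {
	if m == nil {
		return true
	}
	for _, f := range m.ConfigMap {
		if f != nil && len(f.Body) > 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isEmptyConfigMap(&agentConfigMap{}))
	fmt.Println(isEmptyConfigMap(&agentConfigMap{ConfigMap: map[string]*agentConfigFile{
		"collector.yaml": {Body: []byte("receivers: {}"), ContentType: "text/yaml"},
	}}))
}
```

With this check, the Supervisor would simply skip (or stop) the Collector process whenever `isEmptyConfigMap` returns true.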

@atoulme atoulme removed the needs triage label Oct 2, 2024
evan-bradley pushed a commit that referenced this issue Oct 2, 2024
…ded (#35430)

**Description:**
If an empty config map is received, the supervisor does not run the
agent.

~The current logic here works fine, but I'm considering adding an option
to only do this if the user opts into it. I'm not sure if there's a
reason why a user might want to run the collector with the noop config
though (maybe for the agent's self-telemetry?)~

I've thought about it some more, and I don't think we need a config
option here. If users want the collector to use a noop config, they can
send a basic noop config.

I think we should also implement #32598 (closed as stale, we'll want to
re-open this or open a new issue for it), which would allow users to
configure a backup config to use when no config is provided by the
server, if they would like.

**Link to tracking Issue:** Closes #33680

**Testing:**
e2e test added
Manually tested with a modified OpAMP server to send an empty config map

**Documentation:**
Update spec where it seemed applicable to call out this behavior.
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
4 participants