Define sdk extension component support in file configuration #3802

jack-berg · 2023-12-22T17:36:23Z

Part of incorporating OTEP #225 into the specification.

Followup to #3744.

This defines how file configuration works with custom SDK extension components (Samplers, Exporters, etc).

It defines the concept of a Component Provider:

Component providers are registered with the type of extension component they provide and a name. Component providers are registered automatically or manually based on what is idiomatic in the language.
Component providers have a Create Plugin method, which passes configuration properties as a parameter and returns the configured component
When Create is called to interpret a file configuration model, and it comes across a reference to a extension component which is not built-in, it invokes Create Plugin on the corresponding component provider. If no corresponding component provider exists, or if Create Plugin returns an Error, Create returns an error.

Prototype implementation in java here: open-telemetry/opentelemetry-java-examples#227

cc @open-telemetry/configuration-maintainers

github-actions · 2023-12-30T03:16:37Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

specification/configuration/file-configuration.md

jsuereth

One NIT, but it's mostly about possible misunderstanding of the spec and is non-blocking.

specification/configuration/file-configuration.md

github-actions · 2024-01-18T03:16:55Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

marcalff

If I understand correctly, in case of SDK extensions:

the component provider accepts a file configuration model (a.k.a a yaml parsed tree node for the extension point) as input, in the Create() operation

There is no way to parse yaml into a memory configuration model, or make up completely a memory configuration model programmatically, and then have the Create() operation take this memory model as input.

I am aware that supporting a memory configuration model for SDK extensions has some extra difficulties, like:

have a generic representation of an "exporter extension" in the memory configuration model (which is built-in), that contains a representation of the yaml fragment for a given (generic, not built-in) exporter
have the exporter extension Create() operation take this generic representation as input

Still, separating parse from create, as in:

Parse takes a file configuration as input, returns a memory configuration as output
Create takes a memory configuration as input, instantiates the proper objects in the SDK extension

is better, because it allows to:

process and adjust the memory configuration model, if needed, between parse and create.
build a memory configuration model programmatically, instead of parsing yaml.

A possibility is to also expand the SDK extension point to provide a CreateModel() operation, where the model returned is a subclass (per language idiomatic constructs) of an abstract extension point (for exporters, samplers, etc)

marcalff · 2024-01-18T13:29:35Z

Also, please do not let this stall.

jack-berg · 2024-01-18T15:24:58Z

the component provider accepts a file configuration model (a.k.a a yaml parsed tree node for the extension point) as input, in the Create() operation

The component provider accepts a properties as a parameter for the Create Plugin operation.

There is no way to parse yaml into a memory configuration model, or make up completely a memory configuration model programmatically, and then have the Create() operation take this memory model as input.

I don't understand. There is a way to do this. The Create operation accepts a configuration parameter, which is an in memory representation of the configuration model which is either parsed from yaml or created programmatically by some other means. If in the course of interpreting the configuration parameter, the Create operation comes across a reference to an extension component, it is expected to translate configuration for that component into a generic properties representation, and call Create Plugin of the respective component provider.

have a generic representation of an "exporter extension" in the memory configuration model (which is built-in), that contains a representation of the yaml fragment for a given (generic, not built-in) exporter

It will be up to each language to determine how to references to non-built-in types in the in memory representation fo the model. For example in java, this will likely be a Map<String, Object>, which we can translate to a properties representation, which provides type safe accessors, before calling Create Plugin.

marcalff · 2024-01-18T21:26:43Z

There is no way to parse yaml into a memory configuration model, or make up completely a memory configuration model programmatically, and then have the Create() operation take this memory model as input.

I don't understand. There is a way to do this. The Create operation accepts a configuration parameter, which is an in memory representation of the configuration model which is either parsed from yaml or created programmatically by some other means. If in the course of interpreting the configuration parameter, the Create operation comes across a reference to an extension component, it is expected to translate configuration for that component into a generic properties representation, and call Create Plugin of the respective component provider.

Ok, so it seem we both agree on how it works, based of these clarifications, thanks for the details.

My comment is on the wording then, as some parts are unclear, at least to me.

In the PR, in section SDK Extension Components, the description gives some yaml with an extension point, and then jumps directly to talk about what Create() does.

I would like some details about what Parse() does first, when encountering an SDK extension, such as creating a Properties object in the in memory configuration model, that represents the extension configuration bits.

Then, Create() takes this Properties object and invokes the plugin as described.

Likewise, in the PR text: "When Create is called to interpret a file configuration model", this is a shortcut.

Parse() is called to interpret a file configuration model.
What Create() is invoked with is an in memory configuration model.

I think this will add clarity, unless I totally missed the picture.

Looking forward to implement this in C++.

ocelotl · 2024-01-18T23:36:53Z

I think this PR tries to give special treatment to plugins when we could handle them in the same way that we handle the rest of the configuration components.

AFAIK what file configuration does is:

Load a schema.
Load a configuration file.
Check that the configuration file matches the schema.
Create the objects using the information from the configuration.

What I think is being proposed here is:

Load a schema.
Load a configuration file.
Check that the configuration file matches the schema.
Load plugins which come with their own sub-schema.
Load the configurations from the plugin objects.
Create the non-plugin objects using the information from the configuration.
When a plugin object is created, the plugin object sub-schema is checked and an error is raised if there is a mismatch between the sub-schema and the plugin object configuration.

We could do something different:

Load a schema.
Load the plugin object subschemas.
Add the plugin object subschemas to the right point in the schema to make a new, bigger schema.
Load the configuraiton.
Load the plugin object configurations.
Add the plugin object configurations to the right point in the configuration to make a new, bigger configuration.
Check that the new, bigger configuration matches the new, bigger schema.
Create the objects using the information from the new, bigger configuration.

The value I see in this is that the approach is general for everything and not particular for plugins and we do the configuration-against-schema checking only once and not multiple times. Also, having create plugin methods would be troublesome for the current Python implementation because there are no particular places where create functions are called (the Python implementation doesn't have any specific "create" functions), but everything is treated in the same way when the configuration tree is being traversed recursively.

specification/configuration/file-configuration.md

jack-berg · 2024-01-19T02:38:35Z

Load a schema.

Load a configuration file.

Check that the configuration file matches the schema.

Create the objects using the information from the configuration.

In practice, at least in the java implementation I've been prototyping, there is no loading of a schema. Instead, types are generated statically corresponding to the schema, and the configuration file is parsed to those statically generated types using off the shelf tooling.

This detail is important because you suggest having each plugin provide its own schema. Presumably they would provide their schemas using JSON schema for symmetry with the rest of the schema. This would necessitate interpreting JSON schemas at runtime. I think this is a bad idea because in my experience, the implementations of JSON schema are all over the map in terms of quality and the JSON schema draft versions they support. I would not want to rely on the java implementations at runtime.

If you buy that, then we don't have a reliable way for components to describe their schema which the parse operation can consume and conform to. This is what lead me down the path of instead having component providers be responsible for enforcing their schema during the "Create Plugin" operation.

I think there's not a lot of difference from the users perspective:

If a component provider describes its own schema, and parse is responsible for enforcing it, then a configuration that fails to conform to the schema triggers an error in parse. But some parts of a schema won't be able to be enforced until create is called anyway - imagine a component provider with an endpoint argument with a string type. Parse would be able to reject configurations which set the endpoint to a number, but you probably can't tell that a string endpoint is invalid until create plugin is called and tries to parse the string as a URL.
If a component provider enforces its schema only during create plugin, then certain simple schema violations make it past the parse operation, but trigger failures during create.

Most users will combine parse and create in a single operation, so they'll see an error in either case.

An additional argument is that requiring components to describe their schema in some syntax like JSON schema increases the threshold for writing one of these things. Its comparatively simpler to have a contract in which a component provider implementation is just expected to read config out of a generic properties container, and either succeed in returning a component or error.

ocelotl · 2024-01-19T22:35:11Z

This detail is important because you suggest having each plugin provide its own schema.

Yes, I suggest so but this PR also suggests that, right?

The set of properties a component provider accepts, along with their requirement level and expected type, comprise a configuration schema.

Presumably they would provide their schemas using JSON schema for symmetry with the rest of the schema. This would necessitate interpreting JSON schemas at runtime. I think this is a bad idea because in my experience, the implementations of JSON schema are all over the map in terms of quality and the JSON schema draft versions they support. I would not want to rely on the java implementations at runtime.

Hmm, I'm a bit confused here. So, it is bad to interpret JSON schemas for the plugins but it is ok to interpret JSON schemas for the normal components?

If you buy that, then we don't have a reliable way for components to describe their schema which the parse operation can consume and conform to.

Hmm, this seems to me like a critical problem that we need to solve first. So, when implementing this in Python I used a third-party library to do the validation of the configuration file against the JSON schema. I don't know about the situation in Java, but in Python I am just trusting that this component does a proper validation. Maybe I should not trust it? If we don't have a reliable way of validating the schema, then what? Should we write our own JSON schema validator? (I really hope not)

Presumably they would provide their schemas using JSON schema for symmetry with the rest of the schema.
If a component provider describes its own schema...

Do these 2 sentences above mean that it is possible for a plugin schema to be defined in a way that is not a JSON schema? So we can have a different type of schema for every plugin?

But some parts of a schema won't be able to be enforced until create is called anyway - imagine a component provider with an endpoint argument with a string type. Parse would be able to reject configurations which set the endpoint to a number, but you probably can't tell that a string endpoint is invalid until create plugin is called and tries to parse the string as a URL.

I am confused here. It seems like once a configuration file is checked against the JSOM schema (and passes the checking) there can be additional checks (like checking a string is a proper URL). I was under the impression that the only checking that the configuration component was supposed to do was the checking against the JSON schema. Maybe there is a motivation to catch errors as soon as possible, but now I don't see a clearly defined set of responsibilities for the configuration component. SDK components do checking of input values too so who's supposed to do the checking now?

Now, for the Python implementation, I'm not sure how mandatory is to have Create Plugin or Component Provider. This doesn't fit our implementation where we don't have a function for the creation of any specific component but for every leaf in the schema tree. Can the way the component for plugins is created be left to the way that fits each language best?

jack-berg · 2024-01-19T23:12:07Z

If we don't have a reliable way of validating the schema, then what? Should we write our own JSON schema validator? (I really hope not)

The java implementation I've been working on generates static types from the JSON schema. There is no JSON schema artifact present at runtime. At runtime, we use a generic YAML parsing library to parse the incoming YAML to the statically generated types. If the YAML does not conform to the structure of the statically generated types, the binding to the static types will fail.

Do these 2 sentences above mean that it is possible for a plugin schema to be defined in a way that is not a JSON schema?

I'm imagining that a plugin expresses its schema by way of examples, and possibly in docs and internally using JSON schema, but it doesn't need to provide its schema to the implementation of file configuration at all as a part of being a Component Provider implementation. The component provider reads configuration from the properties file. If all the configuration it needs is present and passes any internal validation performed by the component provider passes, then you can say that the properties argument conformed to the schema. If a piece of required configuration is missing, or some internal validation fails, then the properties argument did not conform to its schema.

I suppose I would be open to adding an optional component provider operation where a particular implementation (like python) could require components providers to describe their schema in JSON schema format, but I think its important that this is not required across all implementations: OpenTelemetry implementations are low level libraries and should generally be extremely discerning about taking dependencies on external libraries. Its entirely reasonable that for a given language, there is not a ubiquitous JSON schema implementation available which supports the 2020-12 draft version we've been using. I think its critical that we don't force these languages to build json schema parsers.

It seems like once a configuration file is checked against the JSOM schema (and passes the checking) there can be additional checks (like checking a string is a proper URL). I was under the impression that the only checking that the configuration component was supposed to do was the checking against the JSON schema. Maybe there is a motivation to catch errors as soon as possible, but now I don't see a clearly defined set of responsibilities for the configuration component. SDK components do checking of input values too so who's supposed to do the checking now?

I was referring to the checks being done by the SDK components themselves. I don't think that the create operation needs to perform additional validation beyond those done by SDK components. My point was just that just because a piece of YAML conforms to the schema doesn't mean that we'll be able to generate configured SDK components without error.

dashpole

non-blocking comments

specification/configuration/file-configuration.md

…y-specification into config-extension-components

jack-berg · 2024-01-25T16:39:02Z

Now, for the Python implementation, I'm not sure how mandatory is to have Create Plugin or Component Provider. This doesn't fit our implementation where we don't have a function for the creation of any specific component but for every leaf in the schema tree. Can the way the component for plugins is created be left to the way that fits each language best?

@ocelotl What did you have in mind? This concept will definitely vary in implementation by language and I want to be sure that the language is specific enough to make it the intent clear (i.e. I don't want implementers to come away thinking this is impossible because there's no detail), but open enough to give implementers flexibility.

ocelotl · 2024-01-25T17:45:45Z

Apparently I did something wrong while adding a comment yesterday so here I go again.

The Python prototype now supports SDK extension components.

The java implementation I've been working on generates static types from the JSON schema. There is no JSON schema artifact present at runtime. At runtime, we use a generic YAML parsing library to parse the incoming YAML to the statically generated types. If the YAML does not conform to the structure of the statically generated types, the binding to the static types will fail.

The Python prototype uses a JSON schema checker to check the configuration file. If it does not match, it will raise an exception.

I'm imagining that a plugin expresses its schema by way of examples, and possibly in docs and internally using JSON schema, but it doesn't need to provide its schema to the implementation of file configuration at all as a part of being a Component Provider implementation.

the Python prototype doees merge the subschema of the plugin into the larger OTel schema.

The component provider reads configuration from the properties file. If all the configuration it needs is present and passes any internal validation performed by the component provider passes, then you can say that the properties argument conformed to the schema.

I suppose I would be open to adding an optional component provider operation where a particular implementation (like python) could require components providers to describe their schema in JSON schema format...

The Python prototye does not have a properties file. Each plugin specifies its schema with a dictionary that is equivalent to a JSON schema. This dictionary is then merged into the larger OTel schema and the resulting bigger schema is used to validate the configuration file. Here is an example of how this subschema is defined. Each plugin defines also a path that defines where in the larger OTel schema this subschema is to be added.

I was referring to the checks being done by the SDK components themselves. I don't think that the create operation needs to perform additional validation beyond those done by SDK components. My point was just that just because a piece of YAML conforms to the schema doesn't mean that we'll be able to generate configured SDK components without error.

I agree, I intentionally am not adding any other check to the file configuration component that is not JSON schema checking.

ocelotl · 2024-01-25T17:50:46Z

@ocelotl What did you have in mind? This concept will definitely vary in implementation by language and I want to be sure that the language is specific enough to make it the intent clear (i.e. I don't want implementers to come away thinking this is impossible because there's no detail), but open enough to give implementers flexibility.

The Python prototype is using standard entry points which is the normal Python mechanism to dynamically add some external component to a Python project. In this example I created a plugin that adds a sampler that only samples on Mondays but only sometimes. This sampler is an example of a custom component any user may define and add for it to work with OpenTelemetry. We don't have any "Create Plugin" or "Component Provider" per se, but we do have the plugin class SometimesMondaysOnSamplerPlugin which "adds" or "provides" this plugin. Does this fit what this PR is trying to specify?

jack-berg · 2024-01-25T23:47:46Z

@ocelotl it looks like the implementation is similar. The schema_path in your implementation is akin to the type in this PR. The name in this PR is provided by the schema your implementation (i.e. the sometimes_mondays_on portion ensures that this plugin is only used when its selected).

Wondering what kind of changes you envision for this PR such that you feel comfortable with your implementation? My point of view is that as long as the net affect is the same, I don't think it matters much that there is uniformity among languages with this API. Maybe I can adjust the normative language to only include very general "MUSTs", but leave the specifics with "MAY" such that implementations can vary as needed.

marcalff

LGTM. Thanks for the clarifications on Parse().

ocelotl · 2024-01-26T19:55:04Z

@ocelotl it looks like the implementation is similar. The schema_path in your implementation is akin to the type in this PR. The name in this PR is provided by the schema your implementation (i.e. the sometimes_mondays_on portion ensures that this plugin is only used when its selected).

Wondering what kind of changes you envision for this PR such that you feel comfortable with your implementation? My point of view is that as long as the net affect is the same, I don't think it matters much that there is uniformity among languages with this API. Maybe I can adjust the normative language to only include very general "MUSTs", but leave the specifics with "MAY" such that implementations can vary as needed.

Nice ✌️ I am now under the impression that your intention for things as component providers and such is to define a mechanism that adds components dynamically. I am ok with this PR right now as it is, I would only recommend to add a clarification that says the exact nature of component providers and the create plugin interface depends on the specific mechanisms every language has for this particular purpose. Thank you for the clarification 👍

jack-berg · 2024-01-26T21:02:57Z

Will merge Monday 1/29/24 if no further comments.

Define sdk extension component support in file configuration

de868b1

jack-berg requested review from a team December 22, 2023 17:36

github-actions bot assigned bogdandrutu Dec 22, 2023

github-actions bot added the Stale label Dec 30, 2023

pirgeo reviewed Jan 2, 2024

View reviewed changes

specification/configuration/file-configuration.md Outdated Show resolved Hide resolved

carlosalberto removed the Stale label Jan 2, 2024

Fix typo

7303dea

jmacd approved these changes Jan 3, 2024

View reviewed changes

carlosalberto approved these changes Jan 3, 2024

View reviewed changes

jmacd mentioned this pull request Jan 4, 2024

A Sampling Configuration proposal open-telemetry/oteps#240

Open

jack-berg mentioned this pull request Jan 5, 2024

Add spec compliance matrix entries for file configuration #3804

Merged

jsuereth approved these changes Jan 10, 2024

View reviewed changes

specification/configuration/file-configuration.md Outdated Show resolved Hide resolved

SDK -> file configuration implementation

78e0ba6

marcalff mentioned this pull request Jan 15, 2024

[CONFIG] Implement file configuration open-telemetry/opentelemetry-cpp#2481

Open

49 tasks

github-actions bot added the Stale label Jan 18, 2024

marcalff reviewed Jan 18, 2024

View reviewed changes

jack-berg removed the Stale label Jan 18, 2024

jack-berg requested review from codeboten and MrAlias January 18, 2024 15:31

ocelotl mentioned this pull request Jan 18, 2024

Add configuration prototype open-telemetry/opentelemetry-python#3632

Draft

ocelotl suggested changes Jan 18, 2024

View reviewed changes

specification/configuration/file-configuration.md Show resolved Hide resolved

ocelotl mentioned this pull request Jan 18, 2024

Add support for SDK extension configuration open-telemetry/opentelemetry-python#3637

Open

Clarify parse expectations for custom components

c4efc4e

dashpole approved these changes Jan 23, 2024

View reviewed changes

specification/configuration/file-configuration.md Outdated Show resolved Hide resolved

specification/configuration/file-configuration.md Show resolved Hide resolved

jack-berg added 2 commits January 25, 2024 10:36

Fix typo

a31b3e8

Merge branch 'main' of https://github.com/open-telemetry/opentelemetr…

1e4f201

…y-specification into config-extension-components

marcalff reviewed Jan 26, 2024

View reviewed changes

marcalff approved these changes Jan 26, 2024

View reviewed changes

ocelotl approved these changes Jan 26, 2024

View reviewed changes

jack-berg added 3 commits January 26, 2024 14:56

Note expectation of differences between languages

f0b88d8

Add changelog entry

b383258

Add entries to spec compliance matrix

8c0b521

Merge branch 'main' into config-extension-components

9e4bfc2

jack-berg merged commit a6ca2fd into open-telemetry:main Jan 29, 2024
7 checks passed

jack-berg mentioned this pull request Mar 26, 2024

Should file configuration merge environment variable configuration? #3752

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define sdk extension component support in file configuration #3802

Define sdk extension component support in file configuration #3802

jack-berg commented Dec 22, 2023 •

edited

Loading

github-actions bot commented Dec 30, 2023

jsuereth left a comment

github-actions bot commented Jan 18, 2024

marcalff left a comment

marcalff commented Jan 18, 2024

jack-berg commented Jan 18, 2024

marcalff commented Jan 18, 2024

ocelotl commented Jan 18, 2024 •

edited

Loading

jack-berg commented Jan 19, 2024 •

edited

Loading

ocelotl commented Jan 19, 2024

jack-berg commented Jan 19, 2024

dashpole left a comment

jack-berg commented Jan 25, 2024 •

edited

Loading

ocelotl commented Jan 25, 2024

ocelotl commented Jan 25, 2024

jack-berg commented Jan 25, 2024

marcalff left a comment

ocelotl commented Jan 26, 2024

jack-berg commented Jan 26, 2024

Define sdk extension component support in file configuration #3802

Define sdk extension component support in file configuration #3802

Conversation

jack-berg commented Dec 22, 2023 • edited Loading

github-actions bot commented Dec 30, 2023

jsuereth left a comment

Choose a reason for hiding this comment

github-actions bot commented Jan 18, 2024

marcalff left a comment

Choose a reason for hiding this comment

marcalff commented Jan 18, 2024

jack-berg commented Jan 18, 2024

marcalff commented Jan 18, 2024

ocelotl commented Jan 18, 2024 • edited Loading

jack-berg commented Jan 19, 2024 • edited Loading

ocelotl commented Jan 19, 2024

jack-berg commented Jan 19, 2024

dashpole left a comment

Choose a reason for hiding this comment

jack-berg commented Jan 25, 2024 • edited Loading

ocelotl commented Jan 25, 2024

ocelotl commented Jan 25, 2024

jack-berg commented Jan 25, 2024

marcalff left a comment

Choose a reason for hiding this comment

ocelotl commented Jan 26, 2024

jack-berg commented Jan 26, 2024

jack-berg commented Dec 22, 2023 •

edited

Loading

ocelotl commented Jan 18, 2024 •

edited

Loading

jack-berg commented Jan 19, 2024 •

edited

Loading

jack-berg commented Jan 25, 2024 •

edited

Loading