Config validation #3613

astrojuanlu · 2024-02-09T16:51:21Z

Description

Today I spoke to a user that had a very long and treacherous parameters.yml like this:

sensors:
  sensor1:
    name: "Sensor 1"
    type: "temperature"
    stderr: 0.1
  sensor2:
    name: "Sensor 2"
    type: "humidity"
    stderr: 0.1
  sensor3:
    name: "Sensor 3"
    type: "temperature"
    stderr: 0.1
  sensor4:
    name: "Sensor 4"
    type: "temperature"
    stderr: 0.2

And so forth. So, there are several problems:

There's lots of repetition. The user mentioned that it would be ideal to be able to "inherit" in YAML, something like:

_sensor:
  type: "<unknown>"
  stderr: 0.1

sensors:
  sensor1: ${_sensor}  # All defaults are taken
  sensor2: ${_sensor}
    stderr: 0.2  # Try to override default, but 💥 syntax error

It's unclear how to validate this YAML. We did a quick proof of concept combining OmegaConf and Pydantic v2:

from omegaconf import OmegaConf
from pydantic import BaseModel

class Sensor(BaseModel):
    name: str
    sensor_type: str = "<unknown>"
    stderr: t.Optional[float] = 0.1

class Config(BaseModel):
    sensors: t.Dict[str, Sensor]

config = OmegaConf.load("conf/base/parameters.yml")
c = Config.validate(config)
print(c.sensors["sensor3"].stderr)

Which was cool! Because the defaults were filled from the Sensor model.

However, (2a) it's not clear how to keep the defaults in the YAML, which was desirable (although there's maybe a way to achieve that in Pydantic), (2b) it's not clear if this should be in parameters.yml or rather a custom sensors.yml, and most importantly, (2c) it's not clear how or where to perform such validation. There's no after_config_loaded hook.

I think the closest might be what kedro-mlflow does using after_context_created https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L78-L79 but then it's trying to inject the config in the KedroContext https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L129-L134, with all the problems discussed in #3214.

How can we better support this use case?

Paging @datajoely, @Galileo-Galilei

The text was updated successfully, but these errors were encountered:

astrojuanlu · 2024-02-09T16:53:36Z

Tangentially related: #2481

datajoely · 2024-02-09T17:23:42Z

So we actually have a solution for config inheritance and mixins in some of our internal tooling (hopefully open sourcing in 2024 🤞) for assembling Kedro projects.

It's proven a useful feature for achieving DRY concepts, powered by OmegaConf and Python devs find it easy to grok since it follows MRO to the dot.

For QB contributors I can link to the docs and the specific implementation here.

Please find a screenshot of the docs of the extends keyword below:

At rest	Run-time resolved

As well as a mixin concept:

At rest	Run-time resolved

datajoely · 2024-02-09T17:30:47Z

Regarding validation I would actually apply the Pydantic models to the functions bound in the node function - not couple it to the configuration system.

Galileo-Galilei · 2024-02-09T21:40:59Z

This is indeed a question that have been asked a couple of times. Apart from this specific use case (where it may eventually makes sense to have a specific configuration file as you suggest), my usual go to solution is to create a custom resolver, which would look like this:

_sensor:
  type: "<unknown>"
  stderr: 0.1

_sensor2:
  stderr: 0.2

sensors:
  sensor1: ${_sensor}  # All defaults are taken
  sensor2: ${merge_dict: _sensor, _sensor2} # in settings. py : register_new_resolver("merge_dict", lambda d1, d2: {**d1, **d2})

We could probably make the syntax easier with a more complex resolution logic (e.g. find if the key exists in a parent node to "automatically" check for a parent key with the same name (look for _parent_ on this page https://omegaconf.readthedocs.io/en/2.1_branch/custom_resolvers.html), and to validate the results with pydantic.

noklam · 2024-02-10T21:39:01Z

Apple rolls their own DSL for configuration

https://github.com/apple/pkl

datajoely · 2024-02-12T10:33:50Z

Apple rolls their own DSL for configuration
https://github.com/apple/pkl

So I find the niche space of "Configuration languages" like Jsonnet, Dhall, Cue and now pkl super interesting. I've posted this before but it's genuinely one of the best articles on how all the lessons learned in the K8s world apply here. I think #891 is the bible on this and ultimately lays out why we ultimately settled on OmegaConf (I was pitching Jsonnet at the time).

I'm more convinced than ever that:

We should lean into extending OmegaConf so that configuration at rest is more powerful (e.g. adding inheritance)
For validation we need to lean into Pythonic frameworks on the function/node level like Pydantic and Pandera. This has the added benefit of not being an integration, just best practice.
(stretch goal) We should explore Viz as a configuration assistant taking the idea of kedro catalog resolve further perhaps with side by side config at rest vs runtime

astrojuanlu added the Issue: Feature Request New feature or improvement to existing feature label Feb 9, 2024

github-actions bot mentioned this issue Mar 1, 2024

Monthly issue metrics report #3671

Open

kedro-org locked and limited conversation to collaborators Apr 8, 2024

astrojuanlu converted this issue into discussion #3788 Apr 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Config validation #3613

Config validation #3613

astrojuanlu commented Feb 9, 2024

astrojuanlu commented Feb 9, 2024

datajoely commented Feb 9, 2024

datajoely commented Feb 9, 2024

Galileo-Galilei commented Feb 9, 2024 •

edited

Loading

noklam commented Feb 10, 2024

datajoely commented Feb 12, 2024

This issue was moved to a discussion.

This issue was moved to a discussion.

Config validation #3613

Config validation #3613

Comments

astrojuanlu commented Feb 9, 2024

Description

astrojuanlu commented Feb 9, 2024

datajoely commented Feb 9, 2024

datajoely commented Feb 9, 2024

Galileo-Galilei commented Feb 9, 2024 • edited Loading

noklam commented Feb 10, 2024

datajoely commented Feb 12, 2024

This issue was moved to a discussion.

Galileo-Galilei commented Feb 9, 2024 •

edited

Loading