Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config validation #3613

Closed
astrojuanlu opened this issue Feb 9, 2024 · 6 comments
Closed

Config validation #3613

astrojuanlu opened this issue Feb 9, 2024 · 6 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@astrojuanlu
Copy link
Member

Description

Today I spoke to a user that had a very long and treacherous parameters.yml like this:

sensors:
  sensor1:
    name: "Sensor 1"
    type: "temperature"
    stderr: 0.1
  sensor2:
    name: "Sensor 2"
    type: "humidity"
    stderr: 0.1
  sensor3:
    name: "Sensor 3"
    type: "temperature"
    stderr: 0.1
  sensor4:
    name: "Sensor 4"
    type: "temperature"
    stderr: 0.2

And so forth. So, there are several problems:

  1. There's lots of repetition. The user mentioned that it would be ideal to be able to "inherit" in YAML, something like:
_sensor:
  type: "<unknown>"
  stderr: 0.1

sensors:
  sensor1: ${_sensor}  # All defaults are taken
  sensor2: ${_sensor}
    stderr: 0.2  # Try to override default, but 💥 syntax error
  1. It's unclear how to validate this YAML. We did a quick proof of concept combining OmegaConf and Pydantic v2:
from omegaconf import OmegaConf
from pydantic import BaseModel

class Sensor(BaseModel):
    name: str
    sensor_type: str = "<unknown>"
    stderr: t.Optional[float] = 0.1

class Config(BaseModel):
    sensors: t.Dict[str, Sensor]

config = OmegaConf.load("conf/base/parameters.yml")
c = Config.validate(config)
print(c.sensors["sensor3"].stderr)

Which was cool! Because the defaults were filled from the Sensor model.

However, (2a) it's not clear how to keep the defaults in the YAML, which was desirable (although there's maybe a way to achieve that in Pydantic), (2b) it's not clear if this should be in parameters.yml or rather a custom sensors.yml, and most importantly, (2c) it's not clear how or where to perform such validation. There's no after_config_loaded hook.

I think the closest might be what kedro-mlflow does using after_context_created https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L78-L79 but then it's trying to inject the config in the KedroContext https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L129-L134, with all the problems discussed in #3214.

How can we better support this use case?

Paging @datajoely, @Galileo-Galilei

@astrojuanlu astrojuanlu added the Issue: Feature Request New feature or improvement to existing feature label Feb 9, 2024
@astrojuanlu
Copy link
Member Author

Tangentially related: #2481

@datajoely
Copy link
Contributor

So we actually have a solution for config inheritance and mixins in some of our internal tooling (hopefully open sourcing in 2024 🤞) for assembling Kedro projects.

It's proven a useful feature for achieving DRY concepts, powered by OmegaConf and Python devs find it easy to grok since it follows MRO to the dot.

For QB contributors I can link to the docs and the specific implementation here.

Please find a screenshot of the docs of the extends keyword below:

image

At rest Run-time resolved
image image

As well as a mixin concept:

image

At rest Run-time resolved
image image

@datajoely
Copy link
Contributor

Regarding validation I would actually apply the Pydantic models to the functions bound in the node function - not couple it to the configuration system.

@Galileo-Galilei
Copy link
Member

Galileo-Galilei commented Feb 9, 2024

This is indeed a question that have been asked a couple of times. Apart from this specific use case (where it may eventually makes sense to have a specific configuration file as you suggest), my usual go to solution is to create a custom resolver, which would look like this:

_sensor:
  type: "<unknown>"
  stderr: 0.1

_sensor2:
  stderr: 0.2

sensors:
  sensor1: ${_sensor}  # All defaults are taken
  sensor2: ${merge_dict: _sensor, _sensor2} # in settings. py : register_new_resolver("merge_dict", lambda d1, d2: {**d1, **d2})

We could probably make the syntax easier with a more complex resolution logic (e.g. find if the key exists in a parent node to "automatically" check for a parent key with the same name (look for _parent_ on this page https://omegaconf.readthedocs.io/en/2.1_branch/custom_resolvers.html), and to validate the results with pydantic.

@noklam
Copy link
Contributor

noklam commented Feb 10, 2024

Apple rolls their own DSL for configuration

https://github.com/apple/pkl

@datajoely
Copy link
Contributor

Apple rolls their own DSL for configuration
https://github.com/apple/pkl

So I find the niche space of "Configuration languages" like Jsonnet, Dhall, Cue and now pkl super interesting. I've posted this before but it's genuinely one of the best articles on how all the lessons learned in the K8s world apply here. I think #891 is the bible on this and ultimately lays out why we ultimately settled on OmegaConf (I was pitching Jsonnet at the time).

I'm more convinced than ever that:

  • We should lean into extending OmegaConf so that configuration at rest is more powerful (e.g. adding inheritance)
  • For validation we need to lean into Pythonic frameworks on the function/node level like Pydantic and Pandera. This has the added benefit of not being an integration, just best practice.
  • (stretch goal) We should explore Viz as a configuration assistant taking the idea of kedro catalog resolve further perhaps with side by side config at rest vs runtime

@kedro-org kedro-org locked and limited conversation to collaborators Apr 8, 2024
@astrojuanlu astrojuanlu converted this issue into discussion #3788 Apr 8, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

No branches or pull requests

4 participants