Add schema validation and configuration for conda-forge config yaml #1756

viniciusdc · 2023-08-01T13:54:39Z

Checklist

Added a news entry

In this pull request, we propose a comprehensive refactor of the configure_feedstock::_load_forge_config function and its related dependencies. The primary objective is to replace the current reliance on a hardcoded dictionary with a more robust approach using Pydantic Models for the forge YAML configuration.

Key Changes:

I've included a new schema file in the codebase, housing Pydantic models that represent various aspects of the ConfigModel, which is the core schema for the forge YAML. These models also include schemas for nested fields like bot, azure, and more.
Created a separate function called _read_forge_config responsible for loading the YAML file itself and initializing the models with the respective configurations as well as handling the deep_merge dictionary section.
Moved legacy checks to their dedicated function, _legacy_compatibility_checks.
Ive added test cases to ensure the correctness and reliability of the new schema and code changes.
Edited
Given the nature of having pydanticas a runtime dependency, we adopted a mixed mode of jsonschema and the model to generate both a valid json schema mirroring the model, and versioned, conda-forge.v2.json and a default value that is dynamically generated by the model itself, conda-forge.v2.yml. This is used for type validation inside the new _read_forge_config function and to help future users who would like to access the schema by an API (which we can now do, by hosting the new json schema file)

Why this now?

The rationale behind this refactor is to address issues that have surfaced over the years with the YAML content in forge configurations across numerous hosted feedstocks. These issues have resulted in minor exceptions and inconsistencies. Some problems were mitigated through hardcoded conditionals, while others slipped through, causing disruptions in the ecosystem.

The primary objectives of this refactor are as follows:

By employing a Pydantic schema, we aim to significantly reduce unforeseen configuration errors from syntax errors or incorrect attributes in the forge YAML.
This refactor will make it easier for future maintainers to introduce new settings, such as improving the list of available platforms or other heavily coupled configurations across the build. These changes can be seamlessly integrated into the configuration fields as validators, simplifying the management of previously scattered checks throughout the codebase. As well as deprecation warnings and migration checks can also be performed by the validators.

In summary, this pull request represents a substantial step towards enhancing the reliability and maintainability of the configuration handling in conda-smithy.

What is not included in this PR?

To keep the diff minimal in the first iteration, this PR only integrates the new schema (pydantic) and its artifacts (*.json and .yml files). In a subsequent PR, we can re-arrange some internal validations to be part of the generated schema, reducing some hard-coded logic from within the code.

viniciusdc · 2023-08-23T17:02:22Z

Hi @jaimergp, this will be on draft this week as I fix some issues with the Providers enum logic (to accept and validate aliases). In the meantime, if you want to have a look at what is currently available, free free to do so 😄

conda_smithy/schema/models.py

jaimergp · 2023-08-31T08:33:22Z

conda_smithy/schema/models.py

+
+
+class CondaForgeDocker(BaseModel):
+    model_config = ConfigDict(extra="allow")


Do we want to allow extra keys? Why?

Conda-forge docker is a dict where the user/consumer passes keys to the docker build arg, which should follow the docker schema. As this would be an exact copy of what already exists, I was not sure if we wanted to import the Pydantic schema from docker and use it instead, so I kept it as general as possible to check with you which of those fields we expect to be customizable.

removed and moved all fields to optional values, in case one of the supported config values is passed validation will occur, in case of extra variables in the config, right now the behavior is to ignore it (will not show up when dumping to dict/json)

conda_smithy/schema/models.py

conda_smithy/schema/platforms.py

tests/test_condaforge_config_schema.py

jaimergp

Thanks Vini! As you can see I have added a few comments, but most of them are repeated and then I stopped the spam. It all boils down to these two items:

Default values that are mutable (e.g. lists or dicts) should use default_factory to return a fresh instance anytime. Otherwise all instances of the model share the same instance (defined at class definition time when the module was imported), which is then a nightmare. I don't think we would have concurrent use of more than one instance in the code, but you never know and it's good practice anyway.
You are mixing usage of Field both as a value and Annotated. I am not familiar with Annotated; I think it might have performance benefits, but I do find it trickier to read. I prefer it as the value of the attribute, instead of in-annotation. If there are strong reasons for in-annotation, I am willing to listen, but what we can't do is to mix two styles because it's confusing :D

Let's fix these items and then we can keep iterating.

jaimergp · 2023-08-31T08:50:32Z

Ah, and:

we need to merge back the changes added in Add switch to make a swap file #1751 to resolve the conflicts. Please update accordingly 🙏
fix the CI to install pydantic as it's required now

Co-authored-by: ytausch <ytausch@users.noreply.github.com>

conda-forge/conda-smithy#1756 (comment)

jaimergp

I think this is ready to go!

Thanks everyone involved, this has been one of the good ones :P

I'll merge later today or tomorrow in case there are any last minute comments.

jaimergp · 2024-03-12T14:04:06Z

Docs PR ready for review at conda-forge/conda-forge.github.io#2095.
Preview it at https://deploy-preview-2095--conda-forge-previews.netlify.app/docs/maintainer/conda_forge_yml/

conda-forge/conda-smithy#1756 (comment)

jakirkham · 2024-03-15T01:14:59Z

This was neat to see: conda-forge/coverage-feedstock#112 (comment)

Very descriptive 🤩

bollwyvl · 2024-03-15T12:15:26Z

Yes, have started getting those, looking great.

Some notes:

it doesn't tell which file this is all about
- it could include a link to that exact file in the PR branch, such that a maintainer could click to edit
as the output is for humans, it seems like rendering title, description, etc. as Markdown would be valuable.
- however, as it was written (or dumped to) JSON Schema, it's would possible to generate a canonical link to the generated documentation for that schema item, once available
- similarly, the raw repr that gets dumped could be a table
  - or, indeed, the whole thing could be a table
- it may be worth putting these inside details tags

Now all we gotta do is schematize the other user-edited .yml :)

jaimergp · 2024-03-15T13:03:50Z

Thanks @bollwyvl, I'll take your comment and open a new issue for tracking! Thanks!

0xbe7a · 2024-03-16T20:58:40Z

pydantic/pydantic#8237 has been merged and will probably be released as part of pydantic 2.7 and will make the custom deprecation validator obsolete.

conda-forge/conda-smithy#1756 (comment)

jaimergp marked this pull request as draft August 1, 2023 13:54