Add schema validation and configuration for conda-forge config yaml #1756
Hi @jaimergp, this will stay in draft this week while I fix some issues with the
On `conda_smithy/schema/models.py`:

```python
from pydantic import BaseModel, ConfigDict


class CondaForgeDocker(BaseModel):
    model_config = ConfigDict(extra="allow")
```
Do we want to allow extra keys? Why?
The conda-forge `docker` section is a dict whose keys the user/consumer passes to the Docker build args, so it should follow Docker's schema. Since that would be an exact copy of something that already exists, I wasn't sure whether we wanted to import the Pydantic schema from Docker and use it instead, so I kept it as general as possible and wanted to check with you which of those fields we expect to be customizable.
Removed it and made all fields optional. If one of the supported config values is passed, it is validated; any extra keys in the config are currently ignored (they do not show up when dumping to dict/JSON).
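A minimal sketch of that behavior, assuming pydantic v2. The field names below are hypothetical placeholders, not necessarily the ones defined in `conda_smithy/schema/models.py`; the point is only that optional fields are validated when present and unknown keys are dropped.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class CondaForgeDocker(BaseModel):
    # Hypothetical field names; every supported key is optional and None
    # means "not set by the user". No model_config is declared, so
    # pydantic's default extra="ignore" applies to unknown keys.
    executable: Optional[str] = None
    fallback_image: Optional[str] = None
    command: Optional[str] = None


# A supported key with the right type passes; the unknown key is dropped.
docker = CondaForgeDocker.model_validate(
    {"executable": "podman", "not_a_docker_key": 1}
)
print(docker.model_dump(exclude_none=True))  # {'executable': 'podman'}

# A supported key with the wrong type is rejected.
try:
    CondaForgeDocker.model_validate({"executable": ["docker", "podman"]})
except ValidationError as err:
    print(f"rejected: {err.error_count()} error(s)")
```

If unknown keys should become hard errors instead of being ignored, `model_config = ConfigDict(extra="forbid")` is the pydantic switch for that.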
Thanks Vini! As you can see I have added a few comments, but most of them are repeated and then I stopped the spam. It all boils down to these two items:
- Default values that are mutable (e.g. lists or dicts) should use `default_factory` to return a fresh instance every time. Otherwise all instances of the model share the same instance (created at class definition time, when the module was imported), which is then a nightmare. I don't think we would have concurrent use of more than one instance in the code, but you never know, and it's good practice anyway.
- You are mixing usage of `Field` both as a value and inside `Annotated`. I am not familiar with `Annotated`; I think it might have performance benefits, but I do find it trickier to read. I prefer `Field` as the value of the attribute instead of in the annotation. If there are strong reasons for the in-annotation style, I am willing to listen, but what we can't do is mix two styles, because it's confusing :D
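A sketch of both suggestions applied together, assuming pydantic v2. `BotConfig` and its fields are illustrative stand-ins, not the PR's actual model. For comparison, the in-annotation style would read `abi_migration_branches: Annotated[List[str], Field(default_factory=list)]`; the sketch sticks to `Field` as the value throughout.

```python
from typing import Dict, List

from pydantic import BaseModel, Field


class BotConfig(BaseModel):
    # Field(...) is always the attribute's value here, never placed inside
    # Annotated[...], so the whole model reads in one style.
    automerge: bool = Field(default=False, description="Hypothetical flag.")
    # default_factory is called once per instance, so each model gets its
    # own list/dict rather than one object created at import time.
    abi_migration_branches: List[str] = Field(
        default_factory=list, description="Hypothetical list setting."
    )
    extra_labels: Dict[str, str] = Field(
        default_factory=dict, description="Hypothetical mapping setting."
    )


first = BotConfig()
second = BotConfig()
first.abi_migration_branches.append("v1.x")
print(second.abi_migration_branches)  # []
```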
Let's fix these items and then we can keep iterating.
Ah, and:
Co-authored-by: ytausch <ytausch@users.noreply.github.com>
I think this is ready to go!
Thanks everyone involved, this has been one of the good ones :P
I'll merge later today or tomorrow in case there are any last minute comments.
Docs PR ready for review at conda-forge/conda-forge.github.io#2095.
This was neat to see: conda-forge/coverage-feedstock#112 (comment) Very descriptive 🤩
Yes, have started getting those, looking great. Some notes:
Now all we gotta do is schematize the other user-edited
Thanks @bollwyvl, I'll take your comment and open a new issue for tracking! Thanks!
pydantic/pydantic#8237 has been merged and will probably be released as part of
Checklist
- news entry

In this pull request, we propose a comprehensive refactor of the `configure_feedstock::_load_forge_config` function and its related dependencies. The primary objective is to replace the current reliance on a hardcoded dictionary with a more robust approach using Pydantic models for the forge YAML configuration.

Key Changes:
- I've included a new schema file in the codebase, housing Pydantic models that represent various aspects of the `ConfigModel`, which is the core schema for the forge YAML. These models also include schemas for nested fields like `bot`, `azure`, and more.
- Created a separate function, `_read_forge_config`, responsible for loading the YAML file itself and initializing the models with the respective configurations, as well as handling the `deep_merge` dictionary section.
- Moved legacy checks to their own dedicated function, `_legacy_compatibility_checks`.
- I've added test cases to ensure the correctness and reliability of the new schema and code changes.
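The key changes mention `_read_forge_config` handling a `deep_merge` of dictionaries. Below is a hypothetical, standard-library-only sketch of such a merge, in which user-supplied values override defaults and nested sections are merged key by key. It is not the PR's implementation; in particular, replacing lists rather than concatenating them is an assumption.

```python
import copy
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return `defaults` recursively updated with `overrides` (inputs untouched).

    Nested dicts are merged key by key; any other value from `overrides`
    (scalars, lists, None) replaces the default outright.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults and user config, e.g. as parsed from conda-forge.yml.
defaults = {"bot": {"automerge": False, "inspection": "hint"}, "skip_render": []}
user = {"bot": {"automerge": True}, "skip_render": ["README.md"]}

print(deep_merge(defaults, user))
# {'bot': {'automerge': True, 'inspection': 'hint'}, 'skip_render': ['README.md']}
```

In a loader along these lines, the merged dict would then be handed to the model (or checked against the JSON schema) so that type errors surface in one place.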
Given the implications of having `pydantic` as a runtime dependency, we adopted a mixed approach of `jsonschema` and the model: the model generates both a valid, versioned JSON schema mirroring it, `conda-forge.v2.json`, and a default config dynamically generated by the model itself, `conda-forge.v2.yml`. The generated schema is used for type validation inside the new `_read_forge_config` function, and it helps future users who would like to access the schema through an API (which we can now do by hosting the new JSON schema file).

Why now?
The rationale behind this refactor is to address issues that have surfaced over the years with the YAML content in forge configurations across numerous hosted feedstocks. These issues have resulted in minor exceptions and inconsistencies. Some problems were mitigated through hardcoded conditionals, while others slipped through, causing disruptions in the ecosystem.
The primary objectives of this refactor are as follows:
- By employing a Pydantic schema, we aim to significantly reduce unforeseen configuration errors caused by syntax errors or incorrect attributes in the forge YAML.
- This refactor will make it easier for future maintainers to introduce new settings, such as improving the list of available platforms or other configurations that are heavily coupled across the build. These changes can be integrated into the configuration fields as validators, simplifying the management of checks that were previously scattered throughout the codebase. Deprecation warnings and migration checks can also be performed by the validators.
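A minimal sketch of both kinds of validator mentioned above, assuming pydantic v2 (`model_validator`, `field_validator`). The field names, the legacy key and the platform set are all hypothetical and do not describe the real conda-forge.yml schema.

```python
import warnings
from typing import Any, List, Optional

from pydantic import BaseModel, Field, ValidationError, field_validator, model_validator

# Hypothetical set of supported platforms, for illustration only.
KNOWN_PLATFORMS = {"linux_64", "linux_aarch64", "osx_64", "osx_arm64", "win_64"}


class ConfigModel(BaseModel):
    # Hypothetical fields; not the real conda-forge.yml schema.
    upload_on_branch: Optional[str] = None
    platforms: List[str] = Field(default_factory=lambda: ["linux_64"])

    @model_validator(mode="before")
    @classmethod
    def migrate_legacy_keys(cls, data: Any) -> Any:
        # Runs on the raw input before field validation: rename a
        # hypothetical legacy key and emit a deprecation warning.
        if isinstance(data, dict) and "legacy_upload_branch" in data:
            warnings.warn(
                "'legacy_upload_branch' is deprecated; use 'upload_on_branch'",
                DeprecationWarning,
                stacklevel=2,
            )
            data = dict(data)  # avoid mutating the caller's dict
            legacy = data.pop("legacy_upload_branch")
            data.setdefault("upload_on_branch", legacy)
        return data

    @field_validator("platforms")
    @classmethod
    def check_platforms(cls, value: List[str]) -> List[str]:
        # A platform check that would otherwise live in ad hoc conditionals.
        unknown = sorted(set(value) - KNOWN_PLATFORMS)
        if unknown:
            raise ValueError(f"unsupported platforms: {unknown}")
        return value


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config = ConfigModel.model_validate({"legacy_upload_branch": "main"})
print(config.upload_on_branch, [w.category.__name__ for w in caught])

try:
    ConfigModel(platforms=["linux_64", "amiga_68k"])
except ValidationError as err:
    print("rejected:", err.errors()[0]["msg"])
```

Keeping the migration in a `mode="before"` validator means the rest of the model only ever sees the current key names.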
In summary, this pull request represents a substantial step towards enhancing the reliability and maintainability of the configuration handling in conda-smithy.
What is not included in this PR?
`*.json` and `.yml` files). In a subsequent PR, we can rearrange some internal validations to be part of the generated schema, reducing some hard-coded logic within the code.
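To illustrate what moving validations into the generated schema could look like, here is a small sketch assuming pydantic v2's `model_json_schema()`. The models and fields are hypothetical and are not the contents of `conda-forge.v2.json`; the point is that constraints declared on the model are exported as JSON Schema keywords.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


class AzureConfig(BaseModel):
    # Hypothetical nested section, for illustration only.
    max_parallel: int = Field(default=50, ge=1)
    store_build_artifacts: bool = False


class ConfigModel(BaseModel):
    # Hypothetical top-level fields; the Literal restricts the allowed values.
    default_provider: Literal["azure", "github_actions", "travis"] = "azure"
    azure: AzureConfig = Field(default_factory=AzureConfig)


schema = ConfigModel.model_json_schema()
# The Literal becomes an "enum" and ge=1 becomes "minimum": 1, so a plain
# JSON Schema validator can enforce these checks without importing pydantic.
print(json.dumps(schema["properties"]["default_provider"], indent=2))
print(schema["$defs"]["AzureConfig"]["properties"]["max_parallel"])
```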