-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Pydantic validation function and manual validation documentation #87
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, there are some things that should be changed before merging it. Great that you spotted the bug in the folder loader method!
pyproject.toml
Outdated
@@ -55,6 +55,7 @@ transformers = { version = "^4.35", optional = true } | |||
typer = { version = ">=0.9, <1.0", extras = ["all"], optional = true } | |||
uvicorn = { version = ">=0.24, <1.0", optional = true, extras = ["standard"] } | |||
xmltodict = ">=0.13, <1.0" | |||
pydantic = { version = ">=2.0.0", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since cbrkit.loaders
contains the statement from pydantic import BaseModel
, it makes sense to add it as a required dependency for now.
pydantic = { version = ">=2.0.0", optional = true } | |
pydantic = "^2.0" |
cbrkit/loaders.py
Outdated
cb[file.name] = loader(file) | ||
|
||
if len(cb) == 0: | ||
return None | ||
|
||
return cb | ||
|
||
|
||
def validate(data: dict[str, Any] | object, validation_model: BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def validate(data: dict[str, Any] | object, validation_model: BaseModel): | |
def validate(data: Casebase[Any, Any] | Any, validation_model: BaseModel): |
cbrkit/loaders.py
Outdated
""" | ||
if data is None: | ||
raise ValueError("Data is None") | ||
if isinstance(data, DataFrameCasebase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(data, DataFrameCasebase): | |
elif isinstance(data, DataFrameCasebase): |
if data is None: | ||
raise ValueError("Data is None") | ||
if isinstance(data, DataFrameCasebase): | ||
data = data.df.to_dict("index") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is extremely slow. Is there a way to just iterate over all Series entries instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to do it in the following manner:
for item in df.iterrows(): validation_model(**item[1])
However, this is much slower at 50ms vs 6ms using df.to_dict() for the cars example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then just leave it as it is for the time being. In case someone reports performance issues we can revisit it in the future.
cbrkit/loaders.py
Outdated
raise ValueError("Data is None") | ||
if isinstance(data, DataFrameCasebase): | ||
data = data.df.to_dict("index") | ||
if isinstance(data, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(data, dict): | |
if isinstance(data, Mapping): |
cbrkit/loaders.py
Outdated
To manually use Pydantic with CBRkit to validate your case base, you can use an appropriate | ||
Pydantic model instead of the CBRkit loaders (see example below). | ||
Alternatively, the dataframe, path, file and folder accept an optional validation_model argument | ||
to validate the Casebase entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description should be updated for the new cbrkit.loaders.validate
function.
cbrkit/loaders.py
Outdated
Example: | ||
>>> from pydantic import BaseModel, PositiveInt, NonNegativeInt | ||
>>> from data.cars_validation_model import Car | ||
>>> data = csv("data/cars-1k.csv") | ||
>>> for row in data.values(): | ||
... assert isinstance(Car.model_validate(row), Car) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have the validation function, maybe we should just use refer to its docstring instead?
cbrkit/loaders.py
Outdated
|
||
>>> from pydantic import BaseModel, PositiveInt, NonNegativeInt | ||
>>> from pathlib import Path | ||
>>> file_path = Path("./data/cars-1k.csv") | ||
>>> result = file(file_path) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this snippet achieve? The model does not seem to be used here.
cbrkit/loaders.py
Outdated
>>> from data.cars_validation_model import Car | ||
>>> folder_path = Path("./data") | ||
>>> result = folder(folder_path, ".csv") | ||
>>> result = folder(folder_path, "*.csv") | ||
>>> assert result is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, what is the goal here?
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you!
I also fixed a small bug in folder loader