Added Pydantic validation function and manual validation documentation #87

kilianbartz · 2024-03-28T13:40:40Z

I also fixed a small bug in folder loader

mirkolenz

Thanks for the PR! A few things should be changed before merging it. Great that you spotted the bug in the folder loader method!

mirkolenz · 2024-04-01T19:59:14Z

pyproject.toml

@@ -55,6 +55,7 @@ transformers = { version = "^4.35", optional = true }
 typer = { version = ">=0.9, <1.0", extras = ["all"], optional = true }
 uvicorn = { version = ">=0.24, <1.0", optional = true, extras = ["standard"] }
 xmltodict = ">=0.13, <1.0"
+pydantic = { version = ">=2.0.0", optional = true }


Since cbrkit.loaders contains the statement from pydantic import BaseModel, it makes sense to add it as a required dependency for now.

Suggested change

pydantic = { version = ">=2.0.0", optional = true }

pydantic = "^2.0"
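For context, the distinction matters because an unconditional top-level import fails as soon as the package is missing. If the dependency were kept optional, the import would need a guard along the following lines. This is only a sketch using the standard library; the helper name `require_optional` and the extra name "validation" are hypothetical and not part of cbrkit.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message.

    Hypothetical helper for illustration only; not part of cbrkit.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed; "
            f"install the '{extra}' extra to use this feature"
        )
    return importlib.import_module(module_name)


# A standard-library module is always found.
json_module = require_optional("json", extra="none")

# A missing package produces a clear error when the feature is used,
# instead of breaking the import of the whole module.
try:
    require_optional("package_that_does_not_exist_xyz", extra="validation")
except ImportError as err:
    message = str(err)
```

Making the package a required dependency, as suggested above, avoids the need for such a guard.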

mirkolenz · 2024-04-01T19:59:37Z

cbrkit/loaders.py

            cb[file.name] = loader(file)

    if len(cb) == 0:
        return None

    return cb
+
+
+def validate(data: dict[str, Any] | object, validation_model: BaseModel):


Suggested change

def validate(data: dict[str, Any] | object, validation_model: BaseModel):

def validate(data: Casebase[Any, Any] | Any, validation_model: BaseModel):

mirkolenz · 2024-04-01T20:01:30Z

cbrkit/loaders.py

+    """
+    if data is None:
+        raise ValueError("Data is None")
+    if isinstance(data, DataFrameCasebase):


Suggested change

if isinstance(data, DataFrameCasebase):

elif isinstance(data, DataFrameCasebase):

mirkolenz · 2024-04-01T20:02:02Z

cbrkit/loaders.py

+    if data is None:
+        raise ValueError("Data is None")
+    if isinstance(data, DataFrameCasebase):
+        data = data.df.to_dict("index")


This is extremely slow. Is there a way to just iterate over all Series entries instead?

It is possible to do it in the following manner:
for item in df.iterrows(): validation_model(**item[1])
However, this is much slower for the cars example: 50 ms, compared with 6 ms using df.to_dict().

Then just leave it as it is for the time being. In case someone reports performance issues we can revisit it in the future.
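The two approaches compared in this thread can be sketched as follows. The example uses a small made-up DataFrame rather than the cars dataset, the function names are hypothetical, and the timings quoted above are not reproduced here.

```python
import pandas as pd

# Made-up stand-in for the cars case base (assumption for illustration).
df = pd.DataFrame(
    {"manufacturer": ["audi", "vw", "opel"], "year": [2015, 2018, 2020]},
    index=["case-1", "case-2", "case-3"],
)


def rows_via_to_dict(frame: pd.DataFrame) -> dict:
    """Convert the whole frame in one call: {index label: {column: value}}."""
    return frame.to_dict("index")


def rows_via_iterrows(frame: pd.DataFrame) -> dict:
    """Build one Series per row, then convert each Series to a dict."""
    return {label: row.to_dict() for label, row in frame.iterrows()}


bulk = rows_via_to_dict(df)
per_row = rows_via_iterrows(df)
```

Both functions yield the same records for this frame. `iterrows()` constructs a new Series for every row, which is a likely source of the overhead measured above.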

mirkolenz · 2024-04-01T20:02:11Z

cbrkit/loaders.py

+        raise ValueError("Data is None")
+    if isinstance(data, DataFrameCasebase):
+        data = data.df.to_dict("index")
+    if isinstance(data, dict):


Suggested change

if isinstance(data, dict):

if isinstance(data, Mapping):
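A short sketch of why the broader check can be useful: `collections.abc.Mapping` also matches read-only and custom dict-like containers that do not inherit from `dict`. The `LazyCasebase` class below is hypothetical and exists only for illustration.

```python
from collections.abc import Iterator, Mapping
from types import MappingProxyType


class LazyCasebase(Mapping):
    """Hypothetical dict-like case base that does not inherit from dict."""

    def __init__(self, cases: dict[str, dict]) -> None:
        self._cases = cases

    def __getitem__(self, key: str) -> dict:
        return self._cases[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._cases)

    def __len__(self) -> int:
        return len(self._cases)


custom = LazyCasebase({"case-1": {"year": 2015}})
read_only = MappingProxyType({"case-1": {"year": 2015}})

# Compare the narrow check (dict) with the broad check (Mapping).
checks = {
    "custom_is_dict": isinstance(custom, dict),
    "custom_is_mapping": isinstance(custom, Mapping),
    "proxy_is_dict": isinstance(read_only, dict),
    "proxy_is_mapping": isinstance(read_only, Mapping),
    "dict_is_mapping": isinstance({}, Mapping),
}
```

With `isinstance(data, dict)`, the first two containers would fall through to the non-mapping branch even though they support `.values()`.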

mirkolenz · 2024-04-01T20:03:22Z

cbrkit/loaders.py

+To manually use Pydantic with CBRkit to validate your case base, you can use an appropriate 
+Pydantic model instead of the CBRkit loaders (see example below). 
+Alternatively, the dataframe, path, file and folder accept an optional validation_model argument
+to validate the Casebase entries.


The description should be updated for the new cbrkit.loaders.validate function.
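As a possible starting point for such a description, here is a minimal sketch of how a validation helper of this shape could behave. It is assembled only from the fragments visible in this review (the `None` check and the `Mapping` branch). The name `validate_cases` and the `Car` fields are assumptions for illustration; this is not the merged cbrkit.loaders.validate implementation, and the DataFrameCasebase branch is omitted.

```python
from collections.abc import Mapping
from typing import Any

from pydantic import BaseModel, NonNegativeInt, ValidationError


class Car(BaseModel):
    """Illustrative model; the field names are assumptions."""

    manufacturer: str
    year: NonNegativeInt


def validate_cases(data: Any, validation_model: type[BaseModel]) -> None:
    """Validate every case of a mapping, or a single object otherwise."""
    if data is None:
        raise ValueError("Data is None")
    if isinstance(data, Mapping):
        # A case base maps keys to cases; validate each case in turn.
        for case in data.values():
            validation_model.model_validate(case)
    else:
        # Anything else is validated as one object.
        validation_model.model_validate(data)


casebase = {
    "case-1": {"manufacturer": "audi", "year": 2015},
    "case-2": {"manufacturer": "vw", "year": 2018},
}
validate_cases(casebase, Car)  # valid data: no exception is raised

try:
    validate_cases({"bad": {"manufacturer": "opel", "year": -1}}, Car)
    invalid_rejected = False
except ValidationError:
    invalid_rejected = True
```

The helper returns nothing on success and lets Pydantic's `ValidationError` propagate on the first invalid case, so callers can decide how to report it.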

mirkolenz · 2024-04-01T20:04:03Z

cbrkit/loaders.py

+Example:
+    >>> from pydantic import BaseModel, PositiveInt, NonNegativeInt
+    >>> from data.cars_validation_model import Car
+    >>> data = csv("data/cars-1k.csv")
+    >>> for row in data.values():
+    ...     assert isinstance(Car.model_validate(row), Car)


Now that we have the validation function, maybe we should just refer to its docstring instead?

mirkolenz · 2024-04-01T20:04:38Z

cbrkit/loaders.py

+
+        >>> from pydantic import BaseModel, PositiveInt, NonNegativeInt
+        >>> from pathlib import Path
+        >>> file_path = Path("./data/cars-1k.csv")
+        >>> result = file(file_path)
+


What does this snippet achieve? The model does not seem to be used here.

mirkolenz · 2024-04-01T20:05:29Z

cbrkit/loaders.py

+        >>> from data.cars_validation_model import Car
        >>> folder_path = Path("./data")
-        >>> result = folder(folder_path, ".csv")
+        >>> result = folder(folder_path, "*.csv")
+        >>> assert result is not None


Same as above, what is the goal here?

mirkolenz

Looks good to me, thank you!

pppu added 8 commits March 26, 2024 12:26

added pydantic validation

1ffb885

black formatting

389ada9

separate validation function

d24df32

reverted change for folder loader

fe79097

cleaned up docs

c67ca33

added pydantic dep

dfe23a4

removed unnecessary assertion

7b0b43e

fixed docs formatting

e1045aa

mirkolenz requested changes Apr 1, 2024

pppu and others added 2 commits April 2, 2024 06:54

suggested changes from pr

0480051

[pre-commit.ci] auto fixes from pre-commit.com hooks

94f3207

for more information, see https://pre-commit.ci

mirkolenz approved these changes Apr 2, 2024

mirkolenz merged commit b90f006 into wi2trier:main Apr 2, 2024
6 of 7 checks passed

mirkolenz mentioned this pull request Apr 2, 2024

Validation with Pydantic #82

Closed
