should our data models be hashable / immutable #24
Comments
I'm not sure if it's related, but if in-place mutation is necessary for the needed extensibility, I'd value that over hashability. |
I think the recommendation for extensions would be to use subclassing, e.g. |
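(The concrete example was elided in the thread; a minimal sketch of what the subclassing approach might look like, using a hypothetical `Axis` model like the one discussed below:)

```python
from pydantic import BaseModel

class Axis(BaseModel):
    name: str
    type: str | None = None
    unit: str | None = None

# downstream code extends the data model by subclassing,
# adding extra fields rather than mutating instances in place
class CustomAxis(Axis):
    long_name: str | None = None
```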
If we made our classes frozen, we would need to ensure that it's easy to make a copy with new, validated, attributes. We can do this with the following design:

```python
class Axis(Base):
    """
    Model for an element of `Multiscale.axes`.

    See https://ngff.openmicroscopy.org/0.4/#axes-md.
    """

    name: str
    type: str | None = None
    unit: str | None = None

    def with_name(self, name: str) -> "Axis":
        """Return a copy of the axis with a new name."""
        return type(self)(**self.model_dump(exclude={'name'}), name=name)

    def with_type(self, type_: str) -> "Axis":
        """Return a copy of the axis with a new type."""
        return type(self)(**self.model_dump(exclude={'type'}), type=type_)
```

```python
ax = Axis(name='foo', type='space')
# this will make a new axis with name 'bar', and type 'time'
new_ax = ax.with_name('bar').with_type('time')
```

obviously we would complete this with a `with_unit` method |
I'm 👎 on having to manually write those `with_*` methods |
I'm happy to write them, and they have to be instance methods because they need the original values |
pydantic has `model_copy`:

```python
In [12]: Axis(name='x').model_copy(update={'name': 'Y'})
Out[12]: Axis(name='Y', type=None, unit=None)
```

however it doesn't perform validation:

```python
In [15]: Axis(name='x').model_copy(update={'name': ()})
Out[15]: Axis(name=(), type=None, unit=None)
```

when I want immutability, but also want the convenience of easy updates, I often add a base class like this:

```python
class FrozenModel(BaseModel):
    model_config = {'frozen': True}

    def replace(self, **kwargs: Any) -> "Self":
        """Return a new instance replacing specified kwargs with new values."""
        return type(self).model_validate({**self.model_dump(), **kwargs})
```

and then if you want model-specific type hinting on each subclass, one option is to use something like:

```python
d = {k: getattr(self, k) for k in self.model_fields}
return type(self).model_validate({**d, **kwargs})
```

I've generally been happy when I've made things immutable... particularly when they are objects that may be passed as arguments to various listeners/consumers. So I'd certainly try to make them immutable, but yeah, like @joshmoore said, if it really gets in the way then meh |
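Unlike `model_copy(update=...)`, the `replace` pattern above re-runs validation. A quick usage sketch (the `Axis` subclass here is illustrative, not the real model):

```python
from typing import Any, Self
from pydantic import BaseModel, ValidationError

class FrozenModel(BaseModel):
    model_config = {'frozen': True}

    def replace(self, **kwargs: Any) -> Self:
        """Return a new instance replacing specified kwargs with new values."""
        return type(self).model_validate({**self.model_dump(), **kwargs})

class Axis(FrozenModel):
    name: str
    type: str | None = None

print(Axis(name='x').replace(name='y'))  # name='y' type=None

try:
    Axis(name='x').replace(name=())  # unlike model_copy, this is caught
except ValidationError as e:
    print(e)
```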
thanks @tlambert03 those examples are super helpful! I love that top-level `replace` |
I updated the title of this issue to include an explicit reference to immutability. Another point in favor of immutability for these data models: we have logically coupled fields, namely the `axes` of a multiscale and its `coordinateTransformations`, which must agree in dimensionality.

If we allow mutation, and we attempt to validate when attributes change, then it's not possible to mutate just one of these fields without causing a validation error. We would have to mutate all the logically coupled fields in one go, which is not possible with vanilla pydantic. Consider the case of adding new axes: the `axes` field and every `coordinateTransformations` field have to change together. So if we want to support operations like "add axes", then we have to treat the models as immutable at least within the scope of those operations, in which case IMO we should just be consistent and treat the models as immutable for all transformations. (this came up in a conversation on zulip) |
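A sketch of what an immutable "add axis" operation could look like; the field names here (`axes`, `scale`) are simplified stand-ins for the real coupled OME-Zarr fields:

```python
from typing import Any, Self
from pydantic import BaseModel, model_validator

class Multiscale(BaseModel, frozen=True):
    axes: tuple[str, ...]
    scale: tuple[float, ...]  # logically coupled: must match axes in length

    @model_validator(mode='before')
    @classmethod
    def check_coupled_lengths(cls, data: dict[str, Any]) -> dict[str, Any]:
        if len(data.get('axes', ())) != len(data.get('scale', ())):
            raise ValueError('axes and scale must have the same length')
        return data

    def with_new_axis(self, name: str, scale: float) -> Self:
        # rebuild the whole model so both coupled fields change atomically
        return type(self).model_validate(
            {'axes': (*self.axes, name), 'scale': (*self.scale, scale)}
        )

m = Multiscale(axes=('y', 'x'), scale=(1.0, 1.0))
m2 = m.with_new_axis('z', 2.0)  # validated as a unit; m is untouched
```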
yeah, but it is probably possible with `validate_assignment`. this is not to argue in favor of mutability, but if the counter argument here is more of a "how would we do that?" argument, i think it's possible? |
@tlambert03 can you explain how this would work? Suppose i have a model like this:

```python
class Model(BaseModel):
    foo: tuple[int, ...]
    bar: tuple[int, ...]

    @model_validator(mode='after')
    def ensure_foo_bar_same_length(cls, v):
        if len(v.foo) != len(v.bar):
            raise ValueError("foo and bar must have the same length")
        return v
```

how would you support something like this:

```python
my_model = Model(foo=(1, 2, 3), bar=(4, 5, 6))
my_model.foo = (1, 2, 3, 4)  # this will raise if we validate on `__setattr__`
my_model.bar = (4, 5, 6, 7)
```
|
sure, first off, if you want pydantic to validate field assignments at all after model creation, you need to use `validate_assignment`:

```python
from pydantic import BaseModel, model_validator, ConfigDict

class Model(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
```

With that alone, your code actually would raise an exception on the second line:

```python
my_model = Model(foo=(1, 2, 3), bar=(4, 5, 6))
my_model.foo = (1, 2, 3, 4)
```

```
Traceback (most recent call last):
  File "/Users/talley/Desktop/x.py", line 19, in <module>
    my_model.foo = (1, 2, 3, 4)
    ^^^^^^^^^^^^
  File "/Users/talley/micromamba/envs/all/lib/python3.13/site-packages/pydantic/main.py", line 923, in __setattr__
    self.__pydantic_validator__.validate_assignment(self, name, value)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Model
Value error, foo and bar must have the same length [type=value_error, input_value=Model(foo=(1, 2, 3, 4), bar=(4, 5, 6)), input_type=Model]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
```

**side-note on `mode='after'`:** if you're using `mode='after'`, the validator should be an instance method that returns `self`:

```python
@model_validator(mode='after')
def ensure_foo_bar_same_length(self) -> Self:
    if len(self.foo) != len(self.bar):
        raise ValueError("foo and bar must have the same length")
    return self
```

however, there is still a problem here (it's an open issue at pydantic/pydantic#7105), which is that, while a validation error was raised, the model is left in an inconsistent state:

```python
my_model = Model(foo=(1, 2, 3), bar=(4, 5, 6))
with contextlib.suppress(Exception):  # swallow the validation error
    my_model.foo = (1, 2, 3, 4)
print(my_model)
```

```
foo=(1, 2, 3, 4) bar=(4, 5, 6)  # 😢
```

so, you have to use `mode='before'`:

```python
from typing import Any

from pydantic import BaseModel, model_validator, ConfigDict

class Model(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    foo: tuple[int, ...]
    bar: tuple[int, ...]

    @model_validator(mode="before")
    @classmethod
    def ensure_foo_bar_same_length(cls, data: dict[str, Any]) -> dict[str, Any]:
        foo = data.get("foo", ())
        bar = data.get("bar", ())
        if len(foo) != len(bar):
            raise ValueError("foo and bar must be the same length")
        return data

my_model = Model(foo=(1, 2, 3), bar=(4, 5, 6))
try:
    my_model.foo = (1, 2, 3, 4)
except Exception as e:
    print(e)

print("after setting:", my_model)
```

```
1 validation error for Model
Value error, foo and bar must be the same length [type=value_error, input_value={'foo': (1, 2, 3, 4), 'bar': (4, 5, 6)}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
after setting: foo=(1, 2, 3) bar=(4, 5, 6)
```
|
Thanks for the examples, that's very helpful. The end of your example produces a pydantic class where validation runs on assignment, but because of the coupling between the fields, this assignment can never succeed. Suppose we filled out the model with more fields, and most of them were not coupled. Then assignment would work for some fields, but never for others, and this is exactly the outcome I want to avoid by making the models immutable! (that's what I was trying to express in this comment). I think it's super valuable if the API for making changes to these models is uniform, which means we should avoid telling users "you can modify some attributes with direct assignment, unless they are special, in which case do it very differently".

I think this setup would be much better:

```python
from typing import Any, Self

from pydantic import BaseModel, model_validator

class Model(BaseModel, frozen=True):
    foo: tuple[int, ...]
    bar: tuple[int, ...]
    baz: int

    @model_validator(mode="before")
    @classmethod
    def ensure_foo_bar_same_length(cls, data: dict[str, Any]) -> dict[str, Any]:
        foo = data.get("foo", ())
        bar = data.get("bar", ())
        if len(foo) != len(bar):
            raise ValueError("foo and bar must be the same length")
        return data

    def with_baz(self, new_baz: int) -> Self:
        """Create a new model with a new baz attribute"""
        ...

    def with_foo_and_bar(self, new_foo: tuple[int, ...], new_bar: tuple[int, ...]) -> Self:
        """Create a new model with new foo and bar attributes"""
        ...
```

Essentially, for each set of coupled attributes (a set which could have 1 element), we expose a convenience method that takes new values for those attributes as parameters and returns a brand-new model. No need to worry about invalid models with this approach. We do lose direct assignment, but IMO it's too problematic to keep. |
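A sketch of how the elided method bodies might be filled in, reusing the `model_validate` rebuild pattern from earlier in the thread; `ConcreteModel` is a hypothetical subclass of the `Model` above:

```python
class ConcreteModel(Model):
    """Hypothetical subclass filling in the stubs above."""

    def with_baz(self, new_baz: int) -> Self:
        return type(self).model_validate({**self.model_dump(), 'baz': new_baz})

    def with_foo_and_bar(
        self, new_foo: tuple[int, ...], new_bar: tuple[int, ...]
    ) -> Self:
        # both coupled fields change in one validated step
        return type(self).model_validate(
            {**self.model_dump(), 'foo': new_foo, 'bar': new_bar}
        )

m = ConcreteModel(foo=(1, 2), bar=(3, 4), baz=0)
m2 = m.with_foo_and_bar((1, 2, 3), (4, 5, 6))  # lengths still match, so this validates
```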
Ahhh yes I see now (sorry I missed that point earlier!). Yeah, if you have fields that MUST change together... then one would need to use custom setter methods that take one or more arguments and validate them together (honestly, it kinda seems like you'd need that even for immutable objects with a "with_" method), and one should never directly modify the attributes. I know you have less control here over the model itself (inasmuch as it's just mirroring the spec). But if you can avoid mutually constrained fields altogether on the runtime model side, that would make the data model a bit easier to work with. |
This is the job of libraries higher up in the stack :) for this library, our scope is narrowly constrained to modelling the exact metadata in OME-Zarr as it is defined there, which means we will have to deal with logically coupled fields |
Well not necessarily, you could always have computed fields that do strictly model upstream, with input fields that are not mutually constrained. Point being you can absolutely model something strictly mirroring upstream, while providing an api that makes it easier to work with mutually constrained stuff. (Again not endorsing either approach here, just exploring options :) ) |
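A sketch of that computed-fields idea using pydantic's `computed_field`, applied to the running `foo`/`bar` example (the `pairs` input field is hypothetical):

```python
from pydantic import BaseModel, computed_field

class Model(BaseModel):
    # single input field: foo and bar can never disagree in length
    pairs: tuple[tuple[int, int], ...]

    @computed_field
    @property
    def foo(self) -> tuple[int, ...]:
        return tuple(p[0] for p in self.pairs)

    @computed_field
    @property
    def bar(self) -> tuple[int, ...]:
        return tuple(p[1] for p in self.pairs)

m = Model(pairs=((1, 4), (2, 5), (3, 6)))
print(m.foo, m.bar)    # (1, 2, 3) (4, 5, 6)
print(m.model_dump())  # computed fields are included in serialization
```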
we are pretty specifically constrained with respect to our input fields (i.e., they are mutually constrained), because (at least at the moment) we want to construct these models from raw JSON documents, and in those documents the fields are mutually constrained. So if we are to introduce a "simplified" API on top of that, it can't manifest in the signature of `__init__` |
Yep definitely makes sense! I agree, it all brings it back to immutable models being probably a good idea |
Finally read this thread! Seems like we should make all our models immutable for now; if we want to make them mutable down the line, going immutable > mutable is easier than mutable > immutable. |
I have a test in pydantic-ome-ngff that checks if my multiscale model is hashable. I don't remember why I added this test, but evidently it can be useful for these metadata models to be hashable, e.g. so that we can store them in python sets.

Making metadata models hashable requires using frozen pydantic models / dataclasses. We previously decided against using frozen models, because we wanted to allow in-place mutation of those models. How do we weigh the value of in-place mutation vs the advantages of having hashable models? (Maybe we think in-place mutation is much more useful...)
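For reference, a minimal sketch of the hashability point: pydantic v2 models gain a `__hash__` when frozen (as long as their field values are hashable), so they can be stored in sets:

```python
from pydantic import BaseModel

class Axis(BaseModel, frozen=True):
    name: str
    type: str | None = None

# duplicate models collapse, as you'd expect for set members
axes = {Axis(name='x'), Axis(name='x'), Axis(name='y')}
print(len(axes))  # 2
```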