-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Union type in structured configs. #381
Conversation
Cool.
|
a0d3cc3
to
075cb37
Compare
075cb37
to
20dced3
Compare
👉 View analysis in DeepCode’s Dashboard | Configure the bot |
I've done several things. I've removed validation with other functions since as you pointed out, it doesn't make much sense. I'll try to add more test cases and wait for your review because I think you might want to comment a lot. Still pickle not done. |
e05452e
to
b6812bc
Compare
Hey! I'll comment every change by the tasks:
I've removed the function (662772b) to check whether the value is a valid type and changed it so it's checked when wraping the value with each element_type(until it finds a valid one, if not, it raises error).
In this case it didn't need that much work. I you add a new value to a DictConfig or ListConfig it will wrap with type_ Union and union will handle the rest.
Optional[Union[int ,str]] returns Union[int, str, NoneType] directly so if none is inside Union, UnionNode's optional will be true then, it removes None from Union[int, str] because optional it's checked in Node's validate.
I added test for this one in 9037ea0 and the tests passed.
After adding tests here I foound that I didn't test structured config assignment to union nodes and I had to add specific validation for it. Moreover since I use wrap node to check if a value is correct I didn't want to wrap Nodes again(it will return the node and not check) so,I used the value of the node to wrap.
The way I faced this is transforming Union[vt1, vt2] to the list of element_types and then when loading I use the list to reconstruct the Union type as before. It looks strange but for example: I believe there's refactoring to do in Also, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/test_nodes.py
Outdated
{"foo": 3.14}, | ||
), | ||
( | ||
UnionNode(element_types=[bool, float], is_optional=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional[X] is Union[NoneType, X].
Can you make sure that that form of Union is interpreted as Optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, ref_type stays as Union[NoneType, X], elements_types is without Nonetype 'Union[x]' and is_optional is True if NoneType is inside ref_type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So actually, I just suggestd to forbid Union[NoneType, X]
to keep things simple.
@omry do you see a good reason to support it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union[X] is exactly the same as Union[NoneType, X]:
In [1]: from typing import Union, Optional
In [2]: class X: ...
In [3]: Optional[X]
Out[3]: typing.Union[__main__.X, NoneType]
In [4]: Union[X, type(None)] == Optional[X]
Out[4]: True
In [5]: Union[type(None), X] == Optional[X]
Out[5]: True
optional fields has special handling in OmegaConf (because their support predated Union support) but in theory it could be implemented in terms of a Union instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional[X] is exactly the same as Union[NoneType, X]:
Yes, that's why I suggested to support only Optional[X]
and not Union[NoneType, X]
: there is no need to support both.
Although we could probably support both, it makes things a bit more complex because now we have a Union
that shouldn't translated into a UnionNode
(it should instead be an XNode
with optional=True
).
Keeping this conversation resolved but feel free to reopen if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, unresolving this conversation because I just realized that in Python, Optional[X]
is actually translated into Union[NoneType, X]
. Thus I assume that OmegaConf is already catching this kind of Union somewhere to turn it into an XNode
with optional=True
. As a result, the error triggered at https://github.com/pereman2/omegaconf/blob/7d6e39ebed85279ef9cae6cd91bcb77748b2df22/omegaconf/_utils.py#L457 should never happen => it probably should be an assert (I don't think there was a test for it, actually).
I added more test cases for Union inside Dict and List |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really familiar with the structured config code so there's some stuff I'm not too sure about.
omegaconf/_utils.py
Outdated
element_types: List[Any] | ||
if ref_type is not Union and args is not None and args[0] is not Any: | ||
element_types = list(args) | ||
assert element_types is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks a bit weird to me:
- Not sure what the
if ref_type is not Union
test is supposed to catch: if I remove it, all tests still pass - The assert seems to ensure that the
if
condition always succeeds. However, if that is not the case, thenelement_types
will actually not be defined and you'll get anUnboundLocalError
instead of anAssertionError
. It would seem more natural to me to instead replace theif
by anassert
(on the same condition).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the if ref_type is not Union test is supposed to catch: if I remove it, all tests still pass
It tested Unions without types(which don't make sense, like List or Dict alone now that I think about it).
I think this should be better to return Any in case it's not a union like get_element_list. Correct me if I'm wrong.
PS: Thanks for your feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the if ref_type is not Union test is supposed to catch: if I remove it, all tests still pass
It tested Unions without types(which don't make sense, like List or Dict alone now that I think about it).
I think this should be better to return Any in case it's not a union like get_element_list. Correct me if I'm wrong.
Ah ok, personally I think it'd be best to raise an exception if this function is called with anything that's not a Union[something]
, because I can't think of any use case where this would be intended.
Also I think that something
should have at least two elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a union without a type?
like this?
@dataclass
class X:
a: Union
This seems like a legitimate Python (at least it runs and mypy --strict does not complain).
not sure what it's supposed to mean though. Unless someone can tell me what this is for, I think we should raise an error for a Union without a list of types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should raise an error for a Union without a list of types.
Agreed -- just tested and currently it's accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes, added a few more comments based on 720e405
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor comment for now.
I've been spending some time on 45f72c0 today but it's a tricky one, touching OmegaConf's internals I'm not familiar with. Will need a bit more time to wrap my head around it.
omegaconf/nodes.py
Outdated
if is_structured_config(ref_type): | ||
value_type = ref_type | ||
if any( | ||
issubclass(value_type, union_type) for union_type in self.element_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there's actually an issue with this (sorry, it was my suggestion). If union_type
is something like List[X]
then this will crash (for instance, try executing issubclass(int, List[int])
to see the error).
I would suggest adding a test to catch this error, then either:
- Write code that is robust to this (e.g., test type equality when issubclass fails)
- Stick to type equality like you had earlier
A potential option:
if isinstance(value, BaseContainer):
ref_type = value._metadata.ref_type
if ref_type is not Any:
value_type = ref_type
for union_type in self.element_types:
try:
if issubclass(value_type, union_type):
return value
except TypeError:
if value_type == union_type:
return value
self._raise_invalid_value(value, value._metadata.ref_type)
Note that I made another change, besides the subclass check: replacing if is_structured_config(ref_type)
with if ref_type is not Any
. To be honest I don't fully understand the role of ref_type
, but intuitively I'm understanding it as "the type value
is impersonating", which is the motivation behind this change (it seems more generic). Feel free to let me know if you disagree or think I'm misunderstanding how this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been adding tests and it fails as you pointed out. The code you wrote could be useful, I'll try it.
I'm finishing a sprint this week so sometime this week i expect to have a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've been adding tests and encountered wrapping like this:
node = DictConfig({"foo": "invalid"})
_maybe_wrap(value=node._value(), ref_type=Dict[str, int], is_optional=True, key=None, parent= None)
which fails because "invalid" is AnyNode and validate_set doesn't take it well. I solved this in #443 so this might need to wait that PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@odelalleau, ref_type is the annotated type.
for example:
@dataclass
class Foo:
x : BaseClass = SubClass()
BaseClass is the ref_type.
Subclass is the type.
ref_type is checked against during assignment, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been adding tests and it fails as you pointed out. The code you wrote could be useful, I'll try it.
I'm finishing a sprint this week so sometime this week i expect to have a fix.
I would rather that we checked if the field is a Subscripted generic and apply different logic than to catch the error and recover somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've been adding tests and encountered wrapping like this:
node = DictConfig({"foo": "invalid"}) _maybe_wrap(value=node._value(), ref_type=Dict[str, int], is_optional=True, key=None, parent= None)which fails because "invalid" is AnyNode and validate_set doesn't take it well. I solved this in #443 so this might need to wait that PR.
Not sure I understand, isn't it expected to fail? (since "invalid"
is not an int
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is expected to fail. and it is fixed in that other PR @pereman2 mentioned.
omegaconf/omegaconf.py
Outdated
) -> Node: | ||
# if already a node, update key and parent and return as is. | ||
# NOTE: that this mutate the input node! | ||
if isinstance(value, Node): | ||
if isinstance(value, Node) and ref_type is Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is the change I'm most concerned about. Until now _node_wrap()
was never called (I think) with value
already being a Node, and I'm a bit worried of what might happen if we allow this (especially if the code doesn't explicitly restricts this to unions).
I don't have a good solution yet, I'm still trying to figure out exactly how and when _maybe_wrap()
is being used thoughout the codebase. But if there was a way to make sure that the changes required to support Union won't have unintended side effects elsewhere, I would be more confident in letting them in.
I may not have time to look more closely at it in the next few days, sorry. @omry feel free to chime in :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping of Nodes is causing me trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping of Nodes is causing me trouble.
Yeah, honestly, it seems brittle to me, because the rest of the code wasn't written with that in mind. There are a few weird things going on that I can see, and probably more that I can't.
I submitted a PR to your branch in pereman2#1 with a few things I believe make it at least a bit better. For some of my changes I struggled to write a meaningful test as I got deep into some rabbit holes, so eventually I gave up and decided that just "looking better" would be good enough ;)
(in particular, now _node_wrap()
will return a UnionNode
as soon as type_
is a union type, even if called from outside of UnionNode
)
I doubt we'll manage to get to something 100% satisfying without a deep re-thinking of some OmegaConf internals (or completely changing the approach taken here), so as far as I'm concerned I'm ok with accepting this PR (with my suggested changes as well as the fixes you mentioned in #381 (comment)), unless @omry thinks otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, The reason maybe_wrap exists is exactly to avoid that.
wrap this object if it's not already a node. so generally I think _node_wrap should not be used on nodes (and should probably assert that).
I am going to look at the code after going through the comment and form a more informed opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think _node_wrap should not be used on nodes (and should probably assert that).
Just to give some context, currently we need it because otherwise with something like:
x: Union[int, DictConfig] = field(default_factory=lambda: DictConfig({}))
the type of node x
is DictConfig
instead of UnionNode
, and this may break things later when we try to assign other values to x
.
UnionNode
is an odd beast because it's a ValueNode
(not a Container
) but it wraps another node -- something that didn't exist before.
* Fix assignment of a ValueNode to a UnionNode, which could sometimes results in an incorrect type * Remove the `use_type` argument to simplify the code * Keep the wrapping of a `Node` instance in `_maybe_wrap()` limited to Unions, to reduce the risk of unintended side effects
A few improvements to Unions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed until the tests.
Looking pretty good so far.
Thanks @pereman2 and thanks @odelalleau for your help reviewing.
omegaconf/nodes.py
Outdated
if is_structured_config(ref_type): | ||
value_type = ref_type | ||
if any( | ||
issubclass(value_type, union_type) for union_type in self.element_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been adding tests and it fails as you pointed out. The code you wrote could be useful, I'll try it.
I'm finishing a sprint this week so sometime this week i expect to have a fix.
I would rather that we checked if the field is a Subscripted generic and apply different logic than to catch the error and recover somehow.
return value_node | ||
|
||
def _raise_invalid_value(self, value: Any, value_type: Any) -> None: | ||
raise ValidationError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer fstrings, this is Python 3 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 1b74272
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of the return None
after self._raise_invalid_value(value, type(value))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also regarding fstrings, Omry meant something like:
f"Value '{value}' with value_type '{value_type}' is not in '{self.element_types}'"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh shoot my bad I did it without thinking.
) | ||
) | ||
|
||
def _get_element_types_lead_by(self, type_: Any) -> List[Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function name is unclear. what does it do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns the list of union types (self.element_types
) reordered so that type_
comes in first position.
omegaconf/nodes.py
Outdated
element_types[0], element_types[pos] = element_types[pos], element_types[0] | ||
return element_types | ||
|
||
def __eq__(self, other: Any) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sadly python __eq__
is supposed to return Any. (because it can return NotImplemented, which is not a boolean).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can it return NotImplemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can it return NotImplemented?
Scroll down a bit to the comparison operators in https://docs.python.org/3/reference/datamodel.html#basic-customization
@@ -163,6 +169,127 @@ def __deepcopy__(self, memo: Dict[int, Any] = {}) -> "AnyNode": | |||
return res | |||
|
|||
|
|||
class UnionNode(ValueNode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodes is becoming pretty big.
as a followup diff, we can refactor it and split it into a file per node (probably in omegaconf.nodes module).
e.g:
omegaconf/nodes/
any.py
str.py
union.py
...
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We should do it.
omegaconf/nodes.py
Outdated
for union_type in element_types: | ||
try: | ||
value_node = self._wrap_node(value=value, ref_type=union_type) | ||
valid_value = True | ||
break | ||
except Exception: | ||
pass | ||
if not valid_value: | ||
self._raise_invalid_value(value, type(value)) | ||
return value_node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit cleaner:
for union_type in element_types: | |
try: | |
value_node = self._wrap_node(value=value, ref_type=union_type) | |
valid_value = True | |
break | |
except Exception: | |
pass | |
if not valid_value: | |
self._raise_invalid_value(value, type(value)) | |
return value_node | |
for union_type in element_types: | |
try: | |
return self._wrap_node(value=value, ref_type=union_type) | |
except Exception: | |
pass | |
self._raise_invalid_value(value, type(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 1b74272.
is_dict = ( | ||
is_dict_annotation(type_) | ||
or type_ is dict | ||
or (type_ is Any and type(value) is dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think type_
here should actually be called ref_type.
See how it's used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it and it fails tests with ref_type
# If already a node, update key and parent and return as is. | ||
# NOTE: that this mutates the input node! | ||
# If `ref_type` is a union we still need to wrap the node into a `UnionNode`. | ||
if isinstance(value, Node) and not _is_union(ref_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern about this.
can this lead to multiple wrapping in a Union node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, right now in the test suite value
is never a UnionNode
here at least.
That being said, I do have concerns with unions of unions (ex: Uniont[int, Union[str, float]]
). I suspect some things may not work properly. However it may not be worth supporting them since we can combine them in a single Union
(maybe we should catch it and raise an error though, to be safe).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Python is doing it automatically. try that in an interpreter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh nice :) In that case I think it should be ok -- maybe we could add an assert not isinstance(value, UnionNode)
in the else:
block to be safe (unless you can see a way to trigger it in a normal use case?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can block this until we see an actual use case. Better having someone complain about something raising an exception than have internal structure that goes haywire in certain scenarios.
@attr.s(auto_attribs=True) | ||
class UnionWithContainer: | ||
union_dict: Union[DictConfig, int] = DictConfig({"foo": 1}) | ||
union_list: Union[ListConfig, int] = ListConfig([1, 2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test for a case for dataclasses as union elements sharing a common base class.
It's weird but people are doing it.
@dataclass
class Base: ...
@dataclass
class Subclass1(Base): ...
@dataclass
class Subclass2(Base): ...
@dataclass
class UnionOfSubclasses:
foo: Union[Subclass1, Subclass2]
Test are failing now because of I added test cases which include the bug solved in #443. Should the try block be wrapped in a function. There are two instances of this in the same method. |
cfg.foo = module.Subclass1 | ||
assert cfg.foo == module.Subclass1() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised this works, since cfg.foo
is set to the class and not an instance. What's going on there?
return self._wrap_node( | ||
value=value._value(), ref_type=union_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's a big deal but I'd rather ask @omry his opinion on this line, because it is changing the original ref_type
of value
to union_type
(not in-place though since we create a new node, this is why it doesn't worry me too much).
FYI: |
Thanks for the info. Anyways, I thought I had more time but I'm full of college work for this vacations. |
I bounced the Union support to OmegaConf 2.2, we will resume working on it after 2.1 is released. Since many things have changed in 2.1, we will likely start from scratch but the tests will for sure be very useful. |
To add support for UnionTypes a new type of node is created
UnionNode
. This node holds a value which is another node so it's easier to check values. I plan to modify_validate_value
which validates that the input is correct with the values implementation ofvalidate_and_convert
because UnionNode should not know what a ListConfig or DictConfig is.I added some tests to check the behaviour. At this time I don't know which ones I missed, I'll check later.
Still have to check optional unions like
Optional[Union[str, int]]
and I will change when pickling unions on 3.6 still fails.Tests list
A list of use cases to cover with tests, add more here if you think of more.
Union[str, List[str]]
List[Union[int, str]]
Closes #144 .