-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make KedroContext
a frozen dataclass
#1465
Make KedroContext
a frozen dataclass
#1465
Conversation
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Great work getting to the bottom of this! I also didn't realise it would be so difficult. A few quick comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @AntonyMilneQB that it would be okay not to use frozen
. We'll just need to make it clear to users what the implications of overwriting are.
kedro/framework/context/context.py
Outdated
def __post_init__(self, package_name, project_path, hook_manager, extra_params): | ||
object.__setattr__(self, "_package_name", package_name) | ||
object.__setattr__( | ||
self, "project_path", Path(project_path).expanduser().resolve() | ||
) | ||
object.__setattr__(self, "_hook_manager", hook_manager) | ||
object.__setattr__(self, "_extra_params", deepcopy(extra_params)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the logic behind which attributes are added here and the ones that aren't? For example, why is package_name
added but not env
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good questions!
Personally, I am not too worried about it, but it is still a change in API since we now open up the possibility to overwrite it (though we can argue users could overwrite the self._xxxx
if they are hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the 2nd point question, It was just trying to keep the API unchanged.
env
, project_path
, config_loader
, params
(additional logic compose with self._extra_params
, was class property, _hook_manger
, _extra_params
, _package_name
was always internal.
frozen=True
is basically an attempt to achieve similar effect as @property
, but there are some limitations. Mainly due to dataclass
does not offer a nice solution to immutability at the field level, but only class level.
With a normal class, we just do _attribute_name
to signify it is an internal attribute and expose it with an attribute_name
with @property
.
An example can be found here.
https://noklam.github.io/blog/python/2022/04/22/python-dataclass-partiala-immutable.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right now I see what you mean. Thinking about this a bit more, maybe we should do it like you implemented and be a bit more careful with making properties changeable. It will be harder to make them frozen
once they're mutable and users actually make use of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation is approach b, which works functionally, but it will also break IDE support & linters they seem to have trouble understand obj.__setattr__(self, 'a', 1)
has the same semantic as self.a = 1
. Things like code navigation no longer work.
dataclass
is essentially a code generator, my hesitation is that it seems not benefiting too much in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to hear some opinion from @idanov
Updated: Have a brief chat with Ivan and we agree
dataclasses
does not seem to be a good fit,attrs
is promising,pydantic
is another one that does a similar job but more focuses on validation. Also Validation in run-time is not necessary for Kedro, it's more suitable for API that receive data.
Another possible workaround (c) by implementing the frozen mechanism, the advantage is it doesn't break code completion/linting and no weird @dataclass
class KedroContext:
"""``KedroContext`` is the base class which holds the configuration and
Kedro's main functionality.
"""
package_name: InitVar[str]
project_path: InitVar[Union[Path, str]]
config_loader: ConfigLoader
hook_manager: InitVar[PluginManager]
env: Optional[str] = None
extra_params: InitVar[Dict[str, Any]] = None
def __post_init__(self, package_name, project_path, hook_manager, extra_params):
self.project_path = Path(project_path).expanduser().resolve()
self._package_name = package_name
self._hook_manager = hook_manager
self._extra_params = deepcopy(extra_params)
self._proteced_attributes = ["package_name", "project_path", "config_loader"]
# `__setattr__`, `__delattr__` is the dataclass way to mimic class's property
def __setattr__(self, name, value):
if name in self._proteced_attributes:
raise FrozenInstanceError(f"cannot assign to field {name!r}")
def __delattr__(self, name):
if name in self._proteced_attributes:
raise FrozenInstanceError(f"cannot delete field {name!r}")
self.__setattr__ = __setattr__
self.__delattr__ = __delattr__ |
I am more happy with the The syntax may looks unfamiliar at first, but basically class A:
def __init__(self, x):
self._x = x attrs version would be @define
class A:
_x You can see the Class init signature here, which is unchanged. |
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Don't forget to add these changes to the release file 👍
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! ⭐ I fully agree that attrs
is a better solution than the built in dataclasses
. I've done some reading that compares both solutions (and also pydantic
) and I think attrs
will serve us better not only for the KedroContext
but also for other classes that we might want to convert to this format (e.g. `Node).
(Some of the stuff I read in case others find it useful:
- https://www.attrs.org/en/stable/why.html
- https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/
- https://jackmckew.dev/dataclasses-vs-attrs-vs-pydantic.html )
Don't forget to update the release notes!
kedro/framework/context/context.py
Outdated
@@ -1,11 +1,13 @@ | |||
# pylint: disable=no-member |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to disable this for the whole file or is it possible to add it to the specific line(s) it applies to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I change it back to the specific line(s). It's not clear why pylint complains about "no-member" for config_loader
only but not the other attributes.
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
…loader-as-a-property' of https://github.com/kedro-org/kedro into feat/1459-make-kedrocontext-a-dataclass-and-add-config_loader-as-a-property Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
Just to make a point : as a user, I really want be able to at least add some attributes to the context (especially to enable passing object between hooks, but also to have a clean API to expose objects to user with an interactive workflow). If I were to override one of the existing attributes I would certainly do it in purpose, but I understand your concerns about possible (albeit unlikely) involuntary overriding by a user. Maybe a compromise would be to make the existing attributes frozen, but still enable to add new ones? I have absolutely no idea if it is easy to impement with @noklam What is the rational about making the class frozen (while it is currently not)? Do you think my concerns make sense or do I misunderstand something? |
@Galileo-Galilei Thank you for your comments.
How would you retrieve these attributes after adding new attributes to the context?
It would be still possible by just doing
You are correct that it is the
These are my quick thoughts, I may miss something about interactive workflow, would love to hear more from you! |
Thanks very much for the comments @Galileo-Galilei. Before I comment on whether or not we should make this class frozen let me just come up with some concrete examples to make sure I understand what you're saying here and in your latest comment #506 (comment). This took quite a bit of careful thought so please do let me know whether I've got these right! Example 1: passing objects between hooks in same hook classIn this case you would like to use
The above requires being able to add a new attribute to Currently it's possible to do this as follows:
Example 2: passing objects between hooks in different hook classesIn this case you would like to use
We know that In summary, we have:
The above is currently achieved as follows:
Here we have:
... which is better than above (since Example 3: passing objects into the interactive workflowSimilar to Example 1 above, but now As with Example 1, in theory I think you could also achieve this without using the Quick comments@Galileo-Galilei Is the main use of
Examples 1 and 3 seem straightforward, but Example 2 seems fundamentally difficult due to the way that hooks work independently. Is this an important one at all? Overall it would be useful to you here to have some sort of shared state between different hook classes but I'm not sure what the right way to do this is:
|
KedroContext
a frozen dataclass and add config loader as a propertyKedroContext
a frozen dataclass
…add-config_loader-as-a-property
TL; DR
Thank you both for your answers, it helps a lot.
My idea was exactly to do as what Antony suggests in his example 2. This is "tricky and fragile" as he explains, but it does currently works and I've seen some real world example, albeit this is uncommon. I am totally in line with your point 1 (context is now exposed to user and as a consequence it must be more protected than it is now) and your point 2 (you have a workaround if a user really wants to add extra attributes - I did not know this was possible with Regarding your point 3, I have more mixed feelings: I think it would feel very natural to kedro users to be able to do @AntonyMilneQB : you're reading my mind 😄 I was not very precise but your made a wonderful summary of the 3 use cases I envision.
from a dev experience perspective, it makes much more sense if they can access directly with On your comments
It's more about points 1 & 2 : I can put new configuration in custom variables (and I already do), but I think it is much more "user friendly" to expose it through the context (it is where I would naturally look for extra configuration + autocompletion would help to discover it) I agree example 1 & 3 are straightforward. Regarding example 2, I don't think this is something kedro should try to enforce specifically (and it indeed sounds like an anti pattern), but since it is currently doable it should be nice to preserve the behaviour if possible. |
…add-config_loader-as-a-property
Signed-off-by: noklam <nok.lam.chan@quantumblack.com>
…add-config_loader-as-a-property
Thanks for the comments. Storing plugins-related in I will proceed to merge this PR. |
Final decision - this is going into A separate PR will be made for |
…nd-add-config_loader-as-a-property
Thanks @Galileo-Galilei for all your comments, and sorry for the very slow response! What you say about adding attributes to the context being more user- and developer-friendly than independent variables definitely makes sense 👍 Just a comment on the jupyter notebook point: we have been working on the interactive workflow and are planning to work on it further. There's a category for such issues, and there will be more added to this milestone when I get round to it over the next few weeks. As ever we're very interested in hearing your feedback on it and any suggestions you have! Part of this work which has already happened (released fully in 0.18, but actually the ipython extension has been around for a while before that) is to simplify the flow for exactly the case you describe of managed jupyter instances. Basically users should no longer have to manually set up a |
Description
Fix #1459
Context
As
KedroSession
now control the lifecycle of Kedro's run,KedroContext
acts as a data container and stores important attributes.Since we now dropped Python 3.6 support, we can make use of Python's
dataclasses
.The benefit of it is
KedroContext
will be just a data container.Development Changes
This PR is a bit long so I try to keep the summary of changes at the top.
KedroContext
a dataclass withattrs
, Python'sdataclasses
was considered but it does not fit well. See more in the discussion.@property
that we use to doself._x = x
is now just a one line defintion at the top withattrs.setters.frozen
config_loader
is now a public property ofKedroContext
.KedroContext
attributes remains immutable in case we refactor this in future.The Class init signature is unchanged as evidenced by this:
More Development Notes
KedroContext
a dataclass, signifying it will be simply a container as we gradually move the logic out from it.dataclass
doesn't support partial "read-only" property, so there are 2 options to achieve that.@property
to achieve the read-only attribute, just don't usedataclass
or use alternatives likeattrs
(ancestor ofdataclass
, but it's not inside the standard library).frozen=True
, but use__setattr__
for__post_init__
. The benefit of using it is to lock the context, socontext.arbitary_attr = 'xyz'
assignment is not possible, it also reduce some boilerplate code to set@property
, but the tradeoff is the more complicated__post_init__
method. (See https://stackoverflow.com/questions/59222092/how-to-use-the-post-init-method-in-dataclasses-in-python).dataclass
but not frozen, mimic the frozen by custom implementation instead.The
__post_init__
look like this if we follow (b):Originally I favor method (b) as I am okay with a cleaner class with the complicated bit just in
__post_init__
, however, it seems to create trouble formypy
and other linter as it doesn't understand the attribute assignment.todo
Checklist
RELEASE.md
file