Decorator to specify value and dag node inputs #162
Comments
There are a few use cases. For example, to implement an adder or shifter function:

    def add(field: pd.Series, n: int):
        return field + n

    def shift(field: pd.Series, window: int):
        return field.shift(window)

We would want to parameterize More generally, many |
So it's basically a way to write one function that explodes out to multiple use cases, right? It's an interesting tension between repetitive code (E.G. repeating multiple input/output parameterizations) and making everything readable. A common way we've seen this is something like:

    import pandas as pd
    from hamilton.function_modifiers import does

    def _shift_helper(**kwargs):
        # this is a slight limitation of the `@does` API -- happy to think this through a little more
        # pick the arguments out by type, since their names differ per function
        series_arg = next(item for item in kwargs.values() if isinstance(item, pd.Series))
        window_arg = next(item for item in kwargs.values() if isinstance(item, int))
        return series_arg.shift(window_arg)

    @does(_shift_helper)
    def shift_artifact_1(field_1: pd.Series, window: int=3) -> pd.Series:
        pass

    @does(_shift_helper)
    def shift_artifact_2(field_2: pd.Series, window: int=5) -> pd.Series:
        pass
    # And so on

This is nice as everything is written down in a function, but there's a lot of added code. I think when there gets to be enough parametrizations, what you suggested would be pretty nice. E.G. something like:

    @parametrized_everything_for_lack_of_a_better_name(
        field_1=({"field" : "field_1", "shift" : 3}, "field_1 shifted by 3"),
        field_2=({"field" : "field_2", "shift" : 5}, "field_2 shifted by 5"),
    )
    def shift(field: pd.Series, window: int):
        return field.shift(window)

The cool part about this from an implementation perspective is we could simplify the code by gutting Also, what exactly breaks with the two combined? Feels like that could be an approach as well. |
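For illustration, here is a minimal sketch of the "one function explodes into many named nodes" idea. It assumes a hypothetical `explode` helper (not part of Hamilton) and a toy list-based `shift` standing in for `pd.Series.shift`, so it runs without pandas. It binds fixed values only and does not touch the question of how to reference other nodes.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def shift(field: List[float], window: int) -> List[Optional[float]]:
    """Toy stand-in for pd.Series.shift: move values right, pad with None.

    Assumes 0 <= window <= len(field).
    """
    return [None] * window + field[: len(field) - window]


def explode(
    fn: Callable[..., Any],
    variants: Dict[str, Tuple[Dict[str, Any], str]],
) -> Dict[str, Callable[..., Any]]:
    """Hypothetical helper: build one named callable per variant.

    Each variant is (fixed keyword arguments, docstring), mirroring the
    tuple shape proposed in the thread.
    """

    def make(name: str, fixed: Dict[str, Any], doc: str) -> Callable[..., Any]:
        def node(**remaining: Any) -> Any:
            # Fixed values are bound up front; the rest arrive at call time.
            return fn(**fixed, **remaining)

        node.__name__ = name
        node.__doc__ = doc
        return node

    return {name: make(name, fixed, doc) for name, (fixed, doc) in variants.items()}


nodes = explode(
    shift,
    {
        "shifted_by_1": ({"window": 1}, "field shifted by 1"),
        "shifted_by_2": ({"window": 2}, "field shifted by 2"),
    },
)
result = nodes["shifted_by_2"](field=[1.0, 2.0, 3.0])
```

Each generated callable carries its own name and docstring, which is what would make the variants show up as distinct, documented nodes.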
Yes, exactly. I have settled on something like the former you mentioned for now, just to have something that works. The preferred, imo, is definitely the latter, which I find is clearer and more concise. |
Awesome, I'm inclined to agree. Might give this a stab in the next week -- if I do, would you mind testing out on a branch/giving feedback? There are a few APIs I'm thinking about -- the interesting thing is we have to specify if an input is a literal (E.G. a string), or coming from an input. One option could be...

    @parametrized_full(
        field_1=({"field" : dependency("field_1"), "shift" : 3}, "field_1 shifted by 3"),
        field_2=({"field" : dependency("field_2"), "shift" : 5}, "field_2 shifted by 5"),
    )
    def shift(field: pd.Series, window: int):
        return field.shift(window)

Or its alternative:

    @parametrized_full(
        field_1=({"field" : "field_1", "shift" : literal(3)}, "field_1 shifted by 3"),
        field_2=({"field" : "field_2", "shift" : literal(5)}, "field_2 shifted by 5"),
    )
    def shift(field: pd.Series, window: int):
        return field.shift(window)

Any other ideas? I'm kinda into requiring this decorator to have both:

    @parametrized_full(
        field_1=({"field" : dependency("field_1"), "shift" : literal(3)}, "field_1 shifted by 3"),
        field_2=({"field" : dependency("field_2"), "shift" : literal(5)}, "field_2 shifted by 5"),
    )
    def shift(field: pd.Series, window: int):
        return field.shift(window)

The advantage of this is that it allows something really expressive, and easy delegation to The annoying thing is that |
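To make the node-vs-literal distinction concrete, here is a rough sketch of how explicit markers could be resolved. Everything in it is an assumption for illustration: `dependency`, `literal`, `resolve`, and `run_variants` are hypothetical stand-ins rather than Hamilton's implementation, `shift` is a toy list-based version so the example runs without pandas, and the argument keys use `window` to match the function's parameter name.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class dependency:
    """Hypothetical marker: take this argument from another node, by name."""
    name: str


@dataclass(frozen=True)
class literal:
    """Hypothetical marker: pass this fixed value as the argument."""
    value: Any


def shift(field: List[float], window: int) -> List[Optional[float]]:
    """Toy stand-in for pd.Series.shift (assumes 0 <= window <= len(field))."""
    return [None] * window + field[: len(field) - window]


def resolve(spec: Any, node_values: Dict[str, Any]) -> Any:
    """Turn a marker into a concrete value; reject unmarked inputs."""
    if isinstance(spec, dependency):
        return node_values[spec.name]
    if isinstance(spec, literal):
        return spec.value
    # A bare "field_1" could mean a node name or a string value, so refuse it.
    raise TypeError(f"wrap {spec!r} in dependency(...) or literal(...)")


def run_variants(
    fn: Callable[..., Any],
    variants: Dict[str, Tuple[Dict[str, Any], str]],
    node_values: Dict[str, Any],
) -> Dict[str, Any]:
    """Compute one output per variant from the available node values."""
    return {
        name: fn(**{arg: resolve(spec, node_values) for arg, spec in kwargs.items()})
        for name, (kwargs, _doc) in variants.items()
    }


variants = {
    "field_1_shifted": (
        {"field": dependency("field_1"), "window": literal(1)},
        "field_1 shifted by 1",
    ),
    "field_2_shifted": (
        {"field": dependency("field_2"), "window": literal(2)},
        "field_2 shifted by 2",
    ),
}
inputs = {"field_1": [1.0, 2.0, 3.0], "field_2": [10.0, 20.0, 30.0]}
outputs = run_variants(shift, variants, inputs)
```

Requiring both markers means a plain string is never silently guessed to be either a node name or a value, at the cost of a slightly more verbose call site.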
Absolutely - happy to test out and give feedback! I also kind of like explicitly specifying a node vs. a literal. Although, I do find the expression API of Other thing I would note is whether it would make sense to keep the parameterized docstring approach as with |
OK, had a lot of fun with this. Not complete yet, but on its way: #163.
|
Hey! Ready for review -- would love your thoughts on the API! #163 |
@wangkev in case it's helpful, you can either checkout the branch and do |
Hi -- thanks for the update, this is neat! The overall functionality seems to be exactly what we discussed. I have a few thoughts from playing around:
Thanks again for the quick update on this! |
Thanks for taking it for a spin! Re: (1) the idea was to have it be a more specific API, but it's hard to say it shouldn't accept them? E.G. why would it not allow Re (2) slightly torn -- appreciate consistency but |
Agree.
I was thinking the same thing about naming. Elsewhere in the documentation (https://github.com/stitchfix/hamilton/blob/main/basics.md?plain=1#L16), |
Oh interesting. So:
Then keep Maybe
Thanks for the feedback! |
🤔 I think this is a power user feature, so We should be aiming for naming where if someone has used it a couple of times, they won't need to look at the documentation to remember what the distinctions are... So current options are:
Any others? |
Are internally consistent, logical, and easy to understand. |
Another option is field vs value. |
OK, so I think I'm going to go with the above:
IMO it's the clearest and most concise - any objections? |
Is your feature request related to a problem? Please describe.
I am looking for a decorator to specify both value and dag node inputs.
parameterized allows for value inputs, parameterized_inputs allows for dag node inputs, but there are no decorators that allow you to do both.
Describe the solution you'd like
A decorator that allows for both value and dag node inputs.
Describe alternatives you've considered
I tried just decorating a function with both decorators, but that is not supported.
Additional context