Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Decorator improvements #316

Merged
merged 5 commits into from
Feb 18, 2023
Merged

Decorator improvements #316

merged 5 commits into from
Feb 18, 2023

Conversation

elijahbenizzy
Copy link
Collaborator

@elijahbenizzy elijahbenizzy commented Feb 17, 2023

A few decorator changes. Some are user-facing and some are relevant till later.

Changes

  1. The target parameter for NodeTransformer/NodeDecorator
  2. Ensuring decorators declare their required configurations
  3. Documentation improvements

How I tested this

Automated tests.

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

This allows for inspection/visibility, and will be essential in
checkpointing/storage options. We want decorators to declare these
upfront, rather than relying on the internals of the decorators to
break, enabling clearer error messages for more usable
configuration-driven pipelines.

Furthermore, we also allow for some "optional" configuration parameters.
These enable us to provide a default. The use-case here is config.when
-- which currently defaults to None. Allowing this from the start is not
ideal, but it's what we did and users rely on it.

There *is* an out, however. If someone is using the base `@config`,
they were able to declare a lambda function, which yields no insight
into the parameters required by the function. To bypass this, we pass
None out, and the framework knows not to filter the required parameters.
@elijahbenizzy elijahbenizzy changed the base branch from checkpoint to main February 17, 2023 18:44
@elijahbenizzy elijahbenizzy changed the title Adds target to decorator Decorator improvements Feb 17, 2023
@elijahbenizzy elijahbenizzy force-pushed the target-decorator branch 6 times, most recently from 95bdfc9 to 4a72db0 Compare February 17, 2023 20:39
The `target` parameter specifies *which* nodes will be transformed.
This can take 4 values:
- None -- this is the old default behavior, just decorating the 'output'
  (I.E. sync nodes)
- Ellipsis (...) -- this decorates everything
- string -- this decorates *just* the specified node
- Collection[string] -- this decorates all nodes in the collection
non-final.

Our new target decorator nullifies this, and to be honest, this never
really did what we thought from the beginning. The nodes taht were
decorated with non-final were *actually* non-final, meaning that they
were never sink nodes (E.G. always depended on).
@elijahbenizzy elijahbenizzy marked this pull request as ready for review February 17, 2023 21:31
@@ -126,6 +127,15 @@ def validate(self, fn: Callable):
)


@deprecation.deprecated(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why deprecate? Seems like this is much better for mass application?

decorators.md Outdated Show resolved Hide resolved
Comment on lines +577 to +599
2. Decorators that *turn* a function into a set of nodes (E.G. `@does`)
3. Decorators that modify a set of nodes, turning it into another set of nodes (E.G. `@parameterize`, `@extract_columns`, and `@check_output`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't sound like there's a difference between (2) & (3)?
Otherwise going between function and nodes is confusing I think -- all the decorators operate on a function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference. The decorators in (3) don't operate on functions, they operate on nodes.

The decorators in (2) have access to the function and use it to determine the subdag shape, the decorators in (2) don't.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but does that explanation make sense to an end user? I don't think we've articulated the distinction...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, TBH I think we need to work on making it clearer, and its difficult for me to do when I'm in the nodes headspace. So, let's revisit that and see how users react? Added a note about functions/nodes, but let's see how users react when we're rewriting documentation.

Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested this -- seems to look reasonable.

:return: A collection of nodes that are not in the set of nodes to transform but are in the
subdag
"""
return [node_ for node_ in all_nodes if node_ not in nodes_to_transform]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure -- could also use a set operation here.

We get at the mental model for decorators/how layering works. Its
complicated, so we don't go into too much detail (the code is there to
browse), but should give users enough to be effective at using
layering/`target_`
This is a log nicer. See discussion here for full context:
#86.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize @check_output (pandera) when using with @extract_columns
2 participants