Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added Builder.with_materializers() #911

Merged
merged 5 commits into from
May 21, 2024
Merged

added Builder.with_materializers() #911

merged 5 commits into from
May 21, 2024

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented May 16, 2024

Following #877

Added the ability to add materializer nodes to the FunctionGraph at Driver build time.

Use cases:

  • Easier to view materializers as part of .display_all_functions()
  • Allows to call materializers by name with .execute(). Would allow some users to completely ignore .materialize()
  • Possible to combine "static" materializers at build time and "dynamic" materializers at execution time
  • Catch invalid dataflows earlier than dr.materialize()

Changes

  • Builder now accepts .with_materializers(*materializers)
  • Builder.build() adds materializer nodes after creating the Driver and returns the updated Driver. The logic is derived from Driver.materialize()
  • Builder.copy() copies the materializers

How I tested this

  • Added tests to check materializer nodes (savers and loaders) are properly added
  • test that "static" and "dynamic" materializers can be used together

Notes

  • each time materializers are added post_graph_construct hooks are triggered
  • Should be include materializers in the version hashing of the dataflow? should we treat "static" and "dynamic" differently. IMO, we might want two create two versions: one for "the dataflow transform, ignoring materializers" and one for "all nodes" since they answer different equality / diffing questions
  • there's duplicate logic between Builder.build() and Driver.materialize(). A better approach could be to have Driver.add_materializers() and Driver.execute_materializers(). Both Builder.build() and Driver.materialize() could call Driver.add_materializers()
  • Related to Allow materializer targets to specify inputs #713, users need to be aware that materializers need a valid target or dependencies (type and name) to connect to.

@zilto zilto added the enhancement New feature or request label May 16, 2024
@zilto zilto requested a review from elijahbenizzy May 16, 2024 13:42
@elijahbenizzy
Copy link
Collaborator

OK, I like the API. I think the implementation is overly built towards the current way it works. From what I understand:

  1. This builds the standard driver
  2. This then appends the materializers
  3. This is due to the idea of keeping the graph version consistent (right?)
  4. It ends up calling post_graph_construct twice (once in the construction of the driver, once otherwise)
  5. Has a final graph version with materializers

IMO this should:

  1. Build the standard driver with the materializers
  2. Only call it once
  3. Have only one version

So, how would this work?

Idea:

  1. Take the code out of the driver constructor
  2. Put it in a static function
  3. Make all driver parameters optional (E.G. have an internal one -- _graph)
  4. Have there be two paths -- one if you want to construct the graph outside, and the other if you want to do it inside

Alternatively (and this is cleaner, but we have to be careful with some of the SDK stuff, or just say only use the builder/adapter API):

  1. Create a superclass (DriverBase, that takes in graph/other info)
  2. Move just the constructor (currently compatible) into a subclass Driver(user-facing for backwards compatibility with the non-builder API)
  3. Have the Builder build it using the DriverBase API

Then we can put more logic in the builder. Perhaps the simplest solution might be:

  1. Pass in materializers to the driver, default to empty (See the _ parameters
  2. Build it in the driver constructor

I think this is the easiest, and will save on complexity.

Regarding multiple versions that is, IMO, too complex now. We should add it later?

@skrawcz
Copy link
Collaborator

skrawcz commented May 18, 2024

I'm in favor of seeing if we can make it part of the constructor with this approach. That way post graph construct is only called once.

@DAGWorks-Inc DAGWorks-Inc deleted a comment from sweep-ai bot May 18, 2024
Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. Just need to fix the pre-commit hook.

@skrawcz skrawcz merged commit 2924ed0 into main May 21, 2024
23 checks passed
@skrawcz skrawcz deleted the feat/with-materializers branch May 21, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants