[speculative idea] enable node redefinitions #65

HamiltonRepoMigrationBot · 2023-02-26T02:06:52Z

Issue by skrawcz
Sunday Jan 29, 2023 at 06:49 GMT
Originally opened as stitchfix/hamilton#290

Is your feature request related to a problem? Please describe.
it is common outside of hamilton for people to "redefine columns" without mutation in value, e.g. in the case of filtering & joining data sets, the values wont change, but the indexes might. This right now forces one to rename things in hamilton if you want
to stitch together a single DAG.

Describe the solution you'd like
Idea:

module crawling order matters. e.g. data_loaders, transforms, model_pipelines, etc. vs model_pipelines, transforms, data_loaders...
enable one to explicitly "redefine" columns via some sort of decorator. If it's not explicit, Hamilton should error.
then any downstream functions that depend on those columns, would depend on the least previously defined version.
this means you can effectively also treat your modules as stages in a pipeline too.

@redefinition
def date(date:pd.Series) -> pd.Series:
     return pd.datetime(date)

@redefinition
@extract_columns("date", "bar")
def some_filter_step(date:pd.Series, bar: pd.Series, filter: str) -> pd.Series:
     df = pd.DataFrame({"date": date, "bar" : bar})
     df = df[data < filter]
     return df

Describe alternatives you've considered
You rename columns in these cases.

Additional context
Not sure how good an idea this is. But it's something to potentially try out and see how it looks.

HamiltonRepoMigrationBot · 2023-02-26T02:06:53Z

Comment by skrawcz
Sunday Jan 29, 2023 at 07:05 GMT

Some of this could also be mitigated by reusing a driver/sub-dag reuse e.g.
stitchfix/hamilton#86 (comment)

HamiltonRepoMigrationBot · 2023-02-26T02:06:54Z

Comment by elijahbenizzy
Sunday Jan 29, 2023 at 20:52 GMT

OK, so this makes implementation/onboarding easier, but doesn't help readability/maintainability of pipeline... However, its nice to not have to draw up a bunch of different functions.

The cool thing about this is this could actually be syntactic sugar for merging nodes together/composing -- doesn't even have to be its own nodes.

Implementation could be tricky -- with the current decorator model (decorator => multiple nodes, no context), we'd have to have some notion of state, otherwise we'd have no idea what to call them. OTOH, we could take a click-like approach, which might make this easier to implement:

What this does is it allows us to attach this function to the end of the date function

def date() -> pd.Series:
    return ['20230101', '20230102', ...']

@date.redefine
def date(date:pd.Series) -> pd.Series:
     return pd.datetime(date)

@date.redefine
@extract_columns("date", "bar")
def some_filter_step(date:pd.Series, bar: pd.Series, filter: str) -> pd.Series:
     df = pd.DataFrame({"date": date, "bar" : bar})
     df = df[data < filter]
     return df

This also clarifies what is being redefined...
The way this would work is:
(1) the date function (all but the last) don't actually get outputted by the module -- they turn into None or get name-mangled in some way
(2) Each date refers to the prior one -- we could use __ if we want them around for later
(3) Each date effectively prepends the DAG created by the upstream node to the overall DAG created by that node.

This gets ugly when we have extract_columns/multiple outputs that get redefined, but I'm happy for that not to be allowed. That said, we could always do:

@some_filter_step.bar.redefine
// or
@some_filter_step.redefine('bar')
def bar(bar: pd.Series) -> pd.Series:
    return bar*2

Which would give us the information we need.

elijahbenizzy · 2023-10-19T05:19:18Z

Closing this (for now) as pipe does this in, what I believe to be, a much cleaner way. It gaurentees immutability, but allows chaining/redefinition. see #398

HamiltonRepoMigrationBot added enhancement New feature or request product idea labels Feb 26, 2023

elijahbenizzy mentioned this issue Feb 26, 2023

[speculative idea] enable node redefinitions stitchfix/hamilton#290

Closed

elijahbenizzy added the migrated-from-old-repo Migrated from old repository label Feb 26, 2023

elijahbenizzy mentioned this issue Oct 17, 2023

@pipe decorator #398

Merged

9 tasks

elijahbenizzy closed this as completed Oct 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[speculative idea] enable node redefinitions #65

[speculative idea] enable node redefinitions #65

HamiltonRepoMigrationBot commented Feb 26, 2023

HamiltonRepoMigrationBot commented Feb 26, 2023

HamiltonRepoMigrationBot commented Feb 26, 2023

elijahbenizzy commented Oct 19, 2023

[speculative idea] enable node redefinitions #65

[speculative idea] enable node redefinitions #65

Comments

HamiltonRepoMigrationBot commented Feb 26, 2023

HamiltonRepoMigrationBot commented Feb 26, 2023

HamiltonRepoMigrationBot commented Feb 26, 2023

elijahbenizzy commented Oct 19, 2023