Feature: inline data saver & loaders #983

Merged: @skrawcz merged 11 commits into main from feature/inlinedatasaverloaders on Jul 22, 2024

@skrawcz (Collaborator) commented Jun 24, 2024

This change enables one to write data loaders and savers inline, as regular decorated functions, and to expose metadata about what they load or save.

```python
import pandas as pd
from sklearn import datasets

from hamilton.function_modifiers import dataloader, datasaver
from hamilton.io import utils as io_utils


@dataloader()  # returns (data, metadata); downstream functions receive only the DataFrame
def raw_data() -> tuple[pd.DataFrame, dict]:
    data = datasets.load_digits()
    df = pd.DataFrame(data.data, columns=[f"feature_{i}" for i in range(data.data.shape[1])])
    metadata = io_utils.get_dataframe_metadata(df)
    return df, metadata


def transformed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    # placeholder transform: passes the data through unchanged
    return raw_data


@datasaver()  # writes the data and returns only metadata about what was saved
def saved_data(transformed_data: pd.DataFrame, filepath: str) -> dict:
    # `filepath` is not produced by any function, so it is supplied as an input
    transformed_data.to_csv(filepath)
    metadata = io_utils.get_file_and_dataframe_metadata(filepath, transformed_data)
    return metadata
```
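The example relies on a simple contract: the loader returns a `(data, metadata)` tuple, downstream functions see only the data, and the saver returns only metadata. To make that flow concrete without pandas, scikit-learn, or Hamilton installed, here is a hand-wired, standard-library sketch. The `run` function is a hypothetical stand-in for the driver plus tracker, and the list-of-dicts data and metadata keys are illustrative assumptions; this is not how Hamilton actually executes the graph.

```python
import csv
import os
import tempfile


def raw_data() -> tuple[list[dict], dict]:
    # Stand-in for the digits loader: returns (data, metadata).
    rows = [{"feature_0": 0.0, "feature_1": 1.0}, {"feature_0": 2.0, "feature_1": 3.0}]
    metadata = {"rows": len(rows), "columns": list(rows[0])}
    return rows, metadata


def transformed_data(raw_data: list[dict]) -> list[dict]:
    # Downstream functions receive only the data, never the metadata.
    return raw_data


def saved_data(transformed_data: list[dict], filepath: str) -> dict:
    # A saver writes its input and returns only metadata about what it wrote.
    with open(filepath, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(transformed_data[0]))
        writer.writeheader()
        writer.writerows(transformed_data)
    return {"path": filepath, "size_bytes": os.path.getsize(filepath)}


def run(filepath: str) -> dict:
    # Hypothetical stand-in for driver + tracker: call the functions in
    # dependency order and collect each piece of metadata by function name.
    captured = {}
    data, captured["raw_data"] = raw_data()  # split the loader's tuple
    captured["saved_data"] = saved_data(transformed_data(data), filepath)
    return captured


def demo() -> dict:
    # Usage: run the pipeline against a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        return run(os.path.join(tmp, "saved_data.csv"))
```

Calling `demo()` returns one metadata dict per loader/saver, keyed by function name, which is roughly the shape of information the tracker surfaces.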

This results in the following being captured in the tracker and shown in the UI:
[Five screenshots from Jul 22, 2024 showing what is captured in the tracker / UI]

Changes

  • adds the `@dataloader` and `@datasaver` decorators & docs
  • adds tests
  • adds an example

How I tested this

  • locally
  • unit tests

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz (Collaborator, Author) commented Jun 25, 2024

Okay, let's instead go for the following if it's simpler to implement:

```python
import pandas as pd

from hamilton.function_modifiers import loader, saver
from hamilton.io import utils as io_utils


@loader  # injects node to pull out result
def foo() -> tuple[pd.DataFrame, dict]:
    ...
    metadata = io_utils....(file, df)
    return df, metadata


@saver  # all it does is add the right tags
def write_foo(...) -> dict:
    ...
    metadata = io_utils....(file)
    return metadata
```
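The proposal splits the work between two decorators: `@loader` tags the function and injects a node that pulls the result out of the returned tuple, while `@saver` only adds tags. The framework-free sketch below mimics that split. The tag key `hypothetical.io_role` and the helper `extract_result` are made up for illustration and are not Hamilton's API; the real decorators (which ended up named `dataloader` / `datasaver`) build graph nodes rather than setting function attributes.

```python
from typing import Any

# Hypothetical tag key chosen for illustration; not Hamilton's real tags.
ROLE_TAG = "hypothetical.io_role"


def loader(fn):
    # Mark fn as a loader; a framework would also inject a node that pulls
    # the result out of the (result, metadata) tuple fn returns.
    fn.tags = {ROLE_TAG: "loader"}
    return fn


def saver(fn):
    # All it does is add the right tags: fn already returns only metadata.
    fn.tags = {ROLE_TAG: "saver"}
    return fn


def extract_result(loader_output: tuple[Any, dict]) -> Any:
    # Stand-in for the injected node: keep the result, drop the metadata.
    result, _metadata = loader_output
    return result


@loader
def foo() -> tuple[list[int], dict]:
    df = [1, 2, 3]  # stand-in for a DataFrame
    return df, {"rows": len(df)}


@saver
def write_foo(foo: list[int]) -> dict:
    return {"rows_written": len(foo)}


# Usage: wire the nodes by hand in the order a framework would run them.
foo_output = foo()
foo_result = extract_result(foo_output)
write_metadata = write_foo(foo_result)
```

Keeping the saver a plain tagged function means it needs no special unpacking: its return value already is the metadata.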

@elijahbenizzy (Collaborator)

I also find it clearer :)

@skrawcz force-pushed the feature/inlinedatasaverloaders branch from 47063c4 to 794c2c4 on July 13, 2024 04:28
@skrawcz changed the title from "Example showing inline data saver & loaders" to "Feature: inline data saver & loaders" on Jul 13, 2024
@elijahbenizzy (Collaborator) left a comment

Didn't look for polish; otherwise, a few comments. Looks good.

Review threads:
  • hamilton/function_modifiers/__init__.py (outdated, resolved)
  • hamilton/function_modifiers/macros.py (outdated, resolved)
  • hamilton/graph.py (resolved)
  • hamilton/function_modifiers/macros.py (outdated, resolved)
  • hamilton/function_modifiers/macros.py (outdated, resolved)
  • hamilton/function_modifiers/macros.py (outdated, resolved)
@skrawcz added 3 commits on July 22, 2024 09:54
This is a proof of concept.

What actually needs to be done:

1. Ideally, we expand/wrap the function with the dataloader type appropriately,
to mirror the current process (I think that's what we want).

I made them classes to make it easy to add from_X functions to create
the metadata. Otherwise, I don't type the metadata dictionaries,
so maybe we should do that or provide a way to push people toward putting
standard things in them.

Otherwise, I think this is more ergonomic for most people getting
started.
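The commit message mentions making the metadata types classes so that `from_X` constructors can build them, and wonders whether the metadata dictionaries should be typed. A minimal sketch of that idea follows, assuming a hypothetical `FileMetadata` dataclass with a single `from_path` constructor; neither name is part of Hamilton, and the field names are illustrative only.

```python
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical typed metadata record; field names are illustrative only."""

    path: str
    size_bytes: int
    last_modified: float

    @classmethod
    def from_path(cls, path: str) -> "FileMetadata":
        # A "from_X" constructor: derive the metadata from a file on disk.
        stat = os.stat(path)
        return cls(path=path, size_bytes=stat.st_size, last_modified=stat.st_mtime)

    def to_dict(self) -> dict:
        # Loaders and savers hand back plain dicts, so convert at the boundary.
        return asdict(self)


def demo() -> FileMetadata:
    # Usage: write a small CSV, describe it, then clean up.
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as f:
        f.write(b"a,b\n1,2\n")
    try:
        return FileMetadata.from_path(f.name)
    finally:
        os.remove(f.name)
```

A fixed set of typed fields is one way to push people toward putting standard things in the metadata, while `to_dict` keeps the record compatible with functions that return plain dicts.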

They enable one to annotate a function as loading or
saving data and then have that metadata available
for capture.

This also removes the older code (hopefully all of it).
@skrawcz force-pushed the feature/inlinedatasaverloaders branch from 794c2c4 to a63c330 on July 22, 2024 17:11
@skrawcz marked this pull request as ready for review on July 22, 2024 20:44
@skrawcz merged commit e2a3154 into main on Jul 22, 2024 (24 checks passed)
@skrawcz deleted the feature/inlinedatasaverloaders branch on July 22, 2024 20:59