Driver-level materialization #235
Spec
Requirements
API
Consider the following DAG:
@tag(final_data_product="features")
def foo() -> pd.Series:
    pass

@tag(final_data_product="features")
def bar() -> pd.Series:
    pass

def baz() -> pd.Series:
    pass

def foo_bar_baz() -> pd.DataFrame:
    pass

def model() -> Model:
    pass

Then let's say we want to save this to parquet:
dr = driver.Driver({}, modules)
# materialize to Parquet
# can have multiple, each one is a saver
# tbd on the namespace for Parquet
dr.materialize(
    materializers.Parquet(
        "foo", "bar", "baz",  # nodes in the dataset
        path="./out.parquet",  # parameter to ParquetDataLoader
        join=DataFrameResult(),  # only needed if we have multiple and the results_builder of the DAG doesn't apply...
        # We can probably kill this
    ),
    materializers.Parquet(
        "foo", "bar", "baz",  # nodes in the dataset
        path="./out.parquet",  # parameter to ParquetDataLoader
        # no need for a results builder as it's just a dataframe
    ),
)
Or we want to save to MLFlow:
dr.materialize(
    materializers.MLFlowRegistry(
        "model",
        train=source("training_data"),  # needed for signature -- we could probably hardcode it
        predictions=source("predictions"),
        # other parameters
    )
)
Say we want to save all items in a production dataset:
dr.materialize(
    materializers.Parquet(tag_query(final_data_product="features"), path="./out.parquet"),
)
The trick here is attaching the materialize to the
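To make the tag-based selection above concrete, here is a minimal sketch of how a tag query might resolve to node names. It is only an illustration under assumptions: `NODE_TAGS` and `resolve_tag_query` are hypothetical names, and the tag table is hand-written to mirror the example DAG rather than read from a real driver.

```python
from typing import Dict, List

# Hypothetical tag table mirroring the example DAG: only foo and bar are tagged.
NODE_TAGS: Dict[str, Dict[str, str]] = {
    "foo": {"final_data_product": "features"},
    "bar": {"final_data_product": "features"},
    "baz": {},
    "foo_bar_baz": {},
    "model": {},
}


def resolve_tag_query(node_tags: Dict[str, Dict[str, str]], **query: str) -> List[str]:
    """Return the names of nodes whose tags match every key/value in the query."""
    return [
        name
        for name, tags in node_tags.items()
        if all(tags.get(key) == value for key, value in query.items())
    ]


# Select every node tagged as part of the "features" data product.
selected = resolve_tag_query(NODE_TAGS, final_data_product="features")
print(selected)
```

With a resolver like this, a materializer given a tag query could expand it into explicit node names before the saving node is attached.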
See issue for more detailed notes.
Overall design:
1. Add a .materialize(...) function
2. Materializers are dynamically registered with the same mechanism as data savers
3. This manipulates the DAG and calls the materialization node
4. The materialization node can also have a results builder associated with it
Left todo:
1. Documentation
2. More work on data savers/loaders
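The design steps above can be sketched with a toy driver. This is not the actual implementation: `SAVER_REGISTRY`, `register_saver`, `Materializer`, and `ToyDriver` are hypothetical names, every node is assumed to take no inputs (as in the example DAG), and the saver writes to an in-memory dict instead of a file.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a saver name to a saver function.
SAVER_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {}


def register_saver(name: str):
    """Decorator that registers a saver under `name` (illustrative only)."""
    def decorator(fn):
        SAVER_REGISTRY[name] = fn
        return fn
    return decorator


@register_saver("in_memory")
def save_in_memory(data: Dict[str, Any], store: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Toy saver: stores the combined node outputs and returns metadata."""
    store[key] = data
    return {"key": key, "num_nodes": len(data)}


class Materializer:
    """Pairs a registered saver with the nodes it should save."""
    def __init__(self, saver_name: str, *node_names: str, **saver_kwargs: Any):
        self.saver_name = saver_name
        self.node_names = node_names
        self.saver_kwargs = saver_kwargs


class ToyDriver:
    """Toy DAG: each node is a zero-argument function keyed by name."""
    def __init__(self, functions: Dict[str, Callable[[], Any]]):
        self.functions = dict(functions)

    def materialize(self, *materializers: Materializer) -> Dict[str, Any]:
        dag = dict(self.functions)  # copy, so the original DAG is untouched
        sink_names = []
        for i, m in enumerate(materializers):
            saver = SAVER_REGISTRY[m.saver_name]

            def sink(m=m, saver=saver):
                # Compute the upstream nodes, then hand them to the saver.
                data = {name: dag[name]() for name in m.node_names}
                return saver(data, **m.saver_kwargs)

            sink_name = f"materialize_{i}"
            dag[sink_name] = sink  # attach the saving node to the DAG
            sink_names.append(sink_name)
        # Execute only the newly added saving nodes.
        return {name: dag[name]() for name in sink_names}


store: Dict[str, Any] = {}
dr = ToyDriver({"foo": lambda: [1, 2], "bar": lambda: [3, 4]})
result = dr.materialize(Materializer("in_memory", "foo", "bar", store=store, key="features"))
print(result, store)
```

Keeping the saver lookup in a registry is what lets new materializers be added without touching the driver, which seems to be the point of reusing the data-saver mechanism.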
This is released; see #264 for additional improvements.
Is your feature request related to a problem? Please describe.
Data savers (@save_to) are cool, but often materialization is more of an ad-hoc operation. This proposes making it easier to dynamically call materialization on a pre-existing DAG.

Describe the solution you'd like
See comment below for spec. Basically a materialize() function in the driver that modifies the DAG to include a saving node and executes it.

Describe alternatives you've considered
Do it with @save_to (already doable).

Additional context
This is something we've been thinking about for a while, and @save_to was the first piece of this. About time! Will likely only work on the driverV2...