subdag decorator: Variable not recognized #115

gms101 · 2023-03-18T20:08:25Z

When I add a subdag decorator to a function, the function itself is not able to recognise variables that comes prior in the dag(as declared by the main driver).

Stack Traces

https://fdm-bm77109.slack.com/archives/C04RWSYME79/p1679167875228639

Screenshots

Assume this the main driver function, here config_fetching module fetches a variable called backtesting_lookback_period

if __name__ == "__main__":
    adapter = base.SimplePythonGraphAdapter(base.DictResult())
    app_config = AppConfig(".FDm/config.toml")
    strategy = "L1R"
    app_config.set_selected_strategy(strategy)
    dr = driver.Driver({"app_config":app_config}, config_fetching,
                       construct_optimisation_window_data,
                       adapter=adapter)
    result = dr.execute(["optimisation_window_data"])
    print(result)
    # dr.display_all_functions("graph.dot", {})

And the construct_optimisation_window_data module looks like this

@subdag(load_price_data,inputs={})
def optimisation_window_data(end_to_end_price_for_all_ticker: pd.DataFrame,
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

When I run the driver module, I get this error

ValueError: 1 errors encountered:
  Error: Required input optimisation_window_data.backtesting_lookback_period not provided for nodes: ['optimisation_window_data'].

So, essentially, the decorated function is not able to read backtesting_lookback_period node - which comes before in the DAG compilation.

The text was updated successfully, but these errors were encountered:

elijahbenizzy · 2023-03-18T20:39:05Z

OK, so I think this is something that one should be allowed to do, but I think we want to be careful about how we enable it.

Reasons to do it

intuitively it should work
this will match the notion of "adapters" we're planning (where we inject some inputs based on some set of external nodes)
reduce verbosity

Reasons not to do it

Not super readable as to where nodes come from -- hard to tell if they come from inside or outside the DAG
There could be ambiguity. E.G. if the subdag is also part of the DAG (probably not good practice), its doubly unclear which one its accessing

Possibilities

Allow it by default
Allow it with a flag (search_externally_for_inputs, but with a better name)
Allow it with a set of sources flagged for external_inputs (@subdag(..., external_inputs=["backtesting_lookback_period"])
Allow an inject parameter that can be used as one would inject: @subdag(..., inject={'backtest_lookback_period' : source('backtest_lookback_period')

I'm voting for (3), as it doesn't require much complexity, adds significant readability, and unblocks the user.

skrawcz · 2023-03-18T21:58:57Z

Just to clarify we're talking about how one would interpret this function -- where would the declared function dependencies come from, right? In particular as the error states "backtesting_lookback_period" right?

now:

@subdag(load_price_data,inputs={})
def optimisation_window_data(end_to_end_price_for_all_ticker: pd.DataFrame,
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

This assumes both declared dependencies come from the subdag.

pass it in as an input

@subdag(load_price_data,inputs={"backtesting_lookback_period": source("backtesting_lookback_period")})
def optimisation_window_data(end_to_end_price_for_all_ticker: pd.DataFrame,
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

This I think works today, right?

allow by default

Hard to know what comes from where. What if the subdag has one defined and the current one also has backtesting_lookback_period defined?

allow with flag

@subdag(load_price_data,inputs={}, fill_with_external_sources=True)
def optimisation_window_data(end_to_end_price_for_all_ticker: pd.DataFrame,
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

This is clear -- but not clear which one is satisfied. Would silently change behavior if the underlying DAG suddenly included such a node.

specify what parameter(s) do not come from the subdag

@subdag(load_price_data,inputs={}, external=["backtest_lookback_period"])
def optimisation_window_data(end_to_end_price_for_all_ticker: pd.DataFrame,
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

This seems pretty clear. It will get as long as the argument list though.

@subdag(load_price_data,inputs={}, inject={"backtest_lookback_period": source("backtest_lookback_period")])
def optimisation_window_data(end_to_end_price_for_all_ticker: pd.DataFrame,
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

This seems like inputs above. So likely not useful to have separate.

skrawcz · 2023-03-19T05:47:28Z

More ideas:

Use `*` to delineate what's from the sub-DAG and what's from the outer DAG.

@subdag(load_price_data,inputs={})
def optimisation_window_data(
                             end_to_end_price_for_all_ticker: pd.DataFrame, *, 
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

We don't have this construct now. So this would be new. But at least from a signature it would be clear. It would mean
that you couldn't request two things that are named the same from the sub-DAG and outer-DAG, but that seems fine?

Don't allow "joining" of DAGs via the parameters if @subdag is used.

Make another function that combines/transforms outputs from the sub-DAG.

@subdag(load_price_data,inputs={})
def optimisation_window_data_full(
                             end_to_end_price_for_all_ticker: pd.DataFrame) -> pd.DataFrame:
    return end_to_end_price_for_all_ticker

# second function to do the joining (could also have this function within the subdag too)
def optimisation_window_data(
                             optimisation_window_data_full: pd.DataFrame, backtesting_lookback_period: int) -> int:
    return optimisation_window_data_full[backtesting_lookback_period]  # or whatever the logic should be

This is more explicit. Maybe this is the simpler way to go for now?

elijahbenizzy · 2023-03-19T17:50:39Z

More ideas:

Use * to delineate what's from the sub-DAG and what's from the outer DAG.
@subdag(load_price_data,inputs={})
def optimisation_window_data(
                             end_to_end_price_for_all_ticker: pd.DataFrame, *, 
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period
We don't have this construct now. So this would be new. But at least from a signature it would be clear. It would mean that you couldn't request two things that are named the same from the sub-DAG and outer-DAG, but that seems fine?

Don't allow "joining" of DAGs via the parameters if @subdag is used.

Make another function that combines/transforms outputs from the sub-DAG.
@subdag(load_price_data,inputs={})
def optimisation_window_data_full(
                             end_to_end_price_for_all_ticker: pd.DataFrame) -> pd.DataFrame:
    return end_to_end_price_for_all_ticker

# second function to do the joining (could also have this function within the subdag too)
def optimisation_window_data(
                             optimisation_window_data_full: pd.DataFrame, backtesting_lookback_period: int) -> int:
    return optimisation_window_data_full[backtesting_lookback_period]  # or whatever the logic should be
This is more explicit. Maybe this is the simpler way to go for now?

* for delineation is cute but I think a little too magical. Not allowing joins is the current approach, but as @gms101 points out, there's a subtle internal inconsistency... E.G. the internal nodes can refer to something outside the subDAG, but the final subDAG node can't (which is, in essence, part of the DAG). That said, I think that, given that we're using the parameters to specify the outputs of the subDAG about which we care, its worth being explicit. The other obvious workaround is to refer to that node inside the DAG -- E.G. create the joining node.

I'd almost rather flip the script (although I'm not sure this makes sense now that we've already done it):

@subdag(load_price_data,inputs={}, from_subdag=["end_to_end_price_for_all_ticker"])
def optimisation_window_data(end_to_end_price_for_all_ticker: pd.DataFrame,
                             backtesting_lookback_period: int) -> int:
    return backtesting_lookback_period

I find this clearer to read, as the non-subdag components will have obvious sources, and the subDAG components will still be easily identifiable. It could be either/or (from_subdag/from_external), allowing one or the other? Another reason I like this is that a lot of decorators serve the purpose of injecting inputs into a function -- subDAG does this, @parameterize, and @inject all do this. So maybe this is the consistent way? The vars from within the subDAG are the weird ones (they refer to a namespace...).

See #115

elijahbenizzy · 2023-03-27T22:02:06Z

This is now complete, released in 1.21.0!

gms101 added the triage label for issues that need to be triaged. label Mar 18, 2023

elijahbenizzy added a commit that referenced this issue Mar 23, 2023

Adds ability to specify external inputs in subdag/parameterized_subdag

66406ad

See #115

elijahbenizzy mentioned this issue Mar 23, 2023

Allows layering of subdag and extract_* #120

Merged

7 tasks

elijahbenizzy added a commit that referenced this issue Mar 23, 2023

Adds ability to specify external inputs in subdag/parameterized_subdag

3c5791b

See #115

elijahbenizzy added a commit that referenced this issue Mar 23, 2023

Adds ability to specify external inputs in subdag/parameterized_subdag

29d6150

See #115

elijahbenizzy added a commit that referenced this issue Mar 23, 2023

Adds ability to specify external inputs in subdag/parameterized_subdag

523da36

See #115

elijahbenizzy added a commit that referenced this issue Mar 23, 2023

Adds ability to specify external inputs in subdag/parameterized_subdag

be630a3

See #115

elijahbenizzy closed this as completed Mar 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subdag decorator: Variable not recognized #115

subdag decorator: Variable not recognized #115

gms101 commented Mar 18, 2023

elijahbenizzy commented Mar 18, 2023

skrawcz commented Mar 18, 2023 •

edited

Loading

skrawcz commented Mar 19, 2023 •

edited

Loading

elijahbenizzy commented Mar 19, 2023

Use `*` to delineate what's from the sub-DAG and what's from the outer DAG.

Don't allow "joining" of DAGs via the parameters if @subdag is used.

elijahbenizzy commented Mar 27, 2023

subdag decorator: Variable not recognized #115

subdag decorator: Variable not recognized #115

Comments

gms101 commented Mar 18, 2023

Stack Traces

Screenshots

elijahbenizzy commented Mar 18, 2023

Reasons to do it

Reasons not to do it

Possibilities

skrawcz commented Mar 18, 2023 • edited Loading

now:

pass it in as an input

allow by default

allow with flag

specify what parameter(s) do not come from the subdag

skrawcz commented Mar 19, 2023 • edited Loading

Use * to delineate what's from the sub-DAG and what's from the outer DAG.

Don't allow "joining" of DAGs via the parameters if @subdag is used.

elijahbenizzy commented Mar 19, 2023

Use * to delineate what's from the sub-DAG and what's from the outer DAG.

Don't allow "joining" of DAGs via the parameters if @subdag is used.

elijahbenizzy commented Mar 27, 2023

skrawcz commented Mar 18, 2023 •

edited

Loading

skrawcz commented Mar 19, 2023 •

edited

Loading

Use `*` to delineate what's from the sub-DAG and what's from the outer DAG.

Use `*` to delineate what's from the sub-DAG and what's from the outer DAG.