Two level query plan execution #10

alxmrs · 2024-02-13T03:41:10Z

One level, the fallback, would be the prototype in #8. This should always work, but is expensive since it requires compact Xarray datasets to be unraveled.

The other level would be more like xql today. It does as much preprocessing on the Dataset with Xarray (xr) operations as possible, then trivially unravels the result at the end. This implies that the SQL-on-Xarray layer should have clean interface boundaries.
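To make the cost difference concrete, here is a minimal sketch of both levels answering the same query. The dataset, the query, and the function names `fallback_level` and `xarray_level` are made up for illustration; this is not the prototype from #8 or xql's actual code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: temperature on a small (time, lat, lon) grid.
ds = xr.Dataset(
    {
        "temperature": (
            ("time", "lat", "lon"),
            np.arange(24, dtype=float).reshape(2, 3, 4),
        )
    },
    coords={
        "time": pd.date_range("2024-01-01", periods=2),
        "lat": [10.0, 20.0, 30.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
)

# Illustrative query:
#   SELECT lat, AVG(temperature) FROM ds WHERE lon >= 180 GROUP BY lat


def fallback_level(ds: xr.Dataset) -> pd.Series:
    """Unravel the whole dataset to one row per cell, then filter and aggregate."""
    df = ds.to_dataframe().reset_index()  # 2 * 3 * 4 = 24 rows
    df = df[df["lon"] >= 180.0]
    return df.groupby("lat")["temperature"].mean()


def xarray_level(ds: xr.Dataset) -> pd.Series:
    """Filter and reduce with Xarray operations, then unravel the small result."""
    reduced = ds.where(ds.lon >= 180.0, drop=True).mean(dim=["time", "lon"])
    return reduced.to_dataframe()["temperature"]  # only 3 rows to unravel


slow = fallback_level(ds)
fast = xarray_level(ds)
```

Both paths should return the same per-latitude means. The difference is that the fallback materializes every cell as a row before doing any work, while the Xarray level only unravels the already-reduced result.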

alxmrs · 2024-02-13T05:52:02Z

Some notes on how we could do this:

Control the SQL parsing step:
https://dask-sql.readthedocs.io/en/latest/how_does_it_work.html
From the SQL plan, produce a refined xr.Dataset. Producing it should be a best-effort process that maintains correctness. It might have a notion of bailing out due to ambiguity.
As the last step, apply the dask-sql engine to the converted, refined xr.Dataset.
Leave open the possibility of using other SQL engines on DataFrames via Fugue.
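One way to picture the "best effort, with bail-out" step is as a split of the query plan into a prefix that the Xarray level can handle and a remainder left for the DataFrame engine. The sketch below is an assumption for illustration only: `Op`, `XARRAY_SUPPORTED`, and `split_plan` are hypothetical names, and a real plan from a SQL parser would be a tree rather than a flat list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One step of a simplified, linear query plan (hypothetical)."""

    kind: str    # e.g. "filter", "project", "aggregate", "join"
    detail: str  # free-form description of the step


# Assumption: the operation kinds the Xarray level knows how to apply.
XARRAY_SUPPORTED = {"filter", "project", "aggregate"}


def split_plan(plan: list[Op]) -> tuple[list[Op], list[Op]]:
    """Split a plan into (xarray_ops, dataframe_ops).

    Only the leading run of supported operations is pushed down. At the
    first unsupported operation we bail out and leave it, and everything
    after it, to the DataFrame engine, so the order of operations (and
    therefore correctness) is preserved.
    """
    for i, op in enumerate(plan):
        if op.kind not in XARRAY_SUPPORTED:
            return plan[:i], plan[i:]
    return plan, []


plan = [
    Op("filter", "lon >= 180"),
    Op("project", "lat, temperature"),
    Op("join", "stations ON lat"),
    Op("aggregate", "AVG(temperature) GROUP BY lat"),
]
xr_ops, df_ops = split_plan(plan)
```

Here the filter and projection would run as Xarray operations, and the join plus the aggregate that follows it would run on the unraveled DataFrame. If nothing can be pushed down, the split degrades to the fallback level.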

alxmrs · 2024-02-17T05:49:00Z

#2 would become one level of execution.

alxmrs · 2024-02-17T10:29:05Z

How will we integrate distributed execution between the two levels? For example, the Xarray executor level would use xbeam on Dataflow, whereas the DataFrame executor would use Dask on Dataproc. Is there some way to get both sides executing in the same context? Or, in the distributed case, would we hand off the tasks via IO, like how Cubed breaks up each step by writing to Zarr?

alxmrs · 2024-02-17T10:30:38Z

Hmmm... it looks like Beam supports Pandas-like Dataframes.

https://beam.apache.org/documentation/dsls/dataframes/overview/

alxmrs mentioned this issue Feb 17, 2024

Distributed Execution on Beam #16

Open
