Dataflow builder issues #8386
Replies: 4 comments 1 reply
-
One of the prominent issues is that the coordinator often comes to […]
-
Copying this Slack message by @asenac over here since it seems relevant to the dataflow building discussion:
[…]
-
I grabbed @mjibson in person on Friday to try to gain a shared understanding of the problem involving the dataflow builder. The following is a write-up of my understanding as of Friday; @mjibson, feel free to correct me if I've gotten something wrong. Coordinator issues #8318 is the overarching issue for dataflow builder issues that affect the coordinator team. It references two issues, both of which have workarounds that have been merged to main:
#8241 specifically refers to Materialize crashing if User1 adds an index while User2 has a read transaction. I asked @mjibson whether we can run into a problem if User1 deletes an index while User2 has a read transaction. He says that, to the best of our knowledge, we will not. But the mechanism preventing problems in this scenario is not in the coordinator; he believes problems are prevented by reference counting in the dataflow layer, or something like that. We should figure out the full answer by the time we redesign the DataflowBuilder.

Design of DataflowBuilder

We talked a bit about how the current DataflowBuilder struct has immutable reference fields and how that makes implementing methods that mutably borrow DataflowBuilder (which is all of them) hard. I pointed out, though, that the amount of cloning done in the optimizer is several orders of magnitude greater than any cloning we would do in the DataflowBuilder to get around those Rust restrictions. After all, the DataflowBuilder would just clone some IDs and likely simple […]
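To make the borrow-checker point concrete, here is a minimal sketch (all type and field names are illustrative stand-ins, not the real Materialize API): a builder that holds immutable references into the catalog fights the borrow checker on every `&mut self` method, whereas a builder that clones the handful of IDs it needs owns all of its data and avoids the problem entirely.

```rust
use std::collections::BTreeMap;

// Stand-ins for real catalog types (hypothetical, for illustration only).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct GlobalId(u64);

struct Catalog {
    indexes: BTreeMap<GlobalId, String>, // id -> index name
}

// Owned data only: no lifetime parameters, so methods that take
// `&mut self` compose freely with other borrows of the catalog.
struct DataflowBuilder {
    imported_indexes: Vec<GlobalId>,
}

impl DataflowBuilder {
    fn new(catalog: &Catalog) -> Self {
        // Cloning a handful of IDs is cheap compared to the cloning
        // already done in the optimizer.
        DataflowBuilder {
            imported_indexes: catalog.indexes.keys().copied().collect(),
        }
    }

    fn import_index(&mut self, id: GlobalId) {
        // Deduplicate imports; mutating `self` is unproblematic because
        // the builder does not borrow from the catalog.
        if !self.imported_indexes.contains(&id) {
            self.imported_indexes.push(id);
        }
    }
}

fn main() {
    let mut catalog = Catalog { indexes: BTreeMap::new() };
    catalog.indexes.insert(GlobalId(1), "idx_a".to_string());
    let mut builder = DataflowBuilder::new(&catalog);
    builder.import_index(GlobalId(2));
    println!("{:?}", builder.imported_indexes);
}
```

The trade-off is a small amount of copying up front in exchange for methods that never wrestle with overlapping borrows.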
-
Questions for the optimizer/dataflow end

The following are questions that I think we need to decide on before we can proceed with a design for the dataflow builder. Each question comes with my opinion of what the answer should be.

Question 1: What is the exact set of available indexes that the optimizer wants the coordinator to provide for a given query?
I consider it acceptable for an initial version of the refactored DataflowBuilder to provide all indexes that fulfill condition 2 and to worry about narrowing the set down to ones that fulfill the other conditions later.

Question 2: What do the optimizer and dataflow layers consider the coordinator's responsibilities to be?
Right now, the optimizer does not tell the coordinator which indexes should be imported, but it should. When the optimizer is capable of telling the coordinator which indexes should be imported, I think:
Question 3: How should the coordinator pass the list of available indexes to the optimizer?

Thus, the coordinator should skip passing the list of available indexes to the logical optimizer but pass the available indexes to the MIR physical optimizer […]
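The proposed split could look roughly like the following sketch (function and type names are hypothetical, not the real optimizer API): the logical optimizer runs without index information, and only the MIR physical optimizer receives the set of available indexes.

```rust
// A toy stand-in for an MIR plan.
#[derive(Clone, Debug, PartialEq)]
struct MirPlan(String);

// Index-agnostic rewrites (predicate pushdown, etc.) would go here;
// note this pass receives no index information at all.
fn logical_optimize(plan: MirPlan) -> MirPlan {
    plan
}

// Index-aware decisions (join implementation, arrangement reuse) go
// here; because this pass sees the available indexes, it is also the
// place that can report which indexes were actually used.
fn physical_optimize(plan: MirPlan, available_indexes: &[u64]) -> MirPlan {
    if available_indexes.is_empty() {
        plan
    } else {
        MirPlan(format!("{} using {} indexes", plan.0, available_indexes.len()))
    }
}

fn main() {
    let plan = MirPlan("select".to_string());
    // The coordinator skips indexes for the logical pass...
    let plan = logical_optimize(plan);
    // ...and supplies them only to the physical pass.
    let plan = physical_optimize(plan, &[1, 2]);
    println!("{}", plan.0);
}
```

Keeping the logical pass index-free means its output is reusable regardless of which indexes exist at execution time, which is one argument for the split described above.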
-
This serves as a place to collect known issues, comments, and discussion items involving the dataflow builder, so that we can have a cross coordinator+dataflow discussion on how to fix the dataflow builder to deal with all of the issues.
Coordinator side issue: #8318
Unnecessary arrangement import issue: #4887
cc: @mjibson, @asenac, @frankmcsherry