Fixing remote ENRICH by pushing the Enrich inside FragmentExec #114665
Changes from all commits
ca54471
b21abd4
2e99cb7
8d00120
57cf748
70b4041
79dc2dc
828ad80
e43e6c6
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
pr: 114665 | ||
summary: Fixing remote ENRICH by pushing the Enrich inside `FragmentExec` | ||
area: ES|QL | ||
type: bug | ||
issues: | ||
- 105095 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,8 +52,10 @@ | |
import org.elasticsearch.xpack.esql.plan.physical.RowExec; | ||
import org.elasticsearch.xpack.esql.plan.physical.ShowExec; | ||
import org.elasticsearch.xpack.esql.plan.physical.TopNExec; | ||
import org.elasticsearch.xpack.esql.plan.physical.UnaryExec; | ||
|
||
import java.util.List; | ||
import java.util.concurrent.atomic.AtomicBoolean; | ||
|
||
/** | ||
* <p>This class is part of the planner</p> | ||
|
@@ -104,6 +106,46 @@ public PhysicalPlan map(LogicalPlan p) { | |
// | ||
// Unary Plan | ||
// | ||
if (localMode == false && p instanceof Enrich enrich && enrich.mode() == Enrich.Mode.REMOTE) { | ||
// When we have remote enrich, we want to put it under FragmentExec, so it would be executed remotely. | ||
// We're only going to do it on the coordinator node. | ||
// The way we're going to do it is as follows: | ||
// 1. Locate FragmentExec in the tree. If we have no FragmentExec, we won't do anything. | ||
// 2. Put this Enrich under it, removing everything that was below it previously. | ||
Review comment: Everything that is under the Enrich will still be under it when it's inserted into the Fragment, as a logical plan (which, as I understand it, is converted to a physical plan later, when the fragment is processed). |
||
// 3. Above FragmentExec, we should deal with pipeline breakers, since pipeline ops already are supposed to go under | ||
Review comment: Pipeline breakers influence the exchange data transfer; if you add another node, it will break the data-node/coordinator contract.
Reply: Not sure what you mean here, could you explain? |
||
// FragmentExec. | ||
// 4. Aggregates can't appear here since the plan should have errored out if we have aggregate inside remote Enrich. | ||
// 5. So we should be keeping: LimitExec, ExchangeExec, OrderExec, TopNExec (actually OrderExec probably can't happen anyway). | ||
Review comment: Sounds like you want Enrich to be a pipeline breaker itself.
Reply: Making Enrich a pipeline breaker won't help us much by itself. The problem is that, because of how mapping works right now, it has (had, before the patch) no way to place Enrich inside the fragment (and thus execute it remotely) if any pipeline breakers are present (and LIMIT, at least, is always present unless it is absorbed into TopN). |
||
|
||
var child = map(enrich.child()); | ||
Comment on lines +109 to +120
Review comment: This is similar to the code below under UnaryPlan (var child = map().., line 150).
Reply: Yes, this part is so far, but the later part diverges. |
||
AtomicBoolean hasFragment = new AtomicBoolean(false); | ||
Review comment: Use Holder<Boolean> hasFragment = new Holder<>(false); |
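The suggestion above comes down to this: a lambda can only capture effectively final locals, so the flag has to live in a mutable box, and a plain holder signals that no concurrency is involved, which `AtomicBoolean` does not. A minimal sketch of the pattern follows. `Box` is a hypothetical stand-in written for this example, not the Elasticsearch `Holder` class, and the node names are plain strings rather than real plan nodes.

```java
import java.util.List;

final class HolderSketch {
    // Hypothetical Holder-style box; an assumption for illustration only.
    static final class Box<T> {
        private T value;

        Box(T value) {
            this.value = value;
        }

        T get() {
            return value;
        }

        void set(T value) {
            this.value = value;
        }
    }

    // Records inside a lambda whether a "FragmentExec" name was seen.
    static boolean containsFragment(List<String> nodeNames) {
        Box<Boolean> found = new Box<>(false); // the reference is effectively final, its content is mutable
        nodeNames.forEach(name -> {
            if (name.equals("FragmentExec")) {
                found.set(true);
            }
        });
        return found.get();
    }
}
```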
||
|
||
var childTransformed = child.transformUp((f) -> { | ||
// Once we reached FragmentExec, we stuff our Enrich under it | ||
if (f instanceof FragmentExec) { | ||
hasFragment.set(true); | ||
return new FragmentExec(p); | ||
Review comment: I'm not sure why this works!
Reply: It basically does the same thing that is done later for non-pipeline-breakers, but with some complications. |
||
} | ||
if (f instanceof EnrichExec enrichExec) { | ||
// It can only be ANY because COORDINATOR would have errored out earlier, and REMOTE should be under FragmentExec | ||
assert enrichExec.mode() == Enrich.Mode.ANY : "enrich must be in ANY mode here"; | ||
return enrichExec.child(); | ||
} | ||
if (f instanceof UnaryExec unaryExec) { | ||
if (f instanceof LimitExec || f instanceof ExchangeExec || f instanceof OrderExec || f instanceof TopNExec) { | ||
return f; | ||
} else { | ||
return unaryExec.child(); | ||
} | ||
Comment on lines +134 to +139
Review comment: I'm not sure what this is supposed to do: check whether it's a pipeline breaker and otherwise skip it?
Reply: Yes, FilterExec will be removed from this part. The filter will be inside the Enrich plan under FragmentExec, which, when converted to a physical plan, produces the FilterExec that will be executed. |
||
} | ||
// Currently, it's either UnaryExec or LeafExec. Leaf will either resolve to FragmentExec or we'll ignore it. | ||
return f; | ||
}); | ||
|
||
if (hasFragment.get()) { | ||
return childTransformed; | ||
} | ||
} | ||
Comment on lines +120 to +148
Review comment: This section of code needs to be integrated with the one below (lines 112-115, about Enrich and coordinator mode). |
||
|
||
if (p instanceof UnaryPlan ua) { | ||
var child = map(ua.child()); | ||
|
Review comment: "Remote" typically implies a remote cluster. I think you mean data node (or, in ESQL terminology, local, as in local planning).
Reply: Yes, what is meant here is that remote enrich would go to the remote cluster. It may also effectively run on the local cluster if the enrich policy and indices are there, but the important part, I think, is that it would also go to the remote one.
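
The rewrite discussed in this thread can be illustrated on a toy plan tree. The sketch below is not the Elasticsearch planner: `Node`, `PlanSketch`, and the string node names are hypothetical stand-ins, and it assumes every node has at most one child. It shows the shape of the bottom-up pass: the fragment leaf is replaced by a fragment that wraps the Enrich, pipeline breakers (Limit, Exchange, Order, TopN) are kept, and every other unary node is dropped because its logical counterpart already sits under the Enrich inside the new fragment.

```java
import java.util.Set;
import java.util.function.UnaryOperator;

final class PlanSketch {
    // Hypothetical, simplified plan node: a name plus at most one child.
    static final class Node {
        final String name;
        final Node child; // null for a leaf

        Node(String name, Node child) {
            this.name = name;
            this.child = child;
        }

        // Bottom-up rewrite: transform the child first, then apply the rule to this node.
        Node transformUp(UnaryOperator<Node> rule) {
            Node newChild = child == null ? null : child.transformUp(rule);
            return rule.apply(new Node(name, newChild));
        }

        @Override
        public String toString() {
            return child == null ? name : name + "(" + child + ")";
        }
    }

    // Nodes that stay above the fragment (the pipeline breakers named in the diff).
    static final Set<String> KEPT = Set.of("Limit", "Exchange", "Order", "TopN");

    // Returns the rewritten tree, or null when the tree contains no "Fragment" leaf.
    static Node pushEnrichIntoFragment(Node child) {
        boolean[] hasFragment = { false }; // mutable flag captured by the lambda
        Node rewritten = child.transformUp(n -> {
            if (n.name.equals("Fragment")) {
                hasFragment[0] = true;
                // Stand-in for "new FragmentExec(p)": a fragment that now contains the Enrich.
                return new Node("Fragment[Enrich]", null);
            }
            if (n.child != null && !KEPT.contains(n.name)) {
                return n.child; // drop non-breaker unary nodes
            }
            return n; // pipeline breakers and other leaves are left alone
        });
        return hasFragment[0] ? rewritten : null;
    }
}
```

With this sketch, `Limit(Exchange(Filter(Fragment)))` becomes `Limit(Exchange(Fragment[Enrich]))`, while a tree without a fragment yields `null`, which corresponds to the diff falling through to the regular unary mapping.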