Parallel fragment exec instance #851
fe/src/main/java/org/apache/doris/planner/SingleNodePlanner.java
            olapScanNode.getOlapTable().getName(), olapScanNode.getFragmentId());
        return false;
    }
}
Does this work? If the fragment is agg(hash(scan1, scan2)), then for scan2 the root node of destFragment is agg, so this check does not match.
If this case isn't easy to match, I think we can just return true for it.
For a broadcast join, I think the bigger impact is that the outer child will run in parallel, which requires more hash tables for the inner child. So it's OK to scan in parallel here.
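To illustrate the concern, here is a minimal sketch of why a check that looks only at the fragment root misses the agg(hash(scan1, scan2)) shape, while a walk over the whole tree finds the join. The `Node` and `Kind` types and both helper methods are hypothetical and are not the Doris planner classes. The sketch also assumes the inner (build) side is the join's second child, and it places scan2 directly under the join even though in a real plan an exchange node would sit there.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified plan nodes; these are NOT the Doris PlanNode classes.
class PlanShapeSketch {
    enum Kind { SCAN, HASH_JOIN, AGG }

    static final class Node {
        final Kind kind;
        final String name;
        final List<Node> children;

        Node(Kind kind, String name, Node... children) {
            this.kind = kind;
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Assumption: the inner (build) side of a join is its second child.
    static boolean isJoinWithInner(Node node, Node scan) {
        return node.kind == Kind.HASH_JOIN
                && node.children.size() == 2
                && node.children.get(1) == scan;
    }

    // Root-only check: inspects just the fragment root, so a join below an agg is invisible.
    static boolean rootIsJoinWithInner(Node root, Node scan) {
        return isJoinWithInner(root, scan);
    }

    // Recursive check: walks the whole tree looking for a join whose inner child is the scan.
    static boolean isInnerChildOfAnyJoin(Node node, Node scan) {
        if (isJoinWithInner(node, scan)) {
            return true;
        }
        for (Node child : node.children) {
            if (isInnerChildOfAnyJoin(child, scan)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node scan1 = new Node(Kind.SCAN, "scan1");
        Node scan2 = new Node(Kind.SCAN, "scan2");
        Node plan = new Node(Kind.AGG, "agg", new Node(Kind.HASH_JOIN, "hash", scan1, scan2));

        System.out.println(rootIsJoinWithInner(plan, scan2));   // false
        System.out.println(isInnerChildOfAnyJoin(plan, scan2)); // true
    }
}
```

Returning true for every scan, as suggested, avoids having to get this tree walk right at all.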
Yes, it doesn't work for agg(hash(scan1, scan2)).
So, should we enable parallel execution for all queries?
In my tests, even when a small table scan is parallelized, the performance loss is very small.
For simplicity, I think we can enable parallel execution for all queries.
However, one thing to note is that when fragment parallelism is enabled, the FE sends more PlanFragmentInstance requests than before, which costs the FE more CPU to serialize requests and more network bandwidth to transfer them. As the number of concurrent queries increases, the FE could become a bottleneck.
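As a rough illustration of that scaling, the sketch below counts instances under a deliberately simple model. The class name, the method, and all numbers are hypothetical, and the model assumes every fragment runs on every backend with the same number of instances per backend, which a real plan need not do.

```java
// Hypothetical back-of-the-envelope model; none of these names come from Doris.
class InstanceCountSketch {
    // Assumes every fragment runs on every backend, with `parallelism` instances per backend.
    static long instanceCount(int fragments, int backends, int parallelism) {
        return (long) fragments * backends * parallelism;
    }

    public static void main(String[] args) {
        int fragments = 3;
        int backends = 10;
        System.out.println(instanceCount(fragments, backends, 1)); // 30
        System.out.println(instanceCount(fragments, backends, 4)); // 120
    }
}
```

Under this model the number of instances the FE has to serialize and send grows linearly with the parallelism setting, and that cost is paid again for every concurrent query.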
> For simplicity, I think we can enable parallel execution for all queries.
OK.
> which costs the FE more CPU to serialize requests and more network bandwidth to transfer them.
Yes, this is a side effect of this improvement, but it should be acceptable.
For #831