
Implement adaptive remote task request size #10013

Closed

Conversation


@yansun7 (Contributor) commented Nov 19, 2021

Currently, HttpRemoteTask sends all pending splits in a single taskUpdateRequest, which sometimes produces a very large request per task on the coordinator.

This change adds the ability to dynamically split a large taskUpdateRequest into several smaller batches.

This feature introduces four parameters:

query.remote-task.enable-adaptive-request-size

  • session: enable_adaptive_remote_task_request_size
  • enables or disables the feature

query.remote-task.max-request-size

  • session: max_remote_task_request_size
  • the maximum size of a taskUpdateRequest

query.remote-task.request-size-headroom

  • session: remote_task_request_size_headroom
  • the expected mean size of a taskUpdateRequest will fall between (max_remote_task_request_size - remote_task_request_size_headroom) and max_remote_task_request_size

query.remote-task.guaranteed-splits-per-task

  • session: remote_task_guaranteed_splits_per_request
  • the minimum number of splits sent in a taskUpdateRequest for each task, regardless of max_remote_task_request_size, to prevent deadlock
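The interplay of the last three parameters can be sketched as follows. This is an illustrative stand-in, not the PR's actual implementation: when a serialized request exceeds the configured maximum, the batch size shrinks proportionally, but never below the guaranteed per-task minimum.

```java
// Illustrative sketch of the adaptive sizing idea; class and method
// names are hypothetical, not the actual classes in this PR.
public class AdaptiveBatchSizer
{
    static int nextBatchSize(long serializedRequestBytes, long maxRequestBytes, int currentBatchSize, int guaranteedSplitsPerTask)
    {
        if (serializedRequestBytes <= maxRequestBytes) {
            // request fits within the limit: keep the current batch size
            return currentBatchSize;
        }
        // shrink proportionally to how far the request overshot the limit
        int shrunk = (int) (currentBatchSize * maxRequestBytes / serializedRequestBytes);
        // never go below the guaranteed minimum, to avoid scheduling deadlock
        return Math.max(shrunk, guaranteedSplitsPerTask);
    }
}
```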

@cla-bot cla-bot bot added the cla-signed label Nov 19, 2021
@yansun7 yansun7 force-pushed the adaptive-remote-task-request-size branch from a6b33e1 to 99595ec Compare November 19, 2021 20:22
@yansun7 yansun7 changed the title implement adaptive remote task request size Implement adaptive remote task request size Nov 19, 2021
@yansun7 yansun7 marked this pull request as draft November 20, 2021 03:21
@yansun7 yansun7 force-pushed the adaptive-remote-task-request-size branch from 29d32ba to 2612574 Compare November 22, 2021 07:34
@yansun7 yansun7 marked this pull request as ready for review November 22, 2021 14:48
@yansun7 yansun7 force-pushed the adaptive-remote-task-request-size branch from 2612574 to f77f287 Compare November 23, 2021 17:25
@martint martint requested a review from arhimondr December 2, 2021 06:40

@arhimondr (Contributor) commented Dec 2, 2021

> sometimes causing large request for each task in coordinator.

How does this problem manifest itself? Could you please elaborate more on the problem you are trying to solve?

@JunhyungSong (Member)

@arhimondr
The problematic situation:
  • A table has 3884 columns.
  • The table schema has a “parameters” field containing the whole table schema as a string (~300k characters).
  • Each split object in the task update request is large because of those big strings in the table properties.
  • When a taskUpdateRequest contains many such huge splits and is sent to the worker over HTTP, the coordinator can crash with an out-of-memory error.
So this PR limits the size of a taskUpdateRequest not only via max_unacknowledged_splits_per_task, but also via max_remote_task_request_size.

@yansun7 can you fix the merge conflicts, please?

@@ -58,6 +58,11 @@
private Duration remoteTaskMaxErrorDuration = new Duration(5, TimeUnit.MINUTES);
private int remoteTaskMaxCallbackThreads = 1000;

private boolean enabledAdaptiveTaskRequestSize;
Why isn't the default 'true'?

sources = sources.stream()
.map(source -> new TaskSource(
source.getPlanNodeId(),
source.getSplits().stream()
.filter(scheduledSplit -> scheduledSplit.getSequenceId() > currentMaxAcknowledgedSplit)
.filter(scheduledSplit -> scheduledSplit.getSequenceId() > maxAcknowledgedSplitByPlanNode.getOrDefault(source.getPlanNodeId(), Long.MIN_VALUE))

There could be lots of splits, and in that case there will be lots of map accesses. It would be better to do it like this:

            .map(source -> {
                long currentMaxAcknowledgedSplit = maxAcknowledgedSplitByPlanNode.getOrDefault(source.getPlanNodeId(), Long.MIN_VALUE);
                return new TaskSource(
                        source.getPlanNodeId(),
                        source.getSplits().stream()
                                .filter(scheduledSplit -> scheduledSplit.getSequenceId() > currentMaxAcknowledgedSplit)
                                .collect(Collectors.toSet()),
                        // Like splits, noMoreSplitsForLifespan could be pruned so that only new items will be processed.
                        // This is not happening here because correctness won't be compromised due to duplicate events for noMoreSplitsForLifespan.
                        source.getNoMoreSplitsForLifespan(),
                        source.isNoMoreSplits());
            })

.max()
.orElse(maxAcknowledgedSplit);
for (TaskSource taskSource : sources) {
long maxAcknowledgedSplit = taskSource.getSplits().stream()

When taskSource.getSplits().size() == 0, it can 'continue;'.

long maxAcknowledgedSplit = taskSource.getSplits().stream()
.mapToLong(ScheduledSplit::getSequenceId)
.max()
.orElse(Long.MIN_VALUE);

If taskSource.getSplits().size() == 0 is already checked, .orElse(Long.MIN_VALUE) is not needed. Instead of that, just '.getAsLong()'.
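The two suggestions above (skip empty sources with 'continue;', then read the maximum with getAsLong()) can be sketched together. The Split and Source records below are simplified stand-ins for ScheduledSplit and TaskSource, not the actual Trino classes:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for ScheduledSplit and TaskSource, for illustration only
record Split(long sequenceId) {}
record Source(String planNodeId, List<Split> splits) {}

class AckTracker
{
    final Map<String, Long> maxAcknowledgedSplitByPlanNode = new HashMap<>();

    void acknowledge(List<Source> sources)
    {
        for (Source source : sources) {
            if (source.splits().isEmpty()) {
                continue; // nothing new to acknowledge for this plan node
            }
            long maxAcknowledgedSplit = source.splits().stream()
                    .mapToLong(Split::sequenceId)
                    .max()
                    .getAsLong(); // safe: the split set is non-empty
            maxAcknowledgedSplitByPlanNode.merge(source.planNodeId(), maxAcknowledgedSplit, Math::max);
        }
    }
}
```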

if (!maxAcknowledgedSplitByPlanNode.containsKey(planNodeId)) {
maxAcknowledgedSplitByPlanNode.put(planNodeId, Long.MIN_VALUE);
}
maxAcknowledgedSplitByPlanNode.computeIfPresent(planNodeId, (key, val) -> maxAcknowledgedSplit > val ? maxAcknowledgedSplit : val);

If taskSource.getSplits().size() == 0 is already checked, it should throw when val >= maxAcknowledgedSplit.

e.g.

            maxAcknowledgedSplitByPlanNode.compute(planNodeId, (key, val) -> {
                if (val == null) {
                    return maxAcknowledgedSplit;
                }
                if (val >= maxAcknowledgedSplit) {
                    throw new IllegalStateException(format("%s - splits are out of order? planNodeId=%s, newMax=%d, currentMax=%d", taskId, planNodeId, maxAcknowledgedSplit, val));
                }
                return maxAcknowledgedSplit;
            });

Comment on lines +363 to +365
if (!maxAcknowledgedSplitByPlanNode.containsKey(planNodeId)) {
maxAcknowledgedSplitByPlanNode.put(planNodeId, Long.MIN_VALUE);
}

If taskSource.getSplits().size() == 0 is already checked, this is not needed.

// abandon current request and reschedule update if size of request body exceeds requestSizeLimit
// and splitBatchSize is updated
if (taskUpdateRequestJson.length > maxRequestSize && splitBatchSize < oldSplitBatchSize) {
log.debug("%s - current taskUpdateRequestJson exceeded limit: %d, abandon.", taskId, taskUpdateRequestJson.length);

log.info?

{
int numSplits = 0;
for (TaskSource taskSource : sources) {
numSplits = Math.max(numSplits, taskSource.getSplits().size());

It should find max only for partitioned sources.

{
int numSplits = 0;
for (TaskSource taskSource : sources) {
numSplits = Math.max(numSplits, taskSource.getSplits().size());

With colocated joins on bucketed tables, there can be more than one partitioned source. In that case, this logic becomes awkward. Please apply it only when there is a single partitioned source.
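The suggestion could look roughly like the following sketch. The SourceInfo record and its partitioned flag are hypothetical stand-ins for illustration; the actual PR code would need to determine partitioned-ness from its own types:

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical stand-in: a source with a split count and a partitioned flag
record SourceInfo(int splitCount, boolean partitioned) {}

class BatchSizeHeuristic
{
    // Use the adaptive split count only when there is exactly one
    // partitioned source; otherwise fall back (empty result).
    static OptionalInt adaptiveNumSplits(List<SourceInfo> sources)
    {
        List<SourceInfo> partitioned = sources.stream()
                .filter(SourceInfo::partitioned)
                .toList();
        if (partitioned.size() != 1) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(partitioned.get(0).splitCount());
    }
}
```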

@bitsondatadev (Member)

👋 @yansun7 - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.

4 participants