Support concurrent export of query results #6539

Merged
morningman merged 9 commits into apache:master on Sep 7, 2021

EmmyMiao87

Proposed changes

This PR mainly adds support for:

  1. Exporting query result sets concurrently
  2. Exporting query result sets over the S3 protocol

A query result set is exported concurrently only when all of the following preconditions hold:

  1. The concurrent export session variable is enabled
  2. The query itself can be exported concurrently
    (some queries with a sort node at the top level cannot be)
  3. The export uses the S3 protocol rather than the broker
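
The three preconditions can be read as a single boolean check. The following is a minimal sketch of that reading, not the code from this PR: the struct, its field names, and the function name are all hypothetical and chosen only for illustration.

```cpp
// Hypothetical names throughout; none of them come from the Doris code base.
enum class StorageType { BROKER, S3 };

struct OutfileRequest {
    bool concurrent_export_enabled;  // precondition 1: the session variable is on
    bool has_top_level_sort;         // precondition 2: a top-level sort blocks concurrency
    StorageType storage;             // precondition 3: S3 rather than broker
};

// Returns true only when all three preconditions from the description hold.
bool can_export_concurrently(const OutfileRequest& req) {
    return req.concurrent_export_enabled &&
           !req.has_top_level_sort &&
           req.storage == StorageType::S3;
}
```

Under this reading, a request that fails any one of the checks would fall back to the existing single-writer export.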

When the result set is exported concurrently,
the file prefix changes to outfile_{query_instance_id}_filenumber.{file_format}
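
As an illustration of the naming pattern, here is a small sketch that builds such a name. It rests on several assumptions not stated in the description: that the instance id consists of two 64-bit halves rendered as lowercase hex joined by a hyphen, and that filenumber is a plain decimal counter. The type and function names are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for a query instance id made of two 64-bit halves.
struct InstanceId {
    uint64_t hi;
    uint64_t lo;
};

// Renders the id as "<hi hex>-<lo hex>". This rendering is an assumption.
std::string instance_id_to_string(const InstanceId& id) {
    std::ostringstream out;
    out << std::hex << id.hi << '-' << id.lo;
    return out.str();
}

// Builds outfile_{query_instance_id}_{file_number}.{file_format}.
std::string outfile_name(const InstanceId& id, int file_number,
                         const std::string& file_format) {
    std::ostringstream out;
    out << "outfile_" << instance_id_to_string(id) << '_' << file_number
        << '.' << file_format;
    return out.str();
}
```

Because each instance writes under its own id, concurrent writers targeting the same directory would not collide on file names.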

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)
  • Optimization. Including functional usability improvements and performance improvements.
  • Dependency. Such as changes related to third-party components.
  • Other.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged

@EmmyMiao87 added the area/outfile, kind/improvement, and api-review labels Aug 31, 2021
be/src/exec/data_sink.cpp (resolved)
    } else {
        _storage_type = TStorageBackendType::BROKER;
    }
    _fragment_instance_id.hi = 12345678987654321;
Contributor
What's this?

Contributor Author

This is for compatibility when an old-version FE runs with a new-version BE.

Contributor

Please add a comment to explain this.

BTW, I suggest adding a FileNamePrefix column to the result of the outfile operation, so that the full names of the exported files are easy to get. For example:

+------------+-----------+-----------+-------------+------------------------------------------+
| FileNumber | TotalRows | FileSize  | URL         |FileNamePrefix                            |
+------------+-----------+-----------+-------------+------------------------------------------+
|          1 |    123605 | 361061014 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10325|
|          1 |    128180 | 374334318 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10326|
|          1 |    125156 | 365569023 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10327|
|          1 |    124096 | 362395588 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10328|
|          1 |    124862 | 364727515 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10329|
|          1 |    124520 | 363649600 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10330|
|          1 |    124447 | 363479285 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10331|
|          1 |    125134 | 365490611 | 127.0.0.1   |my_file2_46e9ef9b66924a79-92f887d43be10332|
+------------+-----------+-----------+-------------+------------------------------------------+

Contributor Author

That is not easy to add; it would cause compatibility issues.

be/src/runtime/file_result_writer.cpp (outdated, resolved)
be/src/runtime/file_result_writer.h (resolved)
morningman previously approved these changes Sep 6, 2021

LGTM

github-actions bot commented Sep 6, 2021

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label Sep 6, 2021

github-actions bot commented Sep 6, 2021

PR approved by anyone and no changes requested.

@github-actions bot removed the approved label Sep 6, 2021
@github-actions bot added the approved label Sep 6, 2021

github-actions bot commented Sep 6, 2021

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 9469b2c into apache:master Sep 7, 2021
@morningman morningman mentioned this pull request Oct 10, 2021
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Feb 29, 2024
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Mar 1, 2024
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Mar 1, 2024
morrySnow added a commit that referenced this pull request Mar 4, 2024
yiguolei pushed a commit that referenced this pull request Mar 6, 2024
Labels
api-review, approved, area/outfile, kind/improvement, reviewed
2 participants