-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Operator outputBatchRows may overflow #10868
Fix Operator outputBatchRows may overflow #10868
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@mbasmanova Can you help review this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinchengchenghh thanks for the change.
velox/core/QueryConfig.h
Outdated
uint32_t preferredOutputBatchRows() const { | ||
return get<uint32_t>(kPreferredOutputBatchRows, 1024); | ||
int32_t preferredOutputBatchRows() const { | ||
uint32_t batchRows = get<uint32_t>(kPreferredOutputBatchRows, 1024); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
velox/core/QueryConfig.h
Outdated
uint32_t maxOutputBatchRows() const { | ||
return get<uint32_t>(kMaxOutputBatchRows, 10'000); | ||
int32_t maxOutputBatchRows() const { | ||
uint32_t maxBatchRows = get<uint32_t>(kMaxOutputBatchRows, 10'000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
velox/dwio/common/SortingWriter.cpp
Outdated
if (sortBuffer_->estimateOutputRowSize().has_value() && | ||
sortBuffer_->estimateOutputRowSize().value() != 0) { | ||
estimatedMaxOutputRows = | ||
uint64_t maxOutputRows = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
velox/exec/Operator.cpp
Outdated
if (UNLIKELY(batchSize > std::numeric_limits<vector_size_t>::max())) { | ||
return std::numeric_limits<vector_size_t>::max(); | ||
} | ||
return std::max<vector_size_t>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const uint64_t batchSize =
return std::max<vector_size_t>(batchSize, 1);
velox/exec/Operator.cpp
Outdated
if (!averageRowSize.has_value()) { | ||
return queryConfig.preferredOutputBatchRows(); | ||
} | ||
|
||
const uint64_t rowSize = averageRowSize.value(); | ||
|
||
if (rowSize * queryConfig.maxOutputBatchRows() < | ||
uint64_t batchBytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (queryConfig.preferredOutputBatchBytes() / rowSize > queryConfig.maxOutputBatchRows()) {
return queryConfig.maxOutputBatchRows();
}
return outputBatchRowsByBytes(queryConfig, rowSize);
velox/exec/Operator.cpp
Outdated
return queryConfig.maxOutputBatchRows(); | ||
} | ||
return std::max<uint32_t>( | ||
queryConfig.preferredOutputBatchBytes() / rowSize, 1); | ||
return std::max<uint64_t>(batchSize, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return std::max<vector_size_t>(batchSize, 1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinchengchenghh LGTM % minors. Thanks!
TEST_F(OperatorUtilsTest, outputBatchRows) { | ||
RowTypePtr rowType = ROW({"c0"}, {INTEGER()}); | ||
{ | ||
setBatchConfig(10, 20, 234); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/setBatchConfig/setTaskOutputBatchConfig/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinchengchenghh LGTM. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinchengchenghh Thank you for the fix.
@xiaoxmeng Thank you for reviewing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinchengchenghh there is a test failure in CI. Thanks!
aca6bc5
to
fafcd0c
Compare
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng merged this pull request in 4499332. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: The computation of function outputBatchRows() may overflow, fix it. And refactor the relevant output batch size config from uint32_t to vector_size_t(int32_t) because the RowVector numRows type is vector_size_t. Pull Request resolved: facebookincubator#10868 Reviewed By: gggrace14 Differential Revision: D62013297 Pulled By: xiaoxmeng fbshipit-source-id: 087b603967ff3666624e8d4c8b1a23c6130846f9
Summary: The computation of function outputBatchRows() may overflow, fix it. And refactor the relevant output batch size config from uint32_t to vector_size_t(int32_t) because the RowVector numRows type is vector_size_t. Pull Request resolved: facebookincubator#10868 Reviewed By: gggrace14 Differential Revision: D62013297 Pulled By: xiaoxmeng fbshipit-source-id: 087b603967ff3666624e8d4c8b1a23c6130846f9
The computation of function outputBatchRows() may overflow, fix it. And refactor the relevant output batch size config from uint32_t to vector_size_t(int32_t) because the RowVector numRows type is vector_size_t.