Fix Operator outputBatchRows may overflow #10868

Closed

@jinchengchenghh (Contributor) commented Aug 28, 2024

The computation in outputBatchRows() may overflow; this PR fixes it. It also refactors the related output batch size configs from uint32_t to vector_size_t (int32_t), because RowVector's row count (numRows) is of type vector_size_t.
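As a concrete illustration of the overflow, this standalone sketch reconstructs the pre-fix shape visible in the review hunks further down (`rowSize * maxOutputBatchRows() < preferredOutputBatchBytes()`, then `preferredOutputBatchBytes() / rowSize`). The function name `rowsPreFix`, its parameters, and the sample numbers are hypothetical, and the return type is simplified to `uint64_t`; it is a sketch, not the Velox code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical reconstruction of the pre-fix shape: multiply first, then compare.
// Unsigned arithmetic is modular, so rowSize * maxRows silently wraps when the
// true product exceeds 2^64 - 1.
uint64_t rowsPreFix(uint64_t rowSize, uint32_t maxRows, uint64_t preferredBytes) {
  assert(rowSize != 0);
  if (rowSize * maxRows < preferredBytes) {  // product may wrap around
    return maxRows;
  }
  return std::max<uint64_t>(preferredBytes / rowSize, 1);
}
```

With a row-size estimate of 2^62 bytes and a 10'000-row cap, the product 2^62 × 10'000 is a multiple of 2^64 and wraps to 0, so the check passes and `rowsPreFix(1ULL << 62, 10000, 10ULL << 20)` returns 10'000 even though a single row already exceeds the assumed 10 MiB budget. Comparing the quotient `preferredBytes / rowSize` against the cap avoids the multiplication altogether.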

@facebook-github-bot added the CLA Signed label Aug 28, 2024

netlify bot commented Aug 28, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: fafcd0c
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/66d14ffc3d9777000886eafd

@jinchengchenghh (Contributor, Author)

@mbasmanova Can you help review this PR? Thanks!

@xiaoxmeng (Contributor) left a comment

@jinchengchenghh thanks for the change.

-  uint32_t preferredOutputBatchRows() const {
-    return get<uint32_t>(kPreferredOutputBatchRows, 1024);
+  int32_t preferredOutputBatchRows() const {
+    uint32_t batchRows = get<uint32_t>(kPreferredOutputBatchRows, 1024);
Contributor

Nit: declare `batchRows` as `const`.
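The getter hunks change the return type to `int32_t` while still reading a `uint32_t` into a local, and they are cut off before showing how that local is converted. Here is a minimal sketch of one safe narrowing, assuming out-of-range values should be rejected rather than clamped. `toVectorSize` is a hypothetical helper, not a Velox API, and `std::out_of_range` stands in for whatever error reporting Velox actually uses.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// vector_size_t is int32_t in Velox; redeclared so the sketch compiles alone.
using vector_size_t = int32_t;

// Hypothetical helper: narrow a uint32_t config value to vector_size_t and
// reject anything a RowVector's row count could not represent.
vector_size_t toVectorSize(uint32_t value, const std::string& key) {
  constexpr auto kMax = std::numeric_limits<vector_size_t>::max();
  if (value > static_cast<uint32_t>(kMax)) {
    throw std::out_of_range(
        key + " = " + std::to_string(value) + " exceeds vector_size_t max");
  }
  // value <= INT32_MAX here, so the conversion is lossless.
  return static_cast<vector_size_t>(value);
}
```

Values up to 2'147'483'647 pass through unchanged; a configured value of 3'000'000'000 is not representable as a vector_size_t, so the helper throws instead of returning a corrupted row count.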

-  uint32_t maxOutputBatchRows() const {
-    return get<uint32_t>(kMaxOutputBatchRows, 10'000);
+  int32_t maxOutputBatchRows() const {
+    uint32_t maxBatchRows = get<uint32_t>(kMaxOutputBatchRows, 10'000);
Contributor

Nit: declare `maxBatchRows` as `const`.

   if (sortBuffer_->estimateOutputRowSize().has_value() &&
       sortBuffer_->estimateOutputRowSize().value() != 0) {
-    estimatedMaxOutputRows =
+    uint64_t maxOutputRows =
Contributor

Nit: declare `maxOutputRows` as `const`.

  if (UNLIKELY(batchSize > std::numeric_limits<vector_size_t>::max())) {
    return std::numeric_limits<vector_size_t>::max();
  }
  return std::max<vector_size_t>(
Contributor

const uint64_t batchSize = 

return std::max<vector_size_t>(batchSize, 1);
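The clamp in the hunk above and the reviewer's suggested `const uint64_t batchSize` plus `std::max<vector_size_t>(batchSize, 1)` combine into a small pattern: divide in 64 bits, cap at the largest `vector_size_t`, and never return fewer than one row. This is a sketch under those assumptions; `rowsForByteBudget` is a hypothetical name, and Velox's `UNLIKELY` branch hint is omitted so the code compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

using vector_size_t = int32_t;  // matches Velox's definition

// Hypothetical helper: how many rows of rowSize bytes fit into budgetBytes,
// computed in 64 bits and then clamped to [1, max vector_size_t].
vector_size_t rowsForByteBudget(uint64_t budgetBytes, uint64_t rowSize) {
  assert(rowSize != 0);
  const uint64_t batchSize = budgetBytes / rowSize;
  constexpr auto kMax = std::numeric_limits<vector_size_t>::max();
  if (batchSize > static_cast<uint64_t>(kMax)) {
    return kMax;
  }
  // batchSize now fits in vector_size_t, so the cast is lossless.
  return std::max<vector_size_t>(static_cast<vector_size_t>(batchSize), 1);
}
```

For example, a 10 MiB budget with 100-byte rows yields 104'857 rows, a 100-byte budget with 4'096-byte rows still yields 1, and a 2^40-byte budget with 1-byte rows is capped at 2'147'483'647.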

   if (!averageRowSize.has_value()) {
     return queryConfig.preferredOutputBatchRows();
   }

   const uint64_t rowSize = averageRowSize.value();

-  if (rowSize * queryConfig.maxOutputBatchRows() <
+  uint64_t batchBytes;
Contributor

if (queryConfig.preferredOutputBatchBytes() / rowSize >
    queryConfig.maxOutputBatchRows()) {
  return queryConfig.maxOutputBatchRows();
}
return outputBatchRowsByBytes(queryConfig, rowSize);
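The reviewer's restructuring compares a quotient instead of a product, and a quotient cannot overflow. Here is a self-contained sketch of the whole decision, assuming the three config values are plain fields of a hypothetical `BatchConfig` struct and that a zero row-size estimate should fall back to `preferredOutputBatchRows` (the sort-buffer hunk guards against zero, but the `outputBatchRows` hunk only checks `has_value()`). `outputBatchRowsSketch` is illustrative and is not the `outputBatchRowsByBytes` helper named in the comment.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

using vector_size_t = int32_t;  // matches Velox's definition

// Hypothetical stand-in for the three QueryConfig values the logic reads.
struct BatchConfig {
  vector_size_t preferredOutputBatchRows;
  vector_size_t maxOutputBatchRows;  // assumed non-negative
  uint64_t preferredOutputBatchBytes;
};

// Sketch of the suggested structure: compare preferredBytes / rowSize
// against the row cap instead of multiplying rowSize by the cap.
vector_size_t outputBatchRowsSketch(
    const BatchConfig& config, std::optional<uint64_t> averageRowSize) {
  // No estimate (or an unusable zero estimate): use the row-count default.
  if (!averageRowSize.has_value() || *averageRowSize == 0) {
    return config.preferredOutputBatchRows;
  }
  const uint64_t rowsByBytes =
      config.preferredOutputBatchBytes / *averageRowSize;
  if (rowsByBytes > static_cast<uint64_t>(config.maxOutputBatchRows)) {
    return config.maxOutputBatchRows;
  }
  // Here rowsByBytes <= maxOutputBatchRows, so it fits in vector_size_t.
  return std::max<vector_size_t>(static_cast<vector_size_t>(rowsByBytes), 1);
}
```

With `{1024, 10'000, 10 MiB}`, no estimate gives 1'024 rows, 100-byte rows hit the 10'000-row cap, 4'096-byte rows give 2'560, and a 2^62-byte estimate gives 1 instead of wrapping around to the cap.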

     return queryConfig.maxOutputBatchRows();
   }
-  return std::max<uint32_t>(
-      queryConfig.preferredOutputBatchBytes() / rowSize, 1);
+  return std::max<uint64_t>(batchSize, 1);
Contributor

return std::max<vector_size_t>(batchSize, 1);

@xiaoxmeng (Contributor) left a comment

@jinchengchenghh LGTM % minors. Thanks!

TEST_F(OperatorUtilsTest, outputBatchRows) {
  RowTypePtr rowType = ROW({"c0"}, {INTEGER()});
  {
    setBatchConfig(10, 20, 234);
Contributor

s/setBatchConfig/setTaskOutputBatchConfig/

@xiaoxmeng (Contributor) left a comment

@jinchengchenghh LGTM. Thanks!

@mbasmanova (Contributor) left a comment

@jinchengchenghh Thank you for the fix.

@xiaoxmeng Thank you for reviewing.

velox/core/QueryConfig.h (outdated, resolved)

@xiaoxmeng (Contributor) left a comment

@jinchengchenghh there is a test failure in CI. Thanks!

@facebook-github-bot (Contributor)

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@xiaoxmeng merged this pull request in 4499332.

Conbench analyzed the 1 benchmark run on commit 4499332b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Sep 2, 2024
Summary:
The computation in outputBatchRows() may overflow; this PR fixes it. It also refactors the related output batch size configs from uint32_t to vector_size_t (int32_t), because RowVector's row count (numRows) is of type vector_size_t.

Pull Request resolved: facebookincubator#10868

Reviewed By: gggrace14

Differential Revision: D62013297

Pulled By: xiaoxmeng

fbshipit-source-id: 087b603967ff3666624e8d4c8b1a23c6130846f9
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Sep 3, 2024

Labels: CLA Signed, Merged

4 participants