Pre-allocate error vector in TRY #9986

mbasmanova · 2024-05-30T23:09:13Z

Summary:
TRY(CAST(...)) is up to 4x slower than TRY_CAST when many rows fail.
The profile shows that a significant percentage of CPU time goes to
EvalCtx::ensureErrorsVectorSize. For every row that fails, we call
EvalCtx::ensureErrorsVectorSize to resize the error vector to accommodate that
row. When many rows fail, we end up resizing a lot: resize(1), resize(2),
resize(3), ..., resize(n). Fix this by pre-allocating the error vector in TryExpr.
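
The resize pattern described above can be illustrated with a small standalone sketch. This is not Velox code: `ToyErrorVector`, `ensureSize`, `setError`, and `failAllRows` are hypothetical names, and the toy only counts how many times the vector has to grow when every row fails, with and without pre-allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an error vector (hypothetical, not the Velox class).
struct ToyErrorVector {
  std::vector<bool> hasError;
  std::size_t resizeCalls = 0;

  // Grows the vector only if it is too small, and counts each growth.
  void ensureSize(std::size_t size) {
    if (hasError.size() < size) {
      hasError.resize(size);
      ++resizeCalls;
    }
  }

  // Records an error for 'row', growing the vector to fit that row.
  void setError(std::size_t row) {
    ensureSize(row + 1);
    assert(row < hasError.size());
    hasError[row] = true;
  }
};

// Fails every row in order and returns how many times the vector grew.
std::size_t failAllRows(std::size_t numRows, bool preallocate) {
  ToyErrorVector errors;
  if (preallocate) {
    errors.ensureSize(numRows); // One up-front resize to the final size.
  }
  for (std::size_t row = 0; row < numRows; ++row) {
    errors.setError(row);
  }
  return errors.resizeCalls;
}
```

In this toy, failing 1000 rows without pre-allocation grows the vector 1000 times, while pre-allocating grows it once. The real cost per resize in Velox depends on details this sketch does not model.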

An earlier attempt at fixing this, #9911, caused a 2x memory regression in one of the streaming pipelines. The change
was reverted in #9971.

The regression was due to TRY starting to allocate the 'nulls' buffer in results
unconditionally. Even if there were no errors, TRY would still allocate the 'nulls'
buffer. When the result is a boolean vector, allocating an unnecessary 'nulls' buffer
increases memory usage for 'result' by 2x. This fix makes sure not to do that
and adds a test.
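
To see where the 2x comes from, here is a minimal sketch that assumes both the boolean values and the nulls are bit-packed at one bit per row. `ToyBoolVector` and `bitsToBytes` are hypothetical names, and the bit convention (1 means null) is chosen for the toy only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Number of bytes needed to hold 'bits' bits.
constexpr std::size_t bitsToBytes(std::size_t bits) {
  return (bits + 7) / 8;
}

// Toy boolean vector (hypothetical): values are always allocated,
// the nulls buffer is allocated only when the first null is set.
class ToyBoolVector {
 public:
  explicit ToyBoolVector(std::size_t numRows)
      : numRows_(numRows), values_(bitsToBytes(numRows), 0) {}

  void setNull(std::size_t row) {
    assert(row < numRows_);
    if (!nulls_) {
      nulls_.emplace(bitsToBytes(numRows_), 0); // Lazy allocation.
    }
    (*nulls_)[row / 8] |= static_cast<std::uint8_t>(1u << (row % 8));
  }

  // Total bytes held by the values buffer plus the nulls buffer, if any.
  std::size_t bytes() const {
    return values_.size() + (nulls_ ? nulls_->size() : 0);
  }

 private:
  std::size_t numRows_;
  std::vector<std::uint8_t> values_;
  std::optional<std::vector<std::uint8_t>> nulls_;
};
```

For 1024 rows the toy holds 128 bytes while no row is null and 256 bytes once the nulls buffer exists, which matches the doubling described above under the stated assumption.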

Also, this change creates ErrorVector with only the nulls buffer allocated.
The 'values' buffer, which requires ~20 bytes per row, is allocated only if an
error occurs.
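
The idea of allocating the per-row flags up front and the larger 'values' buffer only on the first error could look roughly like the sketch below. `LazyErrorVector` is a hypothetical class, and it stores `std::exception_ptr` entries purely for illustration, so its per-row size differs from the ~20 bytes mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy error vector (hypothetical): cheap per-row flags are pre-allocated,
// the larger per-row error storage is allocated on the first error.
class LazyErrorVector {
 public:
  explicit LazyErrorVector(std::size_t numRows) : hasError_(numRows, false) {}

  void setError(std::size_t row, std::exception_ptr error) {
    assert(row < hasError_.size());
    if (errors_.empty()) {
      errors_.resize(hasError_.size()); // Single allocation, full size.
    }
    hasError_[row] = true;
    errors_[row] = std::move(error);
  }

  bool hasErrorAt(std::size_t row) const {
    return hasError_[row];
  }

  // True once the per-row error storage has been allocated.
  bool valuesAllocated() const {
    return !errors_.empty();
  }

 private:
  std::vector<bool> hasError_;
  std::vector<std::exception_ptr> errors_;
};
```

When no row fails, only the flags are ever allocated. The first failing row pays for one full-size allocation and later failures reuse it.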

Before:

============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
cast##try_cast_invalid_empty_input                          2.27ms    440.97
cast##tryexpr_cast_invalid_empty_input                      8.96ms    111.56
cast##try_cast_invalid_nan                                  5.49ms    182.26
cast##tryexpr_cast_invalid_nan                             12.96ms     77.17

After:

cast##try_cast_invalid_empty_input                          2.22ms    451.34
cast##tryexpr_cast_invalid_empty_input                      4.52ms    221.06
cast##try_cast_invalid_nan                                  5.79ms    172.69
cast##tryexpr_cast_invalid_nan                              8.16ms    122.48
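
For readers who want the ratios behind these numbers, a small helper is enough. `ratio` is a hypothetical function; the inputs in the note below are the time/iter values from the two tables.

```cpp
#include <cassert>

// Ratio of two per-iteration times in ms; a result above 1 means the
// first measurement is slower than the second.
double ratio(double slowerMs, double fasterMs) {
  assert(fasterMs > 0.0);
  return slowerMs / fasterMs;
}
```

With the empty-input numbers, TRY(CAST) over TRY_CAST is `ratio(8.96, 2.27)`, about 3.95x before, and `ratio(4.52, 2.22)`, about 2.04x after. The TRY(CAST) speedup from this change is `ratio(8.96, 4.52)`, about 1.98x for empty input, and `ratio(12.96, 8.16)`, about 1.59x for NaN input.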

Differential Revision: D57968341

facebook-github-bot · 2024-05-30T23:09:25Z

This pull request was exported from Phabricator. Differential Revision: D57968341

netlify · 2024-05-30T23:09:32Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`09f0d4a`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/6659071ced04cf00084aa986

xiaoxmeng

@mbasmanova thanks for the improvement!

facebook-github-bot · 2024-05-31T03:38:18Z

This pull request has been merged in 76424c0.

conbench-facebook · 2024-05-31T04:29:13Z

Conbench analyzed the 1 benchmark run on commit 76424c0d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Pull Request resolved: facebookincubator#9986

Reviewed By: xiaoxmeng, bikramSingh91

Differential Revision: D57968341

fbshipit-source-id: d9f44aeda56596d9efb035ff9fada5eae22bea1d

facebook-github-bot added the CLA Signed label May 30, 2024

facebook-github-bot added the fb-exported label May 30, 2024

mbasmanova requested review from pedroerp and bikramSingh91 May 30, 2024 23:10

xiaoxmeng approved these changes May 30, 2024

bikramSingh91 approved these changes May 31, 2024

facebook-github-bot closed this in 76424c0 May 31, 2024

facebook-github-bot added the Merged label May 31, 2024
