refactor: Reduce string allocations in Expr::display_name (use write instead of format!) #10454

Merged

@erratic-pattern erratic-pattern commented May 11, 2024

Just a small refactor that should, in theory, reduce string allocations and thus benefit concurrent throughput by reducing allocator lock contention. However, when running concurrency benchmarks on my M3 Max I only saw a minor improvement in InfluxDB (about 40 additional queries per second with 3 threads, but no change in overall curve as # of threads increases).

I don't have any strong opinion on whether or not this should be merged in, but I might as well submit it as a PR.
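To illustrate the idea behind the refactor, here is a minimal sketch (not the DataFusion code itself). The `Expr` enum and the functions `name_alloc` and `write_name` are hypothetical stand-ins: the first builds names with nested `format!` calls, which allocates a temporary `String` per sub-expression, while the second appends everything to one caller-provided buffer through `std::fmt::Write`.

```rust
use std::fmt::{self, Write};

// Hypothetical miniature expression type, standing in for the real `Expr`.
enum Expr {
    Column(String),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

// format!-style: each level allocates new Strings for its children
// and then copies them into yet another String.
fn name_alloc(e: &Expr) -> String {
    match e {
        Expr::Column(c) => c.clone(),
        Expr::Binary(l, op, r) => format!("{} {} {}", name_alloc(l), op, name_alloc(r)),
    }
}

// write!-style: every level appends to the same buffer, so the only
// allocations are the buffer's own growth.
fn write_name(w: &mut String, e: &Expr) -> fmt::Result {
    match e {
        Expr::Column(c) => w.write_str(c),
        Expr::Binary(l, op, r) => {
            write_name(w, l)?;
            write!(w, " {op} ")?;
            write_name(w, r)
        }
    }
}

fn main() {
    let e = Expr::Binary(
        Box::new(Expr::Column("a".to_owned())),
        "+",
        Box::new(Expr::Column("b".to_owned())),
    );
    let mut buf = String::new();
    write_name(&mut buf, &e).expect("writing to a String should not fail");
    // Both approaches produce the same name.
    assert_eq!(buf, name_alloc(&e));
    println!("{buf}");
}
```

Both functions return the same text; the difference is only in how many intermediate strings are created along the way.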

@github-actions github-actions bot added the logical-expr Logical plan and expressions label May 11, 2024
@alamb alamb changed the title refactor: use Write instead of format! to implement display_name refactor: use Reduce string allocations in Expr::display_name (use write instead of format! to implement display_name) May 11, 2024
@alamb alamb changed the title refactor: use Reduce string allocations in Expr::display_name (use write instead of format! to implement display_name) refactor: use Reduce string allocations in Expr::display_name (use write instead of format!) May 11, 2024
@alamb alamb left a comment

I think it is a great idea to speed up planning by avoiding string allocations in Expr::display_name @erratic-pattern -- thank you 🙏

I left some suggestions for small additional improvements, but I also think this PR could be merged as is.

I am going to run my planning benchmarks too, to see if we can measure any difference here

Comment on lines 1657 to 1658
fn write_function_name<'a>(
w: &'a mut (dyn Write + 'a),
Contributor

Rather than using dynamic dispatch I think you can make this just a normal generic function and make the code simpler and likely more performant

Something like

fn write_function_name<W: Write>(
    w: &mut W,

I actually tried it locally and it seems to work well. Here is a PR erratic-pattern#1 to this branch for your consideration
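As a rough sketch of the two options being discussed, the snippet below writes a function call name through a trait object and through a generic parameter. The simplified signatures (`fun: &str`, `args: &[&str]`) and the `_dyn` suffix are assumptions made for illustration and do not match the real `write_function_name` in DataFusion.

```rust
use std::fmt::{self, Write};

// Dynamic dispatch: `w` is a trait object, so each call goes through a vtable.
fn write_function_name_dyn(w: &mut dyn Write, fun: &str, args: &[&str]) -> fmt::Result {
    write!(w, "{fun}(")?;
    for (i, arg) in args.iter().enumerate() {
        if i > 0 {
            w.write_str(", ")?;
        }
        w.write_str(arg)?;
    }
    w.write_char(')')
}

// Static dispatch: the compiler generates one copy per concrete `W`,
// which allows the calls to be resolved (and possibly inlined) at compile time.
fn write_function_name<W: Write>(w: &mut W, fun: &str, args: &[&str]) -> fmt::Result {
    write!(w, "{fun}(")?;
    for (i, arg) in args.iter().enumerate() {
        if i > 0 {
            w.write_str(", ")?;
        }
        w.write_str(arg)?;
    }
    w.write_char(')')
}

fn main() {
    let mut a = String::new();
    let mut b = String::new();
    write_function_name_dyn(&mut a, "sum", &["x", "y"]).expect("write to String");
    write_function_name(&mut b, "sum", &["x", "y"]).expect("write to String");
    // The output is identical; only the dispatch mechanism differs.
    assert_eq!(a, b);
    println!("{b}");
}
```

The generic version also drops the explicit `'a` lifetimes that the `dyn Write + 'a` signature needed, which is part of why it reads as simpler.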

Contributor Author

> Rather than using dynamic dispatch I think you can make this just a normal generic function and make the code simpler and likely more performant
>
> Something like
>
> fn write_function_name<W: Write>(
>     w: &mut W,
>
> I actually tried it locally and it seems to work well. Here is a PR erratic-pattern#1 to this branch for your consideration

I didn't find a meaningful difference in performance here vs monomorphic types (I tried &mut String explicitly in my testing, which is similar code to the generic here). In fact Formatter in the standard library also has an internal dyn Write, though it has an actual need for ad-hoc polymorphism whereas we don't. I am guessing it gets optimized in most cases, but it certainly wouldn't hurt to make it explicitly generic so I agree with this change.

@@ -1693,10 +1708,9 @@ pub(crate) fn create_name(e: &Expr) -> Result<String> {
         if let Some(char) = escape_char {
             format!("CHAR '{char}'")
         } else {
-            "".to_string()
+            "".to_owned()
Contributor

I think you could avoid this allocation (and the format! above it).
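One way to read this suggestion, sketched under assumptions: instead of building an intermediate `String` for the optional escape character (via `format!` or an empty string) and then formatting it into the result, write the suffix directly into the output only when it is present. The function `write_like` and its output layout are hypothetical and are not taken from `create_name`.

```rust
use std::fmt::{self, Write};

// Hypothetical helper: writes "<expr> LIKE <pattern>" and, only when an
// escape character is given, appends " CHAR '<c>'" to the same writer.
// No temporary String is created for either branch.
fn write_like<W: Write>(
    w: &mut W,
    expr: &str,
    pattern: &str,
    escape_char: Option<char>,
) -> fmt::Result {
    write!(w, "{expr} LIKE {pattern}")?;
    if let Some(c) = escape_char {
        write!(w, " CHAR '{c}'")?;
    }
    Ok(())
}

fn main() {
    let mut with_escape = String::new();
    write_like(&mut with_escape, "a", "b", Some('!')).expect("write to String");
    println!("{with_escape}");

    let mut without_escape = String::new();
    write_like(&mut without_escape, "a", "b", None).expect("write to String");
    println!("{without_escape}");
}
```

The `None` case simply writes nothing, so the empty-string allocation disappears along with the `format!`.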

@@ -1717,111 +1732,118 @@ pub(crate) fn create_name(e: &Expr) -> Result<String> {
         if let Some(char) = escape_char {
             format!("CHAR '{char}'")
         } else {
-            "".to_string()
+            "".to_owned()
Contributor

same comment as above -- let's get rid of all the extra allocations

@erratic-pattern erratic-pattern changed the title refactor: use Reduce string allocations in Expr::display_name (use write instead of format!) refactor: Reduce string allocations in Expr::display_name (use write instead of format!) May 11, 2024
@github-actions github-actions bot added the optimizer Optimizer rules label May 11, 2024
alamb commented May 11, 2024

Thanks @erratic-pattern -- I took the liberty of merging the branch up from main to resolve a merge conflict as well

@erratic-pattern erratic-pattern force-pushed the adam/reduce-string-allocations-no-cow branch from 63a1656 to 1d8ecd1 Compare May 11, 2024 19:01
@erratic-pattern erratic-pattern requested a review from alamb May 11, 2024 19:02
alamb commented May 11, 2024

Wow -- according to my benchmarks this change makes a non-trivial difference in performance. We just keep driving these numbers down

group                                         main                                   reduce-string-allocations-no-cow
-----                                         ----                                   --------------------------------
logical_aggregate_with_join                   1.01  1214.3±63.26µs        ? ?/sec    1.00  1197.2±15.52µs        ? ?/sec
logical_plan_tpcds_all                        1.01    158.7±1.76ms        ? ?/sec    1.00    157.8±1.40ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.0±0.18ms        ? ?/sec    1.00     17.0±0.22ms        ? ?/sec
logical_select_all_from_1000                  1.05     18.8±0.14ms        ? ?/sec    1.00     18.0±0.11ms        ? ?/sec
logical_select_one_from_700                   1.01    816.2±8.81µs        ? ?/sec    1.00   807.9±10.28µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   757.9±23.06µs        ? ?/sec    1.01    761.9±8.19µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    749.3±8.82µs        ? ?/sec    1.00    747.1±6.33µs        ? ?/sec
physical_plan_tpcds_all                       1.04   1354.5±6.54ms        ? ?/sec    1.00   1296.4±9.46ms        ? ?/sec
physical_plan_tpch_all                        1.05     92.7±1.15ms        ? ?/sec    1.00     88.5±1.09ms        ? ?/sec
physical_plan_tpch_q1                         1.10      5.2±0.06ms        ? ?/sec    1.00      4.7±0.05ms        ? ?/sec
physical_plan_tpch_q10                        1.07      4.5±0.06ms        ? ?/sec    1.00      4.2±0.09ms        ? ?/sec
physical_plan_tpch_q11                        1.06      3.9±0.06ms        ? ?/sec    1.00      3.7±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.12      3.1±0.06ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec
physical_plan_tpch_q13                        1.07      2.2±0.03ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.09      2.7±0.04ms        ? ?/sec    1.00      2.5±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.07      3.8±0.06ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q17                        1.05      3.5±0.06ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.03      4.0±0.06ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_tpch_q19                        1.12      6.4±0.09ms        ? ?/sec    1.00      5.8±0.09ms        ? ?/sec
physical_plan_tpch_q2                         1.03      7.8±0.09ms        ? ?/sec    1.00      7.6±0.08ms        ? ?/sec
physical_plan_tpch_q20                        1.07      4.7±0.08ms        ? ?/sec    1.00      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q21                        1.02      6.2±0.08ms        ? ?/sec    1.00      6.1±0.07ms        ? ?/sec
physical_plan_tpch_q22                        1.07      3.4±0.05ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.06      3.2±0.06ms        ? ?/sec    1.00      3.0±0.04ms        ? ?/sec
physical_plan_tpch_q4                         1.02      2.3±0.05ms        ? ?/sec    1.00      2.3±0.06ms        ? ?/sec
physical_plan_tpch_q5                         1.01      4.4±0.07ms        ? ?/sec    1.00      4.4±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.07  1603.4±29.06µs        ? ?/sec    1.00  1494.7±42.11µs        ? ?/sec
physical_plan_tpch_q7                         1.04      5.7±0.08ms        ? ?/sec    1.00      5.5±0.10ms        ? ?/sec
physical_plan_tpch_q8                         1.02      7.3±0.08ms        ? ?/sec    1.00      7.2±0.07ms        ? ?/sec
physical_plan_tpch_q9                         1.03      5.7±0.09ms        ? ?/sec    1.00      5.5±0.08ms        ? ?/sec
physical_select_all_from_1000                 1.04     61.2±0.31ms        ? ?/sec    1.00     59.1±0.34ms        ? ?/sec
physical_select_one_from_700                  1.04      3.7±0.05ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec
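For reading the table: the leading number in each column appears to be that run's mean time divided by the fastest mean time for the same benchmark, so 1.00 marks the faster side. That interpretation is an assumption based on the figures themselves, and the helper `ratios` below is hypothetical. A small sketch reproducing three of the rows from their reported means:

```rust
// Returns (main / best, branch / best), where best is the smaller time.
fn ratios(main_time: f64, branch_time: f64) -> (f64, f64) {
    let best = main_time.min(branch_time);
    (main_time / best, branch_time / best)
}

fn main() {
    // (benchmark, mean on main, mean on the branch), in ms, copied from the table.
    let rows = [
        ("physical_plan_tpcds_all", 1354.5, 1296.4),
        ("physical_plan_tpch_all", 92.7, 88.5),
        ("physical_select_all_from_1000", 61.2, 59.1),
    ];
    for (name, main_ms, branch_ms) in rows {
        let (m, b) = ratios(main_ms, branch_ms);
        println!("{name}: main {m:.2}, branch {b:.2}");
    }
}
```

Under that reading, the physical planning benchmarks on `main` take roughly 4 to 5 percent longer than on the branch, with individual TPC-H queries ranging from about 1 to 12 percent.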

alamb commented May 12, 2024

🚀

@alamb alamb merged commit 8cc92a9 into apache:main May 12, 2024
23 checks passed
@erratic-pattern
Contributor Author

Nice! Those results look a lot better than what I found on my laptop. Very hard to get consistent benchmark results on a personal computer when there's so much process scheduling noise

alamb commented May 13, 2024

> Very hard to get consistent benchmark results on a personal computer when there's so much process scheduling noise

Yeah, I have a GCP VM on which I run the benchmarks

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…instead of format!) (apache#10454)

* refactor: use Write instead of format! to implement display_name

* Use static dispatch for write

* remove more allocations

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>