Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Change PlSmallStr impl from Arc<str> to compact_str #18508

Merged
merged 9 commits into from
Sep 2, 2024

Conversation

nameexhaustion
Copy link
Collaborator

@nameexhaustion nameexhaustion commented Sep 2, 2024

  • compact_str for small-string in-lining, and after testing also preferred against Arc<str> / kstring<arc> as the atomic ref-counting performed badly under contention
  • Replace PlSmallStr::const_default() with more ergonomic PlSmallStr::EMPTY

@github-actions github-actions bot added internal An internal refactor or improvement rust Related to Rust Polars labels Sep 2, 2024
#[inline(always)]
pub fn from_static(s: &'static str) -> Self {
Self(Arc::from(s))
pub const fn from_static(s: &'static str) -> Self {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now const fn for from_static

Cargo.toml Outdated Show resolved Hide resolved
@nameexhaustion nameexhaustion changed the title refactor(rust): Change PlSmallStr impl from Arc<str> to kstring refactor(rust): Change PlSmallStr impl from Arc<str> to compact_str Sep 2, 2024
@nameexhaustion nameexhaustion marked this pull request as ready for review September 2, 2024 09:40
@ritchie46 ritchie46 changed the title refactor(rust): Change PlSmallStr impl from Arc<str> to compact_str perf: Change PlSmallStr impl from Arc<str> to compact_str Sep 2, 2024
@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars labels Sep 2, 2024
@ritchie46
Copy link
Member

I marked this as a perf change. 😎

Copy link

codecov bot commented Sep 2, 2024

Codecov Report

Attention: Patch coverage is 83.84798% with 68 lines in your changes missing coverage. Please review.

Project coverage is 79.87%. Comparing base (c03e760) to head (54cd6ea).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
crates/polars-python/src/expr/rolling.rs 15.38% 22 Missing ⚠️
...polars-core/src/chunked_array/from_iterator_par.rs 50.00% 5 Missing ⚠️
...es/polars-core/src/chunked_array/comparison/mod.rs 94.20% 4 Missing ⚠️
...tes/polars-core/src/chunked_array/from_iterator.rs 76.47% 4 Missing ⚠️
...rates/polars-plan/src/dsl/function_expr/boolean.rs 0.00% 4 Missing ⚠️
...ates/polars-core/src/chunked_array/iterator/mod.rs 70.00% 3 Missing ⚠️
crates/polars-core/src/frame/row/av_buffer.rs 91.42% 3 Missing ⚠️
crates/polars-core/src/chunked_array/ops/apply.rs 0.00% 2 Missing ⚠️
...s-core/src/frame/group_by/aggregations/dispatch.rs 71.42% 2 Missing ⚠️
crates/polars-core/src/series/any_value.rs 90.47% 2 Missing ⚠️
... and 17 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18508      +/-   ##
==========================================
+ Coverage   79.86%   79.87%   +0.01%     
==========================================
  Files        1501     1501              
  Lines      202029   201777     -252     
  Branches     2868     2868              
==========================================
- Hits       161342   161174     -168     
+ Misses      40140    40056      -84     
  Partials      547      547              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46
Copy link
Member

Internal benchmarks:

use std::hint::black_box;
use std::sync::Arc;
use std::time::Duration;

use compact_str::CompactString;

fn bench_multithread_clone<T: Clone + Send + 'static>(n_iters: usize, n_threads: usize, value: T) -> Duration {
    let start = std::time::Instant::now();
    let mut threads = Vec::with_capacity(n_threads);
    for _ in 0..n_threads {
        let value = value.clone();
        threads.push(
            std::thread::spawn(move || {
                for _ in 0..n_iters {
                    black_box(value.clone());
                }
            })
        );
    }
    for t in threads {
        t.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let n_iters = 10_000_000;
    let n_threads = 10;
    for logsz in 0..11 {
        let sz = 1 << logsz;
        let s = "a".repeat(sz);
        println!("String   {sz}: {:?}", bench_multithread_clone(n_iters, n_threads, s.clone()));
        println!("Arc<str> {sz}: {:?}", bench_multithread_clone(n_iters, n_threads, Arc::from_iter(s.chars())));
        println!("Compact  {sz}: {:?}", bench_multithread_clone(n_iters, n_threads, CompactString::from(&s)));
    }
}

High contention

String   1: 310.708458ms
Arc<str> 1: 10.11404825s
Compact  1: 15.558041ms
String   2: 327.036084ms
Arc<str> 2: 10.694496084s
Compact  2: 16.020833ms
String   4: 323.366625ms
Arc<str> 4: 10.955285917s
Compact  4: 17.379417ms
String   8: 328.070042ms
Arc<str> 8: 10.536749875s
Compact  8: 15.1295ms
String   16: 348.985125ms
Arc<str> 16: 11.969632458s
Compact  16: 16.202709ms
String   32: 254.559125ms
Arc<str> 32: 9.809960167s
Compact  32: 290.558875ms
String   64: 274.858916ms
Arc<str> 64: 9.887851958s
Compact  64: 272.698ms
String   128: 245.200375ms
Arc<str> 128: 10.31820525s
Compact  128: 249.990667ms
String   256: 261.957209ms
Arc<str> 256: 10.237664458s
Compact  256: 279.541459ms
String   512: 728.768542ms
Arc<str> 512: 10.320912292s
Compact  512: 763.411584ms
String   1024: 423.475666ms
Arc<str> 1024: 10.892088625s
Compact  1024: 512.338875ms

Lower contention

for _ in 0..n_iters {
    black_box(value.clone());
    for i in 0..1000 {
        black_box(i*i);
    }
}


String   1: 540.059708ms
Arc<str> 1: 1.389195917s
Compact  1: 415.176167ms
String   2: 525.745792ms
Arc<str> 2: 1.368962625s
Compact  2: 432.936209ms
String   4: 503.196ms
Arc<str> 4: 1.385971s
Compact  4: 493.812583ms
String   8: 497.884583ms
Arc<str> 8: 1.467783334s
Compact  8: 455.944667ms
String   16: 492.412459ms
Arc<str> 16: 1.386792791s
Compact  16: 454.659ms
String   32: 481.999208ms
Arc<str> 32: 1.336161917s
Compact  32: 523.770375ms
String   64: 472.040334ms
Arc<str> 64: 1.38562975s
Compact  64: 460.261959ms
String   128: 489.277875ms
Arc<str> 128: 1.448168959s
Compact  128: 481.207792ms
String   256: 486.657875ms
Arc<str> 256: 1.338470875s
Compact  256: 475.626292ms
String   512: 553.985458ms
Arc<str> 512: 1.365395s
Compact  512: 510.286834ms
String   1024: 481.878875ms
Arc<str> 1024: 1.444221125s
Compact  1024: 521.473291ms

@ritchie46 ritchie46 merged commit 9be94c2 into pola-rs:main Sep 2, 2024
25 checks passed
@c-peters c-peters added the accepted Ready for implementation label Sep 3, 2024
@nameexhaustion nameexhaustion deleted the pl-str-impl branch September 4, 2024 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation internal An internal refactor or improvement performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants