performance issue for musl binary built by rust #70108

Closed

gmlove opened this issue Mar 18, 2020 · 3 comments

Comments

@gmlove

gmlove commented Mar 18, 2020

It seems there is a performance issue with the binary built for the musl target. To reproduce it, I created a simple crate as below:

// src/main.rs
use std::collections::HashMap;
use std::ops::Range;
use std::time::Instant;

use rayon::prelude::*;
use uuid;

fn main() {
    let mut map = HashMap::new();

    for i in 0..1000000 {
        map.insert(format!("{}-{}", i, uuid::Uuid::new_v4()), i);
    }

    let start = Instant::now();
    let mut sum = 0;
    // every lookup allocates a fresh String via format!, and the new random UUID
    // means the key almost never matches, so this mostly measures allocation and hashing
    for i in 0..1000000 {
        sum += *map.get(&format!("{}-{}", i, uuid::Uuid::new_v4())).unwrap_or(&0);
    }
    println!("single thread sum with map.get: {}ms", start.elapsed().as_millis());

    let start = Instant::now();
    let thread_num = 10usize;
    let range: Vec<usize> = Range { start: 0usize, end: thread_num }.collect();
    let sum: usize = range.into_par_iter().map(|_| {
        let range: Vec<usize> = Range { start: 0, end: 1000000 / thread_num }.collect();
        let sum: usize = range.iter().map(|i| {
            *map.get(&format!("{}-{}", i, uuid::Uuid::new_v4())).unwrap_or(&0)
        }).sum();
        sum
    }).sum();

    println!("multi thread sum with map.get: {}ms", start.elapsed().as_millis());
}
# Cargo.toml
[package]
name = "musl-perf"
version = "0.1.0"
authors = ["gmlove <gracekinglau@gmail.com>"]
edition = "2018"

[dependencies]
uuid = { version = "0.8", features = ["serde", "v4"] }
rayon = "1.1"

If I build the crate for the target x86_64-unknown-linux-gnu, the program benefits from multi-threading.
The output on my machine (built with cargo build --release on my Ubuntu system) was:

single thread sum with map.get: 251ms
multi thread sum with map.get: 87ms

But for the target x86_64-unknown-linux-musl, the multi-threaded version is even slower than the single-threaded one.
The output on my machine (built with cargo build --target x86_64-unknown-linux-musl --release) was:

single thread sum with map.get: 316ms
multi thread sum with map.get: 1016ms

I'm not sure whether musl itself is slow or whether there is a compatibility issue between musl and Rust.
The source code above is at: https://github.com/gmlove/experiments/blob/master/rust/musl-perf/src/main.rs

I would be grateful if someone could have a look at this.

@Mark-Simulacrum
Member

I would recommend directing questions like this to users.rust-lang.org, and asking here (which we try to keep solely for bugs) only if investigation indicates this is a problem with the compiler or standard library. My initial guess is that the malloc implementation in musl is slower on average than glibc's, especially for multithreaded allocation.
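A common way to test that guess (a sketch, not something verified in this thread; it assumes adding the jemallocator crate as a dependency) is to swap the global allocator so the program stops going through musl's malloc:

# Cargo.toml (assumed extra dependency)
[dependencies]
jemallocator = "0.3"

// src/main.rs (added near the top)
use jemallocator::Jemalloc;

// route all Rust heap allocations through jemalloc instead of musl's malloc
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

If the musl build then behaves like the glibc one, the allocator is the bottleneck rather than the compiler or the standard library.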

@gmlove
Author

gmlove commented Mar 19, 2020

Yes, it looks like memory allocation is the problem for musl. After changing the program to the one below:

use std::collections::HashMap;
use std::ops::Range;
use std::time::Instant;

use rayon::prelude::*;
use uuid;

fn main() {
    let mut map = HashMap::new();

    for i in 0..1000000 {
        map.insert(i, i);
    }

    let start = Instant::now();
    let mut sum = 0;
    for i in 0..1000000 {
        sum += *map.get(&i).unwrap_or(&0);
    }
    println!("single thread sum(={}) with map.get: {}ms", sum, start.elapsed().as_millis());

    let start = Instant::now();
    let thread_num = 10usize;
    let range: Vec<usize> = Range { start: 0usize, end: thread_num }.collect();
    // note: every task looks up the same 0..100000 keys, which is why the
    // multi-threaded total printed below is smaller than the single-threaded one
    let sum: usize = range.into_par_iter().map(|_| {
        let range: Vec<usize> = Range { start: 0, end: 1000000 / thread_num }.collect();
        let sum: usize = range.iter().map(|i| {
            *map.get(i).unwrap_or(&0)
        }).sum();
        sum
    }).sum();

    println!("multi thread sum(={}) with map.get: {}ms", sum, start.elapsed().as_millis());
}

The result for musl is:

single thread sum(=499999500000) with map.get: 84ms
multi thread sum(=49999500000) with map.get: 20ms

while for gnu it is:

single thread sum(=499999500000) with map.get: 99ms
multi thread sum(=49999500000) with map.get: 18ms
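
As an aside, here is a minimal untested sketch of the parallel loop from the original program, rewritten so each rayon task reuses one String buffer instead of calling format! for every lookup (it assumes the same map and thread_num as in the original main.rs, plus use std::fmt::Write; at the top of the file):

    let start = Instant::now();
    let sum: usize = (0..thread_num).into_par_iter().map(|_| {
        // one reusable key buffer per rayon task
        let mut key = String::with_capacity(64);
        (0..1000000 / thread_num).map(|i| {
            key.clear();
            write!(key, "{}-{}", i, uuid::Uuid::new_v4()).unwrap();
            *map.get(key.as_str()).unwrap_or(&0)
        }).sum::<usize>()
    }).sum();
    println!("multi thread sum with reused buffer: {}ms", start.elapsed().as_millis());

This keeps the string keys but removes most of the per-lookup allocation, so it should narrow the gap between the two targets if the allocator is what differs.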

Thanks, @Mark-Simulacrum.

@jianshu93

jianshu93 commented Dec 17, 2022

Hello All,

I am having a similar issue with a statically linked musl binary. My program is here:

https://github.com/jianshu93/gsearch

It seems we rely on https://github.com/jean-pierreBoth/hnswlib-rs, which uses a lot of hashbrown. Performance degrades significantly compared to the glibc version when using all available threads. We rely on rayon, crossbeam, and parking_lot for parallelism and concurrency.
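
Besides swapping the global allocator as sketched earlier in this thread, one quick experiment is to cap rayon's thread pool and check whether the musl slowdown grows with the thread count, which would point at allocator contention rather than the hashing itself. A sketch, assuming it runs before any parallel work and using an arbitrary pool size of 4:

// limit rayon to a fixed number of threads before any parallel work starts
rayon::ThreadPoolBuilder::new()
    .num_threads(4)
    .build_global()
    .expect("the global rayon pool was already initialized");

If a small pool is fast and a large one is slow only on musl, the allocator is the likely culprit.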

Thanks,

Jianshu
