
int_vector is substantially slower than std::vector #43

Open
h-2 opened this issue Jul 2, 2018 · 7 comments
Comments

h-2 commented Jul 2, 2018

Even if the entire amount of storage is pre-reserved (so no memory allocations occur), calling push_back on the SDSL vector is substantially slower than doing the same on std::vector (up to 10x slower).

push_back<std::vector, uint8_t, true>      1 ns          1 ns  847837409 alph_size=256 preallocated_mem=1 sizeof=1
push_back<std::vector, uint16_t, true>     1 ns          1 ns  658253559 alph_size=65.536k preallocated_mem=1 sizeof=2
push_back<std::vector, uint32_t, true>     2 ns          2 ns  448232364 alph_size=4.29497G preallocated_mem=1 sizeof=4
push_back<std::vector, uint64_t, true>     2 ns          2 ns  277192593 preallocated_mem=1 sizeof=8
push_back<sdsl_int_vec, uint8_t, true>    10 ns         10 ns   71418369 alph_size=256 preallocated_mem=1 sizeof=1
push_back<sdsl_int_vec, uint16_t, true>    9 ns          9 ns   73339899 alph_size=65.536k preallocated_mem=1 sizeof=2
push_back<sdsl_int_vec, uint32_t, true>   10 ns         10 ns   66514001 alph_size=4.29497G preallocated_mem=1 sizeof=4
push_back<sdsl_int_vec, uint64_t, true>   11 ns         11 ns   57827822 preallocated_mem=1 sizeof=8

(The fourth column is the number of iterations performed in a fixed amount of time.)

I would have thought that, at least for the built-in integer types, the difference shouldn't be this noticeable?

99991 commented Apr 6, 2022

The function end() is at fault here:

iterator end() noexcept { return int_vector_trait<t_width>::end(this, m_data, (m_size / m_width)); }

m_size / m_width is a relatively slow operation: because m_width is a run-time value, the compiler cannot prove that it never changes, so it must emit an actual division instruction instead of optimizing the operation to a right shift (which it would do for a constant power-of-two divisor).

Here is a benchmark using the slow end function:

#include <iostream>
#include <chrono>
#include <sdsl/bit_vectors.hpp>

#include <vector>

using namespace std;
using namespace sdsl;

void run_benchmark(){
    size_t num_runs = 1000 * 1000;

    int_vector<64> values;

    values.reserve(num_runs);

    chrono::steady_clock::time_point begin = chrono::steady_clock::now();

    for (size_t i = 0; i < num_runs; i++){
        values.amortized_resize(i + 1);

        // Uncomment to make it go fast.
        //values.m_width = 64;
        auto end = int_vector_trait<64>::end(&values, values.m_data, (values.m_size / values.m_width));

        *(end - 1) = i;
    }

    chrono::steady_clock::time_point end = chrono::steady_clock::now();

    double mean = chrono::duration_cast<chrono::nanoseconds> (end - begin).count() / (double)num_runs;

    cout << "On average, it took " << mean << " nanoseconds to push_back one value." << endl;
}

int main(){

    // Run benchmark a few times just to be sure.
    for (size_t i = 0; i < 9; i++){
        cout << "Run " << i + 1 << ": ";
        run_benchmark();
    }

    return 0;
}

And here are the results on my computer:

Run 9: On average, it took 8.11565 nanoseconds to push_back one value.

And after uncommenting the line values.m_width = 64;, the compiler is smart enough to make the optimization:

Run 9: On average, it took 1.71234 nanoseconds to push_back one value.

I do not understand why m_width exists. It seems like it is initialized with t_width at the beginning and then never written to again. It might be a good idea to simply delete m_width and replace all occurrences with t_width instead.

h-2 (Author) commented Apr 6, 2022

> I do not understand why m_width exists. It seems like it is initialized with t_width at the beginning and then never written to again. It might be a good idea to simply delete m_width and replace all occurrences with t_width instead.

I haven't double-checked this, but it seems like an easy fix.

@eseiler what do you think?

h-2 (Author) commented Apr 6, 2022

I just checked. The width can be dynamic:

* \tparam t_width Width of the integer. If set to `0` it is variable
* during runtime, otherwise fixed at compile time.

void width(uint8_t new_width) noexcept { int_vector_trait<t_width>::set_width(new_width, m_width); }

static void set_width(uint8_t new_width, int_width_type & width) noexcept
{
    if (t_width == 0)
    {
        if (0 < new_width and new_width <= 64)
            width = new_width;
        else
            width = 64;
    }
}

So removing m_width is not an option.

However, replacing:

iterator end() noexcept { return int_vector_trait<t_width>::end(this, m_data, (m_size / m_width)); }

with the following should work:

iterator end() noexcept
{ 
    if constexpr (t_width == 0) // dynamic width
        return int_vector_trait<t_width>::end(this, m_data, (m_size / m_width));
    else
        return int_vector_trait<t_width>::end(this, m_data, (m_size / t_width));
}

eseiler (Collaborator) commented Apr 19, 2022

Sorry for the late reply, I was sick :(

Yes, we need both, as the bit width may be dynamic.
I'll try to get around to implementing the if constexpr and adding a benchmark (or running one from seqan3).

eseiler (Collaborator) commented Apr 19, 2022

#102

I managed to go from 9.2ns to 2.8ns.

@99991 Are you able to reproduce your times with my PR?

99991 commented Apr 19, 2022

@eseiler I get around 3.5 ns with the linked benchmark and 2 ns after replacing sdsl::int_vector<64> with std::vector<int64_t>.

eseiler (Collaborator) commented Apr 20, 2022

Thanks for the feedback, @99991 !

I tweaked it a bit more. The benchmark now uses random values.
I got rid of some expensive math functions in amortized_resize, which now also has a version for t_width and one for m_width.

I think I'll go ahead and merge #102, as it already improves performance quite a lot.
Having said that, I would be very curious whether you also see the improvements I saw from changing amortized_resize.

Another thing to test would be the growth_factor. There's a comment in the benchmark.
Setting it to 2 often gets me close to 2ns. Leaving it at the default (1.5) is never much faster than 2.5ns.

One thing to note is that the benchmark is a microbenchmark, and even with a moderately sized vector (in the benchmark it's 64 million elements, so it should avoid most cache effects), the numbers fluctuate quite a bit.
