[C++] [Parquet] Use std::count in parquet ColumnReader #39398

Hattonuri · 2023-12-30T12:27:33Z

Describe the enhancement requested

I've found that the for loop here

Lines 1055 to 1073 in 7c3480e

    
      void ReadLevels(int64_t batch_size, int16_t* def_levels, int16_t* rep_levels,
                      int64_t* num_def_levels, int64_t* values_to_read) {
        batch_size =
            std::min(batch_size, this->num_buffered_values_ - this->num_decoded_values_);
        // If the field is required and non-repeated, there are no definition levels
        if (this->max_def_level_ > 0 && def_levels != nullptr) {
          *num_def_levels = this->ReadDefinitionLevels(batch_size, def_levels);
          // TODO(wesm): this tallying of values-to-decode can be performed with better
          // cache-efficiency if fused with the level decoding.
          for (int64_t i = 0; i < *num_def_levels; ++i) {
            if (def_levels[i] == this->max_def_level_) {
              ++(*values_to_read);
            }
          }
        } else {
          // Required field, read all values
          *values_to_read = batch_size;
        }

compiles to the following assembly:

0xc0c2f0 <ReadLevels()+96> inc %rdx
0xc0c2f3 <ReadLevels()+99> cmp %rax,%rdx
0xc0c2f6 <ReadLevels()+102> jge 0xc0c30c <ReadLevels()+124>
0xc0c2f8 <ReadLevels()+104> cmp %cx,(%r14,%rdx,2)
0xc0c2fd <ReadLevels()+109> jne 0xc0c2f0 <ReadLevels()+96>
0xc0c2ff <ReadLevels()+111> incq 0x0(%rbp)
0xc0c303 <ReadLevels()+115> mov (%rbx),%rax
0xc0c306 <ReadLevels()+118> jmp 0xc0c2f0 <ReadLevels()+96>

This means the loop iterates element by element and updates the counter in memory through the pointer with incq on every match.
I think the reason is that values_to_read and num_def_levels are not marked as restrict, so the compiler has to assume they may alias and cannot optimize the loop into something more efficient (for example, using SIMD).

In my flame graph, this part accounted for ~10% of the time spent.
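
To make the proposal concrete, here is a minimal sketch of the two variants as free functions. The names `CountWithLoop` and `CountWithStdCount` are hypothetical and exist only for illustration; this is not necessarily the exact code that ends up in `column_reader.cc`. The first function mirrors the loop above, where both the loop bound and the counter are accessed through pointers. The second takes the bound by value and lets `std::count` accumulate the tally itself, so `*values_to_read` is written once at the end.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the loop in ReadLevels: the bound is re-read
// through num_def_levels and the counter is incremented through
// values_to_read, so the compiler has to assume the two may alias.
void CountWithLoop(const std::int16_t* def_levels,
                   const std::int64_t* num_def_levels,
                   std::int16_t max_def_level,
                   std::int64_t* values_to_read) {
  for (std::int64_t i = 0; i < *num_def_levels; ++i) {
    if (def_levels[i] == max_def_level) {
      ++(*values_to_read);
    }
  }
}

// Same tally with std::count: the count is computed over the range first,
// and the pointer is written to exactly once.
void CountWithStdCount(const std::int16_t* def_levels,
                       std::int64_t num_def_levels,
                       std::int16_t max_def_level,
                       std::int64_t* values_to_read) {
  *values_to_read +=
      std::count(def_levels, def_levels + num_def_levels, max_def_level);
}
```

Both functions add the number of matching levels to whatever `*values_to_read` already holds, so they should be interchangeable at the call site. Whether the second form is actually vectorized depends on the compiler and flags, so it would be worth checking the generated assembly again.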

Component(s)

C++, Parquet

…39397)

### Rationale for this change

I've found that the for loop here https://github.com/apache/arrow/blob/7c3480e2f028f5881242f227f42155cf833efee7/cpp/src/parquet/column_reader.cc#L1055-L1073 compiles to

0xc0c2f0 <ReadLevels()+96> inc %rdx
0xc0c2f3 <ReadLevels()+99> cmp %rax,%rdx
0xc0c2f6 <ReadLevels()+102> jge 0xc0c30c <ReadLevels()+124>
0xc0c2f8 <ReadLevels()+104> cmp %cx,(%r14,%rdx,2)
0xc0c2fd <ReadLevels()+109> jne 0xc0c2f0 <ReadLevels()+96>
0xc0c2ff <ReadLevels()+111> incq 0x0(%rbp)
0xc0c303 <ReadLevels()+115> mov (%rbx),%rax
0xc0c306 <ReadLevels()+118> jmp 0xc0c2f0 <ReadLevels()+96>

This means the loop iterates element by element and updates the counter in memory through the pointer with incq on every match.
I think the reason is that values_to_read and num_def_levels are not marked as restrict, so the compiler has to assume they may alias and cannot optimize the loop into something more efficient (for example, using SIMD).

In my flame graph, this part accounted for ~10% of the time spent.

In this file there are also some other for loops that could easily be changed to std::count, but they do not write through references, and I don't know why std::count is not used anywhere in the cpp/src/parquet/ directory, so I didn't change much.

### What changes are included in this PR?

Using `std::count` in `parquet/column_reader.cc` so that the loop can be optimized.

### Are these changes tested?

They are covered by unit tests but not benchmarked, because I don't know which benchmark would show the performance gain here.

### Are there any user-facing changes?

* Closes: #39398

Authored-by: Dmitry Stasenko <dmitry.stasenko@pinely.com>
Signed-off-by: mwish <maplewish117@gmail.com>


Hattonuri added the Type: enhancement label Dec 30, 2023

github-actions bot added Component: Parquet Component: C++ labels Dec 30, 2023

github-actions bot mentioned this issue Dec 30, 2023

GH-39398: [C++][Parquet] Use std::count in ColumnReader ReadLevels #39397

Merged

github-actions bot assigned Hattonuri Dec 30, 2023

github-actions bot mentioned this issue Jan 6, 2024

GH-39398: [C++][Parquet] DNM: benchmark for readLevels #39486

Closed

mapleFU closed this as completed in #39397 Jan 9, 2024

mapleFU added this to the 16.0.0 milestone Jan 9, 2024
