Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Ensure PartitionedOutputOperator is run with fixed local distribution" #15358

Merged
merged 1 commit into from
Dec 10, 2022

Conversation

arhimondr
Copy link
Contributor

Description

This reverts commit 1c06202.

Additional local exchange increases CPU time.

Regression noticed during performance verification cycle by @lukasz-stec

Reproduced by running a following query on a TPCH x1000 dataset on a 4 nodes cluster (m5.8xlarge on AWS):

SELECT
  sum(length(l_comment) + length(s_comment))
FROM lineitem, supplier
WHERE l_suppkey = s_suppkey;

The additional local exchange causes a CPU time increase from ~2.09h to ~2.19h (a 5% increase).

The regression is caused by an increase of context switches and CPU migrations:

Before:

  2,091,016.93 msec task-clock                #   27.119 CPUs utilized          
         2,221,498      context-switches          #    0.001 M/sec                  
           658,519      cpu-migrations            #    0.315 K/sec                  
               920      page-faults               #    0.000 K/sec                  
 5,386,582,982,247      cycles                    #    2.576 GHz                    
 4,446,312,739,476      instructions              #    0.83  insn per cycle         
   680,314,613,026      branches                  #  325.351 M/sec                  
    22,854,687,505      branch-misses             #    3.36% of all branches        
 1,078,160,950,603      L1-dcache-loads           #  515.616 M/sec                  
     5,255,101,736      L1-dcache-load-misses     #    0.49% of all L1-dcache hits  
   <not supported>      LLC-loads                                                   
   <not supported>      LLC-load-misses   

After:

 2,390,431.42 msec task-clock                #   27.578 CPUs utilized          
         3,026,056      context-switches          #    0.001 M/sec                  
           940,979      cpu-migrations            #    0.394 K/sec                  
             4,322      page-faults               #    0.002 K/sec                  
 6,185,926,599,130      cycles                    #    2.588 GHz                    
 4,777,419,838,837      instructions              #    0.77  insn per cycle         
   733,708,556,808      branches                  #  306.936 M/sec                  
    24,900,478,611      branch-misses             #    3.39% of all branches        
 1,160,521,729,359      L1-dcache-loads           #  485.486 M/sec                  
     4,050,568,791      L1-dcache-load-misses     #    0.35% of all L1-dcache hits

What results in an observable regression in the number of instruction per cycle (0.83 insn per cycle vs 0.77 insn per cycle).

Additional context and related issues

#13834

Release notes

(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@arhimondr
Copy link
Contributor Author

CC: @linzebing @losipiuk

@arhimondr arhimondr merged commit dd27aa7 into trinodb:master Dec 10, 2022
@arhimondr arhimondr deleted the revert-fixed-distribution branch December 10, 2022 13:06
@github-actions github-actions bot added this to the 404 milestone Dec 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants