Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: derived projections shouldn't be pushed past sorts #32952

Closed
jordanlewis opened this issue Dec 8, 2018 · 3 comments · Fixed by #60469
Closed

opt: derived projections shouldn't be pushed past sorts #32952

jordanlewis opened this issue Dec 8, 2018 · 3 comments · Fixed by #60469
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@jordanlewis
Copy link
Member

Here's the plan for TPCH Q1:

root@:26257/tpch> explain(verbose) select
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (l_discount - 1)) as sum_disc_price, 
sum(l_extendedprice * (l_discount -1) * (1 + l_tax)) as sum_charge, 
avg(l_quantity) as avg_qty,   
avg(l_extendedprice) as avg_price, 
avg(l_discount) as avg_disc
from lineitem                                                                                                                                                                                               group by l_returnflag, l_linestatus                                                                                                                                                                                           order by l_returnflag,  l_linestatus;
            tree           |    field    |                      description                       |                                          columns                                          |          ordering
+--------------------------+-------------+--------------------------------------------------------+-------------------------------------------------------------------------------------------+-----------------------------+
  render                   |             |                                                        | (sum_qty, sum_base_price, sum_disc_price, sum_charge, avg_qty, avg_price, avg_disc)       |
   │                       | render 0    | agg0                                                   |                                                                                           |
   │                       | render 1    | agg1                                                   |                                                                                           |
   │                       | render 2    | agg2                                                   |                                                                                           |
   │                       | render 3    | agg3                                                   |                                                                                           |
   │                       | render 4    | agg4                                                   |                                                                                           |
   │                       | render 5    | agg5                                                   |                                                                                           |
   │                       | render 6    | agg6                                                   |                                                                                           |
   └── group               |             |                                                        | (l_returnflag, l_linestatus, agg0, agg1, agg2, agg3, agg4, agg5, agg6)                    | +l_returnflag,+l_linestatus
        │                  | aggregate 0 | l_returnflag                                           |                                                                                           |
        │                  | aggregate 1 | l_linestatus                                           |                                                                                           |
        │                  | aggregate 2 | sum(l_quantity)                                        |                                                                                           |
        │                  | aggregate 3 | sum(l_extendedprice)                                   |                                                                                           |
        │                  | aggregate 4 | sum(column19)                                          |                                                                                           |
        │                  | aggregate 5 | sum(column21)                                          |                                                                                           |
        │                  | aggregate 6 | avg(l_quantity)                                        |                                                                                           |
        │                  | aggregate 7 | avg(l_extendedprice)                                   |                                                                                           |
        │                  | aggregate 8 | avg(l_discount)                                        |                                                                                           |
        │                  | group by    | @6,@7                                                  |                                                                                           |
        │                  | ordered     | @6,@7                                                  |                                                                                           |
        └── sort           |             |                                                        | (column19, column21, l_quantity, l_extendedprice, l_discount, l_returnflag, l_linestatus) | +l_returnflag,+l_linestatus
             │             | order       | +l_returnflag,+l_linestatus                            |                                                                                           |
             └── render    |             |                                                        | (column19, column21, l_quantity, l_extendedprice, l_discount, l_returnflag, l_linestatus) |
                  │        | render 0    | l_extendedprice * (l_discount - 1.0)                   |                                                                                           |
                  │        | render 1    | (l_extendedprice * (l_discount - 1.0)) * (l_tax + 1.0) |                                                                                           |
                  │        | render 2    | l_quantity                                             |                                                                                           |
                  │        | render 3    | l_extendedprice                                        |                                                                                           |
                  │        | render 4    | l_discount                                             |                                                                                           |
                  │        | render 5    | l_returnflag                                           |                                                                                           |
                  │        | render 6    | l_linestatus                                           |                                                                                           |
                  └── scan |             |                                                        | (l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus)              |
                           | table       | lineitem@primary                                       |                                                                                           |
                           | spans       | ALL                                                    |                                                                                           |
(33 rows)

Time: 2.552ms

As you can see, the optimizer pushes the "derived" render expressions (projections that aren't part of the scan) past the sort. This is inefficient, as it requires that the sort shuffle more memory.

Ideally, the optimizer should not materialize any extra projections below a sort, to avoid this kind of shuffling.

@jordanlewis jordanlewis added the A-sql-optimizer SQL logical planning and optimizations. label Dec 8, 2018
@jordanlewis jordanlewis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Dec 8, 2018
@jordanlewis jordanlewis changed the title opt: expensive renders shouldn't be pushed past sorts opt: derived projections shouldn't be pushed past sorts Dec 8, 2018
@RaduBerinde
Copy link
Member

Sounds like we should be taking the number of total columns into account for sort.

@RaduBerinde RaduBerinde self-assigned this Dec 8, 2018
@RaduBerinde
Copy link
Member

Also, I wondered many times if it would make sense to swap the order in which we evaluate things, specifically to do enforcers last. I believe it might help in practice, even if the right answer is always that if a plan is better than another, the cost should reflect that.

@andy-kimball
Copy link
Contributor

I think the coster just needs a good overhaul. It's still very simplistic, and so we see these kinds of anomalies. I'm planning to look into it more when I work on incorporating latencies into the cost function.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 11, 2021
Currently, if we have to sort results and project a new column, there
is no cost difference between the two orders and we happen to prefer
the sort on top. It is preferable to sort before adding new columns to
avoid storing the extra value in memory or on disk.

This change improves the sort costing by adding a cost proportional to
the total number of values.

Fixes cockroachdb#32952.
craig bot pushed a commit that referenced this issue Feb 11, 2021
60154: sql: add crdb_internal.show_create_all_tables builtin r=rafiss a=RichardJCai

sql: add crdb_internal.show_create_all_tables builtin

Making this a PR first before continuing with #53488, we can alias this builtin with SHOW CREATE ALL TABLES.

Example output:
```
query T
SELECT crdb_internal.show_create_all_tables('d')
----
CREATE TABLE public.parent (
  x INT8 NULL,
  y INT8 NULL,
  z INT8 NULL,
  UNIQUE INDEX parent_x_y_z_key (x ASC, y ASC, z ASC),
  UNIQUE INDEX parent_x_key (x ASC),
  FAMILY f1 (x, y, z, rowid)
);
CREATE TABLE public.full_test (
  x INT8 NULL,
  y INT8 NULL,
  z INT8 NULL,
  UNIQUE INDEX full_test_x_key (x ASC),
  FAMILY f1 (x, y, z, rowid)
);
CREATE SEQUENCE public.s MINVALUE 1 MAXVALUE 9223372036854775807 INCREMENT 1 START 1;
CREATE VIEW public.vx ("?column?") AS SELECT 1;
ALTER TABLE public.full_test ADD CONSTRAINT fk_x_ref_parent FOREIGN KEY (x, y, z) REFERENCES public.parent(x, y, z) MATCH FULL ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE public.full_test ADD CONSTRAINT test_fk FOREIGN KEY (x) REFERENCES public.parent(x) ON DELETE CASCADE;
-- Validate foreign key constraints. These can fail if there was unvalidated data during the SHOW CREATE ALL TABLES
ALTER TABLE public.full_test VALIDATE CONSTRAINT fk_x_ref_parent;
ALTER TABLE public.full_test VALIDATE CONSTRAINT test_fk;
```

Release note (sql change): crdb_internal.show_create_all_tables is
a new builtin that takes in a database name (string) and returns
a flat log of all the CREATE TABLE statements in the database followed
by alter statements to add constraints. The output can be used
to recreate a database. This builtin was added to replace old dump logic.

60469: opt: prefer sorting fewer columns r=RaduBerinde a=RaduBerinde

Currently, if we have to sort results and project a new column, there
is no cost difference between the two orders and we happen to prefer
the sort on top. It is preferable to sort before adding new columns to
avoid storing the extra value in memory or on disk.

This change improves the sort costing by adding a cost proportional to
the total number of values.

Fixes #32952.

Co-authored-by: richardjcai <caioftherichard@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@craig craig bot closed this as completed in c140110 Feb 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants