Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: group by does not conform with postgres #26709

Closed
BramGruneir opened this issue Jun 13, 2018 · 10 comments · Fixed by #41732
Closed

sql: group by does not conform with postgres #26709

BramGruneir opened this issue Jun 13, 2018 · 10 comments · Fixed by #41732
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL A-tools-hibernate Issues that pertain to Hibernate integration. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. docs-done docs-known-limitation good first issue
Milestone

Comments

@BramGruneir
Copy link
Member

BramGruneir commented Jun 13, 2018

org.hibernate.userguide.hql.HQLTest test_hql_group_by_example_3 fails

create table Phone (
   id int8 not null,
    phone_number varchar(255),
    phone_type varchar(255),
    person_id int8,
    order_id int4,
    primary key (id)
);
create table phone_call (
   id int8 not null,
    duration int4 not null,
    call_timestamp timestamp,
    phone_id int8,
    primary key (id)
);
create table Person (
   id int8 not null,
    address varchar(255),
    createdOn timestamp,
    name varchar(255),
    nickName varchar(255),
    version int4 not null,
    primary key (id)
);
select
    person2_.id as col_0_0_,
    sum(call0_.duration) as col_1_0_,
    person2_.id as id1_2_,
    person2_.address as address2_2_,
    person2_.createdOn as createdO3_2_,
    person2_.name as name4_2_,
    person2_.nickName as nickName5_2_,
    person2_.version as version6_2_ 
from
    phone_call call0_ 
inner join
    Phone phone1_ 
        on call0_.phone_id=phone1_.id 
inner join
    Person person2_ 
        on phone1_.person_id=person2_.id 
group by
    person2_.id;

This should parse and execute, but instead it produces
ERROR: column "address" must appear in the GROUP BY clause or be used in an aggregate function

From the postgres docs:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

@BramGruneir BramGruneir added the A-sql-pgcompat Semantic compatibility with PostgreSQL label Jun 13, 2018
@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. labels Jun 14, 2018
@knz
Copy link
Contributor

knz commented Jun 14, 2018

cc @RaduBerinde @andy-kimball

Andy says:

doing something as simple as:
select x, y FROM (select x, y from t) AS t group by x;
causes Postgres to raise an error

OTOH, the query above uses a join in the FROM clause and yet the syntax is valid.

This means that:

  • during semantic analysis we must propagate the key properties of columns in the FROM relational expression up to the analysis of SELECT target exprs.
  • however, we must not propagate these key properties if a relational sub-expression was using the (...) (subquery) syntax

Probably a good idea to peek at pg to determine how this is detected.

@BramGruneir BramGruneir added the A-tools-hibernate Issues that pertain to Hibernate integration. label Jun 14, 2018
@knz knz added this to the 2.1 milestone Jul 23, 2018
@knz knz modified the milestones: 2.1, 2.2 Aug 7, 2018
@sploiselle
Copy link
Contributor

@knz Can we get a docs description this issue as a Known Limitation for 2.1?

@knz
Copy link
Contributor

knz commented Oct 24, 2018

"""
title: Using columns in SELECT not listed in GROUP BY

Applications developed for PostgreSQL may exploit the fact that PostgreSQL allows a SELECT clause to name a column that is not also listed in GROUP BY in some cases, for example SELECT a GROUP BY b. This is not yet supported by CockroachDB. To work around this limitation, and depending on expected results, the rendered columns should be either added at the end of the GROUP BY list, for example SELECT a GROUP BY b, a, or DISTINCT should also be used, for example SELECT DISTINCT a GROUP BY b.
"""

@RaduBerinde RaduBerinde added this to the 20.1 milestone Sep 16, 2019
@RaduBerinde
Copy link
Member

Another example from @jordanlewis:

root@:26257/activerecord_unittest> SELECT "developers"."id", "developers"."name", "developers"."salary", "developers"."firm_id", "developers"."mentor_id", "developers"."created_at", "developers"."updated_at", "developers"."created_on", "developers"."updated_on" FROM "developers" GROUP BY developers.name, developers.salary, developers.id, developers.created_at, developers.updated_at, developers.created_on, developers.updated_on;
pq: column "firm_id" must appear in the GROUP BY clause or be used in an aggregate function

@RaduBerinde
Copy link
Member

PG docs:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

So the column is accepted if the PK of the base table it is coming from is in the grouping list. In our case this would be the ColumnMeta.Table.
https://github.com/postgres/postgres/blob/master/src/backend/parser/parse_agg.c
https://github.com/postgres/postgres/blob/master/src/backend/catalog/pg_constraint.c

An alternative would be to use our internal FDs. These are a superset of this simple rule and we would accept more cases; however it is fragile - changing some rules could in principle change the detected FDs, so it's possible that some queries would work in a release but then stop working in a future release.

@rohany
Copy link
Contributor

rohany commented Oct 7, 2019

A note -- this feature seems to be used heavily by django (alot of the django annotate/group by tools end up creating queries like this).

@RaduBerinde RaduBerinde assigned RaduBerinde and unassigned rytaft Oct 18, 2019
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 18, 2019
Postgres supports selecting columns that are not grouping columns if
the entire PK is part of the grouping columns (see
https://www.postgresql.org/docs/current/sql-select.html#SQL-GROUPBY).
This commit implements this feature, motivated by the desire for
compatibility with various ORMs.

We implement this feature by adding these columns as grouping columns
on-the-fly; in other words, we interpret `SELECT v FROM kv GROUP BY k`
as having `GROUP BY k,v` (which is equivalent). Normalization rules
will subsequently trim down the list of columns.

Fixes cockroachdb#26709.

Release note (sql change): It is now valid for SELECT and HAVING to
refer to ungrouped columns in the special case when the grouped
columns contain the primary key of the table containing the ungrouped
column.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 22, 2019
Postgres supports selecting columns that are not grouping columns if
the entire PK is part of the grouping columns (see
https://www.postgresql.org/docs/current/sql-select.html#SQL-GROUPBY).
This commit implements this feature, motivated by the desire for
compatibility with various ORMs.

We implement this feature by adding these columns as grouping columns
on-the-fly; in other words, we interpret `SELECT v FROM kv GROUP BY k`
as having `GROUP BY k,v` (which is equivalent). Normalization rules
will subsequently trim down the list of columns.

Fixes cockroachdb#26709.

Release note (sql change): It is now valid for SELECT and HAVING to
refer to ungrouped columns in the special case when the grouped
columns contain the primary key of the table containing the ungrouped
column.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 23, 2019
Postgres supports selecting columns that are not grouping columns if
the entire PK is part of the grouping columns (see
https://www.postgresql.org/docs/current/sql-select.html#SQL-GROUPBY).
This commit implements this feature, motivated by the desire for
compatibility with various ORMs.

We implement this feature by adding these columns as grouping columns
on-the-fly; in other words, we interpret `SELECT v FROM kv GROUP BY k`
as having `GROUP BY k,v` (which is equivalent). Normalization rules
will subsequently trim down the list of columns.

Fixes cockroachdb#26709.

Release note (sql change): It is now valid for SELECT and HAVING to
refer to ungrouped columns in the special case when the grouped
columns contain the primary key of the table containing the ungrouped
column.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Nov 23, 2019
Postgres supports selecting columns that are not grouping columns if
the entire PK is part of the grouping columns (see
https://www.postgresql.org/docs/current/sql-select.html#SQL-GROUPBY).
This commit implements this feature, motivated by the desire for
compatibility with various ORMs.

We implement this feature by adding these columns as grouping columns
on-the-fly; in other words, we interpret `SELECT v FROM kv GROUP BY k`
as having `GROUP BY k,v` (which is equivalent). Normalization rules
will subsequently trim down the list of columns.

Fixes cockroachdb#26709.

Release note (sql change): It is now valid for SELECT and HAVING to
refer to ungrouped columns in the special case when the grouped
columns contain the primary key of the table containing the ungrouped
column.
@jseldess
Copy link
Contributor

@RaduBerinde, can we resolve/remove this known limitation from the 20.1 docs? https://www.cockroachlabs.com/docs/dev/known-limitations.html#truncate-does-not-behave-like-delete

@RaduBerinde
Copy link
Member

@jseldess yes, we should remove it.

@jseldess
Copy link
Contributor

Sorry, was linking to the wrong limitation. I meant: https://www.cockroachlabs.com/docs/dev/known-limitations.html#using-columns-in-select-not-listed-in-group-by

I'll remove that one. Please let me know if that's not right, @RaduBerinde.

@RaduBerinde
Copy link
Member

Yes, I figured you were referring to that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL A-tools-hibernate Issues that pertain to Hibernate integration. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. docs-done docs-known-limitation good first issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants