Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): support order_by in order-sensitive aggregates (collect/group_concat/first/last) #9729

Merged
merged 29 commits into from
Aug 2, 2024

Conversation

jcrist
Copy link
Member

@jcrist jcrist commented Jul 30, 2024

This adds a new order_by keyword argument to order-sensitive aggregates (collect/group_concat/first/last) to specify the ordering the aggregation should use. By default no ordering is specified, and the result is backend dependent (following the current behavior).

In [1]: import ibis

In [2]: t = ibis.memtable({"x": [1, 3, 2, 4], "y": [10, 9, 8, 7]}, name="test")

In [3]: t.x.collect().execute()  # no ordering specified, backend-dependent result
Out[3]: [1, 3, 2, 4]

In [4]: t.x.collect(order_by="y").execute()  # use the specified ordering, or error if unsupported
Out[4]: [4, 2, 3, 1]

In [5]: ibis.to_sql(t.x.collect(order_by="y"))
Out[5]: 
SELECT
  ARRAY_AGG("t0"."x" ORDER BY "t0"."y" ASC) FILTER(WHERE
    "t0"."x" IS NOT NULL) AS "ArrayCollect(x, (y,))"
FROM "test" AS "t0"

This is generally implemented using either an aggregate-internal ORDER BY clause, or via WITHIN GROUP (...) (backend dependent). The SQL required here can vary significantly between backends unfortunately :/.

Fixes #9170.

@jcrist jcrist added feature Features or general enhancements ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI labels Jul 30, 2024
@ibis-docs-bot ibis-docs-bot bot removed the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Jul 30, 2024
@jcrist
Copy link
Member Author

jcrist commented Jul 30, 2024

I don't have cloud testing setup locally, so I'm 🤞 that I did everything correctly for bigquery/snowflake.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

One concern about ignoring filters with literal inputs.

ibis/backends/sql/compilers/base.py Outdated Show resolved Hide resolved
@cpcloud
Copy link
Member

cpcloud commented Jul 31, 2024

There's a couple things that can be pulled out here, I'll do a couple separate PRs.

  1. RisingWave seems to support UNNEST now in test(risingwave): mark test_union_aliasing as notimpl because arbitrary aggregate function is not supported #9731.
  2. Oracle group_concat was just implemented incorrectly, but is pretty easy to get working in feat(oracle): support group_concat operator #9732.

@cpcloud cpcloud force-pushed the order-by-in-aggregates branch from 79626db to 773d7fe Compare August 2, 2024 11:02
@cpcloud
Copy link
Member

cpcloud commented Aug 2, 2024

I think there's probably an argument to be made for adding order_by to arbitrary but that can be done in a follow up.

@cpcloud cpcloud force-pushed the order-by-in-aggregates branch from b906201 to b9f2b19 Compare August 2, 2024 16:17
@cpcloud
Copy link
Member

cpcloud commented Aug 2, 2024

Sweet, thanks @jcrist! Merging.

@cpcloud cpcloud merged commit a18cb5d into ibis-project:main Aug 2, 2024
82 checks passed
@jcrist jcrist deleted the order-by-in-aggregates branch August 2, 2024 17:59
@cpcloud cpcloud added this to the 9.3 milestone Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat(api): Add order_by parameter to the collect() method
2 participants