Skip to content

Latest commit

 

History

History
759 lines (543 loc) · 25.2 KB

pipe-syntax.md

File metadata and controls

759 lines (543 loc) · 25.2 KB

Pipe query syntax

ZetaSQL supports pipe query syntax, which is a simpler and more concise alternative to standard query syntax. Pipe syntax supports many of the same operators as standard syntax, and improves some areas of SQL query functionality.

Pipe syntax

Pipe syntax has the following key characteristics:

  • Pipe syntax consists of a pipe and an angle bracket |>, an operator name, and any arguments:
    |> operator_name argument_list
  • Pipe operators can be added to the end of any valid query.
  • Pipe operators can be applied in any order, any number of times.
  • Pipe syntax works anywhere standard syntax is supported: in queries, views, table-valued functions (TVFs), and other contexts.
  • Pipe syntax can be mixed with standard syntax in the same query. For example, subqueries can use different syntax from the parent query.
  • A query can start with a FROM clause, and pipe operators can optionally be added after the FROM clause.

Compare the following equivalent queries that count open tickets assigned to a user:

Standard syntax

SELECT component_id, COUNT(*)
FROM ticketing_system_table
WHERE
  assignee_user.email = 'username@email.com'
  AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
GROUP BY component_id
ORDER BY component_id DESC;

Pipe syntax

FROM ticketing_system_table
|> WHERE
    assignee_user.email = 'username@email.com'
    AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
|> AGGREGATE COUNT(*)
   GROUP AND ORDER BY component_id DESC;

Pipe operator semantics

Pipe operators have the following semantic behavior:

  • Each pipe operator performs a self-contained operation.
  • A pipe operator consumes the input table passed to it through the pipe character and produces a new table as output.
  • A pipe operator can reference only columns from its immediate input table. Columns from earlier in the same query aren't visible. Inside subqueries, correlated references to outer columns are still allowed.

FROM queries

In pipe syntax, a query can start with a standard FROM clause and use any standard FROM syntax, including tables, joins, subqueries, UNNEST operations, and table-valued functions (TVFs). Table aliases can be assigned to each input item using the AS alias clause.

A query with only a FROM clause, like FROM table_name, is allowed in pipe syntax and returns all rows from the table. For tables with columns, FROM table_name in pipe syntax is similar to SELECT * FROM table_name in standard syntax. For value tables, FROM table_name in pipe syntax returns the row values without expanding fields, similar to SELECT value FROM table_name AS value in standard syntax.

Examples

-- Return a table row that matches a condition.
FROM table_name
|> WHERE value_column IS NULL
|> LIMIT 1;
-- Join tables in the FROM clause and then apply pipe operators.
FROM Table1 AS t1 JOIN Table2 AS t2 USING (key)
|> AGGREGATE SUM(t2.value)
   GROUP BY t1.key;

Pipe operators

ZetaSQL supports the following pipe operators. For operators that correspond or relate to similar operations in standard syntax, the operator descriptions highlight similarities and differences and link to more detailed documentation on the corresponding syntax.

SELECT pipe operator

|> SELECT expression [[AS] alias] [, ...]

Description

Produces a new table with the listed columns, similar to the outermost SELECT clause in a table subquery in standard syntax. Supports standard output modifiers like SELECT AS STRUCT, and supports window functions. Doesn't support aggregations or anonymization.

In pipe syntax, the SELECT operator in a query is optional. The SELECT operator can be used near the end of a query to specify the list of output columns. The final query result contains the columns returned from the last pipe operator. If the SELECT operator isn't used to select specific columns, the output includes the full row, similar to what the SELECT * statement in standard syntax produces. For value tables, the result is the row value, without field expansion.

In pipe syntax, the SELECT clause doesn't perform aggregation. Use the AGGREGATE operator instead.

For cases where SELECT would be used in standard syntax to rearrange columns, pipe syntax supports other operators:

Example

|> SELECT account_id AS Account

EXTEND pipe operator

|> EXTEND expression [[AS] alias] [, ...]

Description

Propagates the existing table and adds a computed column, similar to SELECT *, new_column in standard syntax. Supports window functions.

Examples

|> EXTEND status IN ('NEW', 'ASSIGNED', 'ACCEPTED') AS is_open
-- Window function, with OVER
|> EXTEND SUM(val) OVER (ORDER BY k) AS val_over_k

SET pipe operator

|> SET column_name = expression [, ...]

Description

Replaces the value of a column in the current table, similar to SELECT * REPLACE (expression AS column) in standard syntax. Each referenced column must exist exactly once in the input table.

After a SET operation, the referenced top-level columns (like x) are updated, but table aliases (like t) still refer to the original row values. Therefore, t.x will still refer to the original value.

Example

|> SET x = 5, y = CAST(y AS INT32)

DROP pipe operator

|> DROP column_name [, ...]

Description

Removes listed columns from the current table, similar to SELECT * EXCEPT (column) in standard syntax. Each referenced column must exist at least once in the input table.

After a DROP operation, the referenced top-level columns (like x) are removed, but table aliases (like t) still refer to the original row values. Therefore, t.x will still refer to the original value.

The DROP operator doesn't correspond to the DROP statement in data definition language (DDL), which deletes persistent schema objects.

Example

|> DROP account_id, user_id

RENAME pipe operator

|> RENAME old_column_name [AS] new_column_name [, ...]

Description

Renames specified columns. Each column to be renamed must exist exactly once in the input table. The RENAME operator can't rename value table fields, pseudo-columns, range variables, or objects that aren't columns in the input table.

After a RENAME operation, the referenced top-level columns (like x) are renamed, but table aliases (like t) still refer to the original row values. Therefore, t.x will still refer to the original value.

Example

|> RENAME last_name AS surname

AS pipe operator

|> AS alias

Description

Introduces a table alias for the input table, similar to applying the AS alias clause on a table subquery in standard syntax. Any existing table aliases are removed and the new alias becomes the table alias for all columns in the row.

The AS operator can be useful after operators like SELECT, EXTEND, or AGGREGATE that add columns but can't give table aliases to them.

Example

|> SELECT x, y, z
|> AS table_alias
|> WHERE table_alias.y = 10

WHERE pipe operator

|> WHERE boolean_expression

Description

Filters the results of the input table. The WHERE operator behaves the same as the WHERE clause in standard syntax.

In pipe syntax, the WHERE operator also replaces the HAVING clause and QUALIFY clause in standard syntax. For example, after performing aggregation with the AGGREGATE operator, use the WHERE operator instead of the HAVING clause. For window functions inside a QUALIFY clause, use window functions inside a WHERE clause instead.

Example

|> WHERE assignee_user.email = 'username@email.com'

LIMIT pipe operator

|> LIMIT count [OFFSET skip_rows]

Description

Limits the number of rows to return in a query, with an optional OFFSET clause to skip over rows. The LIMIT operator behaves the same as the LIMIT and OFFSET clause in standard syntax.

Examples

|> LIMIT 10
|> LIMIT 10 OFFSET 2

AGGREGATE pipe operator

-- Full-table aggregation
|> AGGREGATE aggregate_expression [[AS] alias] [, ...]
-- Aggregation with grouping
|> AGGREGATE [aggregate_expression [[AS] alias] [, ...]]
   GROUP BY groupable_items [[AS] alias] [, ...]
-- Aggregation with grouping and shorthand ordering syntax
|> AGGREGATE [aggregate_expression [order_suffix] [[AS] alias] [, ...]]
   GROUP [AND ORDER] BY groupable_item [order_suffix] [[AS] alias] [, ...]

order_suffix: {ASC | DESC} [{NULLS FIRST | NULLS LAST}]

Description

Performs aggregation on data across grouped rows or an entire table. The AGGREGATE operator is similar to a query in standard syntax that contains a GROUP BY clause or a SELECT list with aggregate functions or both. In pipe syntax, the GROUP BY clause is part of the AGGREGATE operator. Pipe syntax doesn't support a standalone GROUP BY operator.

Without the GROUP BY clause, the AGGREGATE operator performs full-table aggregation and produces one output row.

With the GROUP BY clause, the AGGREGATE operator performs aggregation with grouping, producing one row for each set of distinct values for the grouping expressions.

The AGGREGATE expression list corresponds to the aggregated expressions in a SELECT list in standard syntax. Each expression in the AGGREGATE list must include an aggregate function. Aggregate expressions can also include scalar expressions (for example, sqrt(SUM(x*x))). Column aliases can be assigned using the AS operator. Window functions aren't allowed, but the EXTEND operator can be used before the AGGREGATE operator to compute window functions.

The GROUP BY clause in the AGGREGATE operator corresponds to the GROUP BY clause in standard syntax. Unlike in standard syntax, aliases can be assigned to GROUP BY items. Standard grouping operators like GROUPING SETS, ROLLUP, and CUBE are supported.

The output columns from the AGGREGATE operator include all grouping columns first, followed by all aggregate columns, using their assigned aliases as the column names.

Unlike in standard syntax, grouping expressions aren't repeated across SELECT and GROUP BY clauses. In pipe syntax, the grouping expressions are listed once, in the GROUP BY clause, and are automatically included as output columns for the AGGREGATE operator.

Because output columns are fully specified by the AGGREGATE operator, the SELECT operator isn't needed after the AGGREGATE operator unless you want to produce a list of columns different from the default.

Examples

-- Full-table aggregation
|> AGGREGATE COUNT(*) AS row_count, SUM(num_users) AS total_users
-- Aggregation with grouping
|> AGGREGATE COUNT(*) AS row_count, SUM(num_users) AS total_users,
   GROUP BY org_site, date

The following examples compare aggregation in standard syntax and in pipe syntax:

-- Aggregation in standard syntax
SELECT id, EXTRACT(MONTH FROM date) AS month, SUM(value) AS total
FROM table
GROUP BY id, month
-- The same aggregation in pipe syntax
FROM table
|> AGGREGATE SUM(value) AS total
   GROUP BY id, EXTRACT(MONTH FROM date) AS month

Shorthand ordering syntax with AGGREGATE

The AGGREGATE operator supports a shorthand ordering syntax, which is equivalent to applying the ORDER BY operator as part of the AGGREGATE operator without repeating the column list:

-- Aggregation with grouping and shorthand ordering syntax
|> AGGREGATE [aggregate_expression [order_suffix] [[AS] alias] [, ...]]
   GROUP [AND ORDER] BY groupable_item [order_suffix] [[AS] alias] [, ...]

order_suffix: {ASC | DESC} [{NULLS FIRST | NULLS LAST}]

The GROUP AND ORDER BY clause is equivalent to an ORDER BY clause on all groupable_items. By default, each groupable_item is sorted in ascending order with NULL values first. Other ordering suffixes like DESC or NULLS LAST can be used for other orders.

Without the GROUP AND ORDER BY clause, the ASC or DESC suffixes can be added on individual columns in the GROUP BY list or AGGREGATE list or both. The NULLS FIRST and NULLS LAST suffixes can be used to further modify NULL sorting.

Adding these suffixes is equivalent to adding an ORDER BY clause that includes all of the suffixed columns with the suffixed grouping columns first, matching the left-to-right output column order.

Examples

-- Order by all grouping columns.
|> AGGREGATE COUNT(*)
   GROUP AND ORDER BY first_name, last_name DESC

The ordering in the previous example is equivalent to using |> ORDER BY first_name, last_name DESC.

-- Order by specified grouping and aggregate columns.
|> AGGREGATE COUNT(*) DESC
   GROUP BY first_name, last_name ASC

The ordering in the previous example is equivalent to using |> ORDER BY last_name ASC, COUNT(*) DESC.

ORDER BY pipe operator

|> ORDER BY expression [sort_options] [, ...]

Description

Sorts results by a list of expressions. The ORDER BY operator behaves the same as the ORDER BY clause in standard syntax. Suffixes like ASC, DESC, and NULLS LAST are supported for customizing the ordering for each expression.

In pipe syntax, the AGGREGATE operator also supports shorthand ordering suffixes to apply ORDER BY behavior more concisely as part of aggregation.

Example

|> ORDER BY last_name DESC

JOIN pipe operator

|> [join_type] JOIN from_item [[AS] alias] [{on_clause | using_clause}]

Description

Joins rows from the input table with rows from a second table provided as an argument. The JOIN operator behaves the same as the JOIN operation in standard syntax. The input table is the left side of the join and the JOIN argument is the right side of the join. Standard join inputs are supported, including tables, subqueries, UNNEST operations, and table-valued function (TVF) calls. Standard join modifiers like LEFT, INNER, and CROSS are allowed before the JOIN keyword.

An alias can be assigned to the input table on the right side of the join, but not to the input table on the left side of the join. If an alias on the input table is needed, perhaps to disambiguate columns in an ON expression, then an alias can be added using the AS operator before the JOIN arguments.

Example

|> JOIN ticketing_system_table AS components
     ON bug_table.component_id = CAST(components.component_id AS int64)

CALL pipe operator

|> CALL table_function (argument [, ...]) [[AS] alias]

Description

Calls a table-valued function (TVF), similar to table function calls in standard syntax.

TVFs in standard syntax can be called in the FROM clause or in a JOIN operation. These are both allowed in pipe syntax as well.

In pipe syntax, TVFs that take a table argument can also be called with the CALL operator. The first table argument comes from the input table and must be omitted in the arguments. An optional table alias can be added for the output table.

Multiple TVFs can be called sequentially without using nested subqueries.

Examples

|> CALL AddSuffix('*')
|> CALL AddSuffix2(arg1, arg2, arg3)

The following examples compare a TVF call in standard syntax and in pipe syntax:

-- Call a TVF in standard syntax.
FROM tvf( (SELECT * FROM table), arg1, arg2 )
-- Call the same TVF in pipe syntax.
SELECT * FROM table
|> CALL tvf(arg1, arg2)

WINDOW pipe operator

|> WINDOW window_expression [[AS] alias] [, ...]

Description

Adds a column with the result of computing the function over some window of existing rows, similar to calling window functions in a SELECT list in standard syntax. Existing rows and columns are unchanged. The window expression must include a window function with an OVER clause.

The EXTEND operator is recommended for window functions instead of the WINDOW operator because it also supports window expressions and covers the same use cases.

Example

|> WINDOW SUM(val) OVER (ORDER BY k)

TABLESAMPLE pipe operator

|> TABLESAMPLE sample_method (sample_size {PERCENT | ROWS}) [, ...]

Description

Selects a random sample of rows from the input table. The TABLESAMPLE pipe operator behaves the same as TABLESAMPLE operator in standard syntax.

Example

|> TABLESAMPLE BERNOULLI (0.1 PERCENT)

PIVOT pipe operator

|> PIVOT (aggregate_expression FOR input_column IN (pivot_column [, ...])) [[AS] alias]

Description

Rotates rows into columns. The PIVOT pipe operator behaves the same as the PIVOT operator in standard syntax.

Example

|> SELECT year, username, num_users
|> PIVOT (SUM(num_users) FOR username IN ('Jeff', 'Jeffrey', 'Jeffery'))

UNPIVOT pipe operator

|> UNPIVOT (values_column FOR name_column IN (column_to_unpivot [, ...])) [[AS] alias]

Description

Rotates columns into rows. The UNPIVOT pipe operator behaves the same as the UNPIVOT operator in standard syntax.

Example

|> UNPIVOT (count FOR user_location IN (London, Bangalore, Madrid))
|> ORDER BY year, cnt

ASSERT pipe operator

|> ASSERT expression [, payload_expression [, ...]]

Description

Evaluates an expression over all rows of an input table to verify that the expression is true or raise an assertion error if it's false.

The expression must evaluate to a boolean value. When the expression evaluates to TRUE, the input table passes through the ASSERT operator unchanged. When the expression evaluates to FALSE or NULL, the query fails with an Assertion failed error.

One or more optional payload expressions can be provided. If the assertion fails, the payload expression values are computed, converted to strings, and included in the error message, separated by spaces.

If no payload is provided, the error message includes the SQL text of the assertion expression.

The ASSERT operator has no equivalent operation in standard syntax. The ASSERT statement is a related feature that verifies that a single expression is true.

Example

FROM table
|> ASSERT count != 0, "Count is zero for user", userId
|> SELECT total / count AS average