docs: sql unnest and cleanup unnest datasource #13736

Merged · 48 commits · Apr 4, 2023
Commits
d2e244e
docs: sql unnest and cleanup datasource unnest
317brian Feb 1, 2023
87adfd7
words
317brian Feb 1, 2023
d6b735e
caps
317brian Feb 1, 2023
58eb0df
fix
317brian Feb 1, 2023
d4ba9d8
address comments
317brian Feb 6, 2023
e96ef27
address comments
317brian Feb 6, 2023
adf1a71
address comments from vtlim
317brian Feb 8, 2023
6b274a6
address comments from somu
317brian Feb 8, 2023
0f4c3cc
address comments from Victoria
317brian Feb 22, 2023
4f60c8e
fix filename
317brian Feb 22, 2023
fafafd4
Merge branch 'master' into unnest-sql
317brian Mar 2, 2023
3439ba2
update spelling file
317brian Mar 2, 2023
b494f2e
Fix a link problem (#13876)
writer-jill Mar 2, 2023
ddc9971
docs: fix html nits (#13835)
317brian Mar 2, 2023
ed21780
Add Post Aggregators for Tuple Sketches (#13819)
anshu-makkar Mar 3, 2023
351dcfc
do not run non sql compatible tests for all jdk flavours (#13875)
abhishekagarwal87 Mar 3, 2023
c1b20e6
Bump CycloneDX module to address POM errors (#13878)
imply-elliott Mar 3, 2023
da7e72b
Allow druid-kubernetes-overlord-extensions to be loaded in any druid …
Mar 3, 2023
9d64618
use getProperty in MSQDurableStorageModule (#13881)
Mar 4, 2023
cf3456e
Python Druid API for use in notebooks (#13787)
paul-rogers Mar 5, 2023
5a200cd
Fix durable storage cleanup (#13853)
rohangarg Mar 6, 2023
05a27f1
Adding forbidden api for Properties#get() and Properties#getOrDefault…
cryptoe Mar 6, 2023
7fdce23
Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is throw…
cryptoe Mar 6, 2023
d7652cf
Stream Kubernetes Job Logs (#13869)
Mar 6, 2023
81506fc
Web console: Compaction history dialog (#13861)
vogievetsky Mar 6, 2023
de2c820
Add warning comments to Granularity.getIterable. (#13888)
gianm Mar 7, 2023
deab298
fix SQL in segment card (#13895)
vogievetsky Mar 7, 2023
385f383
use Calcites.getColumnTypeForRelDataType for SQL CAST operator conver…
clintropolis Mar 7, 2023
04a1eac
Use base task dir in kubernetes task runner (#13880)
Mar 7, 2023
8895cf8
Add validation for aggregations on __time (#13793)
adarshsanjeev Mar 8, 2023
43b77fc
Improved error message when topic name changes within same supervisor…
abhishekagarwal87 Mar 8, 2023
34fc9b0
fix ci (#13901)
clintropolis Mar 8, 2023
5f8590f
Fix start-druid for indexers. (#13891)
gianm Mar 8, 2023
f219371
Sort-merge join and hash shuffles for MSQ. (#13506)
gianm Mar 8, 2023
27a56c5
fix KafkaInputFormat when used with Sampler API (#13900)
clintropolis Mar 9, 2023
a0d4204
Fix for OOM in the Tombstone generating logic in MSQ (#13893)
LakshSingla Mar 9, 2023
1921fc4
Avoid creating new RelDataTypeFactory during SQL planning. (#13904)
gianm Mar 9, 2023
5c96758
use native nvl expression for SQL NVL and 2 argument COALESCE (#13897)
clintropolis Mar 9, 2023
b337537
Merge branch 'master' into unnest-sql
317brian Mar 10, 2023
c8b3f3a
merge master
317brian Mar 10, 2023
011047d
updates to unnest for latest changes
317brian Mar 14, 2023
3282888
remve mention of allowlist
317brian Mar 14, 2023
87fc458
add name to virtualcolumn block for unnest datasource
317brian Mar 14, 2023
62ad960
add show/hide
317brian Mar 14, 2023
e3e8753
fix missing anchor link
317brian Mar 29, 2023
aea3e9a
Merge branch 'master' into unnest-sql
317brian Mar 29, 2023
4d5c24b
Apply suggestions from code review
317brian Mar 30, 2023
f9beef2
Apply suggestions from code review
317brian Mar 31, 2023
5 changes: 3 additions & 2 deletions docs/querying/datasource.md
@@ -397,7 +397,7 @@ When you use the `unnest` datasource, the unnested column looks like this:
When unnesting data, keep the following in mind:

- The total number of rows will grow to accommodate the new rows that the unnested data occupy.
- You can unnest the values in more than one column in a single `unnest` datasource. This can lead to a very large number of new rows depending on your dataset. You can see an example of this in the [unnest tutorial](../tutorials/tutorial-unnest-datasource.md#unnest-multiple-columns).
- You can unnest the values in more than one column in a single `unnest` datasource, but this can lead to a very large number of new rows depending on your dataset.

The `unnest` datasource uses the following syntax:

@@ -410,6 +410,7 @@ The `unnest` datasource uses the following syntax:
},
"virtualColumn": {
"type": "expression",
"name": "output_column",
"expression": "\"column_reference\""
},
"outputName": "unnested_target_column"
@@ -421,4 +422,4 @@ The `unnest` datasource uses the following syntax:
* `dataSource.base.type`: The type of datasource you want to unnest, such as a table.
* `dataSource.virtualColumn`: [Virtual column](virtual-columns.md) that references the nested values. The output name of this column is reused as the name of the column that contains unnested values. You can replace the source column with the unnested column by reusing the source column's name, or write to a new column by specifying a different name. Outputting to a new column isn't required, but it can help you verify that you get the results you expect. For an illustrative query, see the sketch below.
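
For illustration only, here is a sketch of a complete scan query built on the syntax block above. The table name `nested_data`, the source column `dim3`, and the output name `unnest-dim3` are placeholders, not values from this PR, and the exact field shape may vary across Druid versions:

```json
{
  "queryType": "scan",
  "dataSource": {
    "type": "unnest",
    "base": {
      "type": "table",
      "name": "nested_data"
    },
    "virtualColumn": {
      "type": "expression",
      "name": "unnest-dim3",
      "expression": "\"dim3\""
    },
    "outputName": "unnest-dim3"
  },
  "intervals": ["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"],
  "resultFormat": "list",
  "columns": ["unnest-dim3"],
  "limit": 100
}
```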

To learn more about how to use the `unnest` datasource, see the [unnest tutorial](../tutorials/tutorial-unnest-datasource.md).
To learn more about how to use the `unnest` datasource, see the [unnest tutorial](../tutorials/tutorial-unnest-arrays.md).
7 changes: 7 additions & 0 deletions docs/querying/sql-functions.md
@@ -1399,6 +1399,13 @@ Truncates a numerical expression to a specific number of decimal digits.

Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in a `NULL` value.

## UNNEST

> **Contributor:** Should we be consistent and call it Unnest or Unnesting in all places? multi-value-dimensions.md calls this Unnesting.
>
> **Contributor Author:** Based on Clint's comment, I've removed it from the mvd page.

`UNNEST(source_expression) as table_alias_name(column_alias_name)`

Unnests a source expression that includes arrays into a target column with an aliased name.

For more information, see [UNNEST](./sql.md#unnest).
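
A minimal usage sketch, assuming a hypothetical table `nested_data` with a multi-value dimension `dim3`:

```sql
-- Unnest the multi-value dimension dim3 into one row per value, aliased as d3
SELECT d3
FROM nested_data, UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
```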

## UPPER

38 changes: 38 additions & 0 deletions docs/querying/sql.md
@@ -55,6 +55,7 @@ Druid SQL supports SELECT queries with the following structure:
[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]
SELECT [ ALL | DISTINCT ] { * | exprs }
FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }
[, UNNEST(source_expression) as table_alias_name(column_alias_name) ]

> **Contributor:** This is the right one; we should give the previous one the same name, `<table_alias_name>`.

[ WHERE expr ]
[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
[ HAVING expr ]
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. They exist only in
For more information about table, lookup, query, and join datasources, refer to the [Datasources](datasource.md)
documentation.

## UNNEST

> The UNNEST SQL function is [experimental](../development/experimental.md). Its API and behavior are subject
> to change in future releases. It is not recommended to use this feature in production at this time.

The UNNEST clause unnests array values. It's the SQL equivalent of the [unnest datasource](./datasource.md#unnest). The source for UNNEST can be an array or an input that's been transformed into an array, such as by the helper functions MV_TO_ARRAY or ARRAY.

The following is the general syntax for UNNEST, shown as part of a query that returns the unnested column:

```sql
SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...
```

* The `datasource` for UNNEST can be any Druid datasource, such as the following:
* A table, such as `FROM a_table`.
* A subset of a table based on a query, a filter, or a JOIN. For example, `FROM (SELECT columnA,columnB,columnC from a_table)`.
* The `source_expression` for the UNNEST function must evaluate to an array; it can be any expression that returns an array. If the dimension you are unnesting is a multi-value dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an implicit ARRAY type. You can also specify any expression that has a SQL array datatype. For example, you can call UNNEST on the following (see the sketch after this list):
* `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions.
* `ARRAY_CONCAT(dim1,dim2)` if you want to concatenate two multi-value dimensions.
* The `AS table_alias_name(column_alias_name)` clause is not required but is highly recommended. Use it to specify the output, which can be an existing column or a new one. Replace `table_alias_name` and `column_alias_name` with a table and column name you want to alias the unnested results to. If you don't provide this, Druid uses a nondescriptive name, such as `EXPR$0`.
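
The following sketch unnests two source expressions in one query. It assumes a hypothetical table `nested_data` with numeric columns `l1` and `l2` and a multi-value string dimension `dim3`; all of these names are placeholders:

```sql
-- Each UNNEST gets its own table and column alias;
-- the unnested values are combined with the base table's rows
SELECT longs, strings
FROM nested_data,
  UNNEST(ARRAY[l1, l2]) AS t1(longs),
  UNNEST(MV_TO_ARRAY(dim3)) AS t2(strings)
```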

Keep the following things in mind when writing your query:

- You must include the context parameter `"enableUnnest": true` (see the request sketch after this list).
- You can unnest multiple source expressions in a single query.
- Notice the comma between the datasource and the UNNEST function. The comma is required in most uses of UNNEST; it is not needed when you unnest an inline array, because the array itself is the datasource.
- If you view the native explanation of a SQL UNNEST, you'll notice that Druid uses `j0.unnest` as a virtual column to perform the unnest. An underscore is added for each unnest, so you may notice virtual columns named `_j0.unnest` or `__j0.unnest`.
- UNNEST preserves the ordering of the source array that is being unnested.
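
For example, one way to pass the required context parameter is in the body of a request to the Druid SQL API endpoint (`POST /druid/v2/sql`). The query and table name below are only illustrative:

```json
{
  "query": "SELECT d3 FROM nested_data, UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)",
  "context": {
    "enableUnnest": true
  }
}
```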

For examples, see the [Unnest arrays tutorial](../tutorials/tutorial-unnest-arrays.md).

The UNNEST function has the following limitations:

- The function does not remove any duplicates or nulls in an array. Nulls are treated like any other value in an array: if an array contains multiple nulls, a record is created for each null. See the sketch after this list.
- Arrays inside complex JSON types are not supported.
- You cannot perform an UNNEST at ingestion time, including SQL-based ingestion using the MSQ task engine.
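
As a sketch of the null handling described above, unnesting an inline array that contains a null keeps a row for the null value. This also shows the case where no comma-joined datasource is needed, because the array itself acts as the datasource. Whether a bare NULL literal is accepted inside ARRAY[] may depend on the Druid version:

```sql
-- Produces three rows: 'a', NULL, 'b'
SELECT * FROM UNNEST(ARRAY['a', NULL, 'b']) AS example_table(d1)
```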

## WHERE

The WHERE clause refers to columns in the FROM table, and will be translated to [native filters](filters.md). The
Expand Down