Implement `SHOW FUNCTIONS` #13799

goldmedal · 2024-12-16T17:49:46Z

Which issue does this PR close?

Rationale for this change

This PR implements SHOW FUNCTIONS to list the functions available at runtime. Rather than listing only the function names (as Apache Spark does), I think it's better to provide more information, as Snowflake does.

To provide the required information, I also added some columns to information_schema.routines and information_schema.parameters.

Syntax

SHOW FUNCTIONS [ LIKE <pattern> ];

Sample Output

> show functions like '%datetrunc';
+---------------+-------------------------------------+-------------------------+-------------------------------------------------+---------------+-------------------------------------------------------+-----------------------------------+
| function_name | return_type                         | parameters              | parameter_types                                 | function_type | description                                           | syntax_example                    |
+---------------+-------------------------------------+-------------------------+-------------------------------------------------+---------------+-------------------------------------------------------+-----------------------------------+
| datetrunc     | Timestamp(Microsecond, Some("+TZ")) | [precision, expression] | [Utf8, Timestamp(Microsecond, Some("+TZ"))]     | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, None)         | [precision, expression] | [Utf8View, Timestamp(Nanosecond, None)]         | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, Some("+TZ"))      | [precision, expression] | [Utf8View, Timestamp(Second, Some("+TZ"))]      | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Microsecond, None)        | [precision, expression] | [Utf8View, Timestamp(Microsecond, None)]        | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, None)             | [precision, expression] | [Utf8View, Timestamp(Second, None)]             | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Microsecond, None)        | [precision, expression] | [Utf8, Timestamp(Microsecond, None)]            | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, None)             | [precision, expression] | [Utf8, Timestamp(Second, None)]                 | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Microsecond, Some("+TZ")) | [precision, expression] | [Utf8View, Timestamp(Microsecond, Some("+TZ"))] | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, Some("+TZ"))  | [precision, expression] | [Utf8, Timestamp(Nanosecond, Some("+TZ"))]      | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, None)        | [precision, expression] | [Utf8, Timestamp(Millisecond, None)]            | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, Some("+TZ")) | [precision, expression] | [Utf8, Timestamp(Millisecond, Some("+TZ"))]     | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, Some("+TZ"))      | [precision, expression] | [Utf8, Timestamp(Second, Some("+TZ"))]          | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, None)         | [precision, expression] | [Utf8, Timestamp(Nanosecond, None)]             | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, None)        | [precision, expression] | [Utf8View, Timestamp(Millisecond, None)]        | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, Some("+TZ")) | [precision, expression] | [Utf8View, Timestamp(Millisecond, Some("+TZ"))] | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, Some("+TZ"))  | [precision, expression] | [Utf8View, Timestamp(Nanosecond, Some("+TZ"))]  | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
+---------------+-------------------------------------+-------------------------+-------------------------------------------------+---------------+-------------------------------------------------------+-----------------------------------+
16 row(s) fetched.
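
To illustrate why the pattern '%datetrunc' in the sample above selects datetrunc but not date_trunc, here is a minimal sketch of SQL LIKE matching in plain Rust. It is not the code used by this PR or by DataFusion; the names like_match and like_chars are hypothetical, and it assumes case-sensitive matching with no escape character.

```rust
/// Returns true if `text` matches the SQL LIKE `pattern`.
/// `%` matches any run of characters (including none), `_` matches exactly one.
/// Assumption: case-sensitive, no escape character support.
fn like_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    like_chars(&p, &t)
}

/// Recursive helper working on character slices.
fn like_chars(p: &[char], t: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: match only if the text is exhausted too.
        None => t.is_empty(),
        // `%`: try every possible split point of the remaining text.
        Some((&'%', rest)) => (0..=t.len()).any(|i| like_chars(rest, &t[i..])),
        // `_`: consume exactly one character.
        Some((&'_', rest)) => !t.is_empty() && like_chars(rest, &t[1..]),
        // Literal: the next character must be identical.
        Some((c, rest)) => t.first() == Some(c) && like_chars(rest, &t[1..]),
    }
}
```

Under these assumptions, like_match("%datetrunc", "datetrunc") is true, while like_match("%datetrunc", "date_trunc") is false because the underscore in the function name is a literal character that the pattern does not contain.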

The Output Schema

function_name: The name of the function
return_type: The return type of the function
parameters: The names of the parameters (ordered by ordinal position)
parameter_types: The types of the parameters (ordered by ordinal position)
function_type: The type of the function (SCALAR in the sample above)
description: The description of the function (as defined in the function's documentation)
syntax_example: The syntax example of the function (as defined in the function's documentation)

What changes are included in this PR?

Implement SHOW FUNCTIONS.
Add the syntax_example field to information_schema.routines.
Add the rid field to information_schema.parameters.
- rid (short for routine id) is used to differentiate parameters from different signatures (it serves as the group-by key when generating the SHOW FUNCTIONS query). For example, the following signatures have different rid values:
  - datetrunc(Utf8, Timestamp(Microsecond, Some("+TZ"))) -> Timestamp(Microsecond, Some("+TZ"))
  - datetrunc(Utf8View, Timestamp(Nanosecond, None)) -> Timestamp(Nanosecond, None)
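
The role of rid as a group-by key can be sketched in plain Rust. This is an illustration only, not the query that the PR generates: the ParamRow struct, its field types, and the group_by_signature function are hypothetical and only loosely mirror the columns of information_schema.parameters.

```rust
use std::collections::BTreeMap;

/// One parameter row. Field names and types are illustrative assumptions.
#[derive(Debug, Clone)]
struct ParamRow {
    function_name: String,
    rid: u8,
    ordinal_position: u64,
    parameter_name: String,
    data_type: String,
}

/// (parameter names, parameter types), both ordered by ordinal position.
type Signature = (Vec<String>, Vec<String>);

/// Collapses parameter rows into one entry per (function_name, rid).
/// Without rid, rows from different signatures of the same function
/// would be merged into a single list.
fn group_by_signature(rows: &[ParamRow]) -> BTreeMap<(String, u8), Signature> {
    // Sort so that parameters are appended in ordinal order within each group.
    let mut sorted: Vec<&ParamRow> = rows.iter().collect();
    sorted.sort_by_key(|r| (r.function_name.clone(), r.rid, r.ordinal_position));

    let mut out: BTreeMap<(String, u8), Signature> = BTreeMap::new();
    for r in sorted {
        let entry = out
            .entry((r.function_name.clone(), r.rid))
            .or_insert_with(|| (Vec::new(), Vec::new()));
        entry.0.push(r.parameter_name.clone());
        entry.1.push(r.data_type.clone());
    }
    out
}
```

Feeding in the four parameter rows of the two datetrunc signatures listed above (two rows per rid) would yield two entries, each with the parameters [precision, expression] and its own pair of types.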

Are these changes tested?

Yes, tested by sqllogictests.

Are there any user-facing changes?

New SQL syntax.

alamb

This looks awesome @goldmedal . Thank you.

One thing I noticed is that including all the information in the output results in a pretty wide output schema

However, I don't really have any good suggestion to avoid this

alamb · 2024-12-19T11:13:45Z

datafusion/core/src/catalog_common/information_schema.rs

@@ -1222,6 +1237,12 @@ impl InformationSchemaParameters {
            Field::new("data_type", DataType::Utf8, false),
            Field::new("parameter_default", DataType::Utf8, true),
            Field::new("is_variadic", DataType::Boolean, false),
+            // `rid` (short for `routine id`) is used to differentiate parameters from different signatures


💯 for the documentation

goldmedal · 2024-12-19T16:25:01Z

One thing I noticed is that including all the information in the output results in a pretty wide output schema
However, I don't really have any good suggestion to avoid this

Thanks for the review, @alamb. Indeed, I think there are two directions for improvement:

How the CLI pretty-prints wide columns.
SHOW FUNCTIONS cannot select only specific columns.

As for displaying wide columns, our behavior is similar to PostgreSQL's psql.

test=# select '11111111111111111111111111111199999999999999999999999999999991111111111111111111111111111119999999999999999999999999999999', 
'1111111111111111111111111111119999999999999999999999999999999';
                                                          ?column?                                                          |                           ?column?                            
----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
 11111111111111111111111111111199999999999999999999999999999991111111111111111111111111111119999999999999999999999999999999 | 1111111111111111111111111111119999999999999999999999999999999
(1 row)

However, DuckDB's output is more readable.

D select '11111111111111111111111111111199999999999999999999999999999991111111111111111111111111111119999999999999999999999999999999', 
  '1111111111111111111111111111119999999999999999999999999999999';
┌──────────────────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────┐
│ '1111111111111111111111111111119999999999999999999999999999999111111111111111111111111…  │ '1111111111111111111111111111119999999999999999999999999999999' │
│                                         varchar                                          │                             varchar                             │
├──────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ 11111111111111111111111111111199999999999999999999999999999991111111111111111111111111…  │ 1111111111111111111111111111119999999999999999999999999999999   │
└──────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────┘

As for the second issue, I have no idea how to improve it 🤔. It's a limitation of the SHOW FUNCTIONS syntax. DuckDB provides the duckdb_functions() table function instead of a SHOW FUNCTIONS syntax, so users can select only the columns they want.
If the number of columns is large, the result is folded like this:

D select * from duckdb_functions();
┌───────────────┬──────────────┬─────────────┬──────────────────────┬───────────────┬───┬──────────────────┬──────────┬──────────────┬─────────┬───────────┐
│ database_name │ database_oid │ schema_name │    function_name     │ function_type │ … │ has_side_effects │ internal │ function_oid │ example │ stability │
│    varchar    │   varchar    │   varchar   │       varchar        │    varchar    │   │     boolean      │ boolean  │    int64     │ varchar │  varchar  │
├───────────────┼──────────────┼─────────────┼──────────────────────┼───────────────┼───┼──────────────────┼──────────┼──────────────┼─────────┼───────────┤
│ system        │ 0            │ main        │ read_csv_auto        │ table         │ … │                  │ true     │           70 │         │           │
│ system        │ 0            │ main        │ read_csv_auto        │ table         │ … │                  │ true     │           70 │         │           │
│ system        │ 0            │ main        │ arrow_scan           │ table         │ … │                  │ true     │           96 │         │           │
│ system        │ 0            │ main        │ arrow_scan_dumb      │ table         │ … │                  │ true     │           98 │         │           │
│ system        │ 0            │ main        │ checkpoint           │ table         │ … │                  │ true     │           72 │         │           │

or

D select function_name, tags, parameters from duckdb_functions();
┌──────────────────────┬──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│    function_name     │         tags         │                                                  parameters                                                  │
│       varchar        │ map(varchar, varch…  │                                                  varchar[]                                                   │
├──────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ read_csv_auto        │ {}                   │ [col0, hive_types_autocast, hive_types, union_by_name, filename, dtypes, null_padding, parallel, decimal_s…  │
│ read_csv_auto        │ {}                   │ [col0, delim, dateformat, column_names, sep, hive_partitioning, header, escape, allow_quoted_nulls, maximu…  │
│ arrow_scan           │ {}                   │ [col0, col1, col2]                                                                                           │
│ arrow_scan_dumb      │ {}                   │ [col0, col1, col2]                                                                                           │

Maybe we can consider improving the UX of the CLI.
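
As a rough idea of what such a CLI improvement could involve, here is a minimal sketch in plain Rust of the two DuckDB-style behaviors shown above: truncating long cells with an ellipsis and folding the middle columns. This is not existing datafusion-cli code; the function names truncate_cell and fold_columns and their parameters are hypothetical, and the sketch counts characters rather than terminal display width.

```rust
/// Truncates `cell` to at most `max_width` characters, ending with '…' when cut.
/// Assumption: one char equals one display column (not true for all Unicode).
fn truncate_cell(cell: &str, max_width: usize) -> String {
    if cell.chars().count() <= max_width {
        return cell.to_string();
    }
    if max_width == 0 {
        return String::new();
    }
    // Leave room for the trailing ellipsis.
    let mut out: String = cell.chars().take(max_width - 1).collect();
    out.push('…');
    out
}

/// Keeps the first and last `keep` column names and replaces the rest with "…".
/// Returns all columns unchanged when there are at most `2 * keep` of them.
fn fold_columns(columns: &[&str], keep: usize) -> Vec<String> {
    if columns.len() <= 2 * keep {
        return columns.iter().map(|c| c.to_string()).collect();
    }
    let mut out: Vec<String> = columns[..keep].iter().map(|c| c.to_string()).collect();
    out.push("…".to_string());
    out.extend(columns[columns.len() - keep..].iter().map(|c| c.to_string()));
    out
}
```

A real implementation would also need to pick the widths from the terminal size and fold the data rows consistently with the header, which this sketch leaves out.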

comphead · 2024-12-19T23:49:20Z

Thanks @goldmedal it would be really nice to update the user documentation

alamb · 2024-12-20T14:40:27Z

Maybe we can consider improving the UX of the CLI.

That is an excellent idea -- I think it is one of @matthewmturner 's goals with https://github.com/datafusion-contrib/datafusion-dft

I am not sure how complicated we should make datafusion-cli -- there is a tension between a great CLI experience and keeping the code focused and maintainable

alamb · 2024-12-21T02:55:07Z

Thanks @goldmedal it would be really nice to update the user documentation

In #13868

alamb · 2024-12-21T02:55:18Z

Thank you @goldmedal and @comphead -- very nice

matthewmturner · 2024-12-21T13:28:36Z

Indeed, I am looking to incorporate something to improve the experience of looking at function help in dft. I have some ideas but haven't gotten around to it yet

goldmedal added 4 commits December 16, 2024 23:19

introduce rid for different signature

7afc910

implement show functions syntax

8ed2261

add syntax example

c7ed8c6

avoid duplicate join

5d33d28

github-actions bot added the labels sql (SQL Planner), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) Dec 16, 2024

fix clippy

bda9a06

findepi mentioned this pull request Dec 17, 2024

Implement SHOW FUNCTIONS #12266

Closed

goldmedal added 2 commits December 17, 2024 20:09

show function_type instead of routine_type

4b1851b

add some doc and comments

412bfd6

goldmedal marked this pull request as ready for review December 17, 2024 13:54

alamb approved these changes Dec 19, 2024


alamb mentioned this pull request Dec 21, 2024

Add documentation for SHOW FUNCTIONS #13868

Merged

alamb merged commit ade14e7 into apache:main Dec 21, 2024
25 of 26 checks passed

alamb mentioned this pull request Dec 21, 2024

Fix build use of undeclared type ShowStatementFilter #13869

Merged

alamb mentioned this pull request Jan 1, 2025

Jan 1, 2025: This week(s) in DataFusion #13970

Open

11 tasks
