Implement SHOW FUNCTIONS #13799

Merged
merged 7 commits into from
Dec 21, 2024

@goldmedal goldmedal commented Dec 16, 2024

Which issue does this PR close?

Closes #12144.

Rationale for this change

This PR implements SHOW FUNCTIONS to list the functions available at runtime. Rather than listing only the function names (as Apache Spark does), I think it's better to provide more information, as Snowflake does.

To provide the required information, I also added some columns to information_schema.routines and information_schema.parameters.

Syntax

SHOW FUNCTIONS [ LIKE <pattern> ];

Sample Output

> show functions like '%datetrunc';
+---------------+-------------------------------------+-------------------------+-------------------------------------------------+---------------+-------------------------------------------------------+-----------------------------------+
| function_name | return_type                         | parameters              | parameter_types                                 | function_type | description                                           | syntax_example                    |
+---------------+-------------------------------------+-------------------------+-------------------------------------------------+---------------+-------------------------------------------------------+-----------------------------------+
| datetrunc     | Timestamp(Microsecond, Some("+TZ")) | [precision, expression] | [Utf8, Timestamp(Microsecond, Some("+TZ"))]     | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, None)         | [precision, expression] | [Utf8View, Timestamp(Nanosecond, None)]         | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, Some("+TZ"))      | [precision, expression] | [Utf8View, Timestamp(Second, Some("+TZ"))]      | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Microsecond, None)        | [precision, expression] | [Utf8View, Timestamp(Microsecond, None)]        | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, None)             | [precision, expression] | [Utf8View, Timestamp(Second, None)]             | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Microsecond, None)        | [precision, expression] | [Utf8, Timestamp(Microsecond, None)]            | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, None)             | [precision, expression] | [Utf8, Timestamp(Second, None)]                 | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Microsecond, Some("+TZ")) | [precision, expression] | [Utf8View, Timestamp(Microsecond, Some("+TZ"))] | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, Some("+TZ"))  | [precision, expression] | [Utf8, Timestamp(Nanosecond, Some("+TZ"))]      | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, None)        | [precision, expression] | [Utf8, Timestamp(Millisecond, None)]            | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, Some("+TZ")) | [precision, expression] | [Utf8, Timestamp(Millisecond, Some("+TZ"))]     | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Second, Some("+TZ"))      | [precision, expression] | [Utf8, Timestamp(Second, Some("+TZ"))]          | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, None)         | [precision, expression] | [Utf8, Timestamp(Nanosecond, None)]             | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, None)        | [precision, expression] | [Utf8View, Timestamp(Millisecond, None)]        | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Millisecond, Some("+TZ")) | [precision, expression] | [Utf8View, Timestamp(Millisecond, Some("+TZ"))] | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
| datetrunc     | Timestamp(Nanosecond, Some("+TZ"))  | [precision, expression] | [Utf8View, Timestamp(Nanosecond, Some("+TZ"))]  | SCALAR        | Truncates a timestamp value to a specified precision. | date_trunc(precision, expression) |
+---------------+-------------------------------------+-------------------------+-------------------------------------------------+---------------+-------------------------------------------------------+-----------------------------------+
16 row(s) fetched. 
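
As a rough illustration of how the LIKE filter selects rows, here is a minimal Python sketch of standard SQL LIKE matching, where % matches any run of characters and _ matches exactly one. The helper names like_to_regex and filter_functions and the list of names are hypothetical. The sketch is case-sensitive and ignores ESCAPE clauses; it is not how DataFusion implements the filter.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regular expression (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")  # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def filter_functions(names, pattern):
    """Keep only the names that match the whole LIKE pattern."""
    regex = like_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


# Made-up list of function names, for illustration only.
names = ["datetrunc", "date_trunc", "date_part", "trunc"]
print(filter_functions(names, "%datetrunc"))
print(filter_functions(names, "%trunc"))
```

With the pattern '%datetrunc' only names ending in datetrunc survive, which is why the sample output above contains only datetrunc rows.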

The Output Schema

  • function_name
  • return_type
  • parameters: The names of the parameters (ordered by ordinal position)
  • parameter_types: The types of the parameters (ordered by ordinal position)
  • function_type: The type of the function (e.g. SCALAR in the sample output above)
  • description: The description of the function (as defined in the function's documentation)
  • syntax_example: The syntax example of the function (as defined in the function's documentation)

What changes are included in this PR?

  • Implement SHOW FUNCTIONS.
  • Add the syntax_example field to information_schema.routines.
  • Add the rid field to information_schema.parameters.
    • rid (short for routine id) is used to differentiate parameters from different signatures (it serves as the group-by key when generating the SHOW FUNCTIONS query). For example, the following signatures have different rid values:
      • datetrunc(Utf8, Timestamp(Microsecond, Some("+TZ"))) -> Timestamp(Microsecond, Some("+TZ"))
      • datetrunc(Utf8View, Timestamp(Nanosecond, None)) -> Timestamp(Nanosecond, None)
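
To make the role of rid concrete, here is a minimal Python sketch of the grouping idea: parameter rows are grouped by (function name, rid) and then ordered by ordinal position to produce the parameters and parameter_types lists. The row layout is a simplified, hypothetical stand-in for information_schema.parameters, and group_signatures is an invented name; the actual query generated by the PR may differ.

```python
from collections import defaultdict

# Hypothetical, simplified rows:
# (function name, rid, ordinal_position, parameter_name, data_type)
rows = [
    ("datetrunc", 0, 2, "expression", 'Timestamp(Microsecond, Some("+TZ"))'),
    ("datetrunc", 0, 1, "precision", "Utf8"),
    ("datetrunc", 1, 1, "precision", "Utf8View"),
    ("datetrunc", 1, 2, "expression", "Timestamp(Nanosecond, None)"),
]


def group_signatures(rows):
    """Collapse parameter rows into one record per (function name, rid)."""
    groups = defaultdict(list)
    for name, rid, position, param_name, data_type in rows:
        groups[(name, rid)].append((position, param_name, data_type))

    signatures = []
    for (name, rid), params in sorted(groups.items()):
        params.sort()  # tuples sort by ordinal position first
        signatures.append(
            {
                "function_name": name,
                "parameters": [p for _, p, _ in params],
                "parameter_types": [t for _, _, t in params],
            }
        )
    return signatures


for signature in group_signatures(rows):
    print(signature)
```

If the grouping key were the function name alone, all four rows would fall into a single group and the two signatures could not be told apart.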

Are these changes tested?

Yes, tested by sqllogictests.

Are there any user-facing changes?

New SQL syntax.

@github-actions github-actions bot added sql SQL Planner core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Dec 16, 2024
@goldmedal goldmedal marked this pull request as ready for review December 17, 2024 13:54

@alamb alamb left a comment

This looks awesome, @goldmedal. Thank you.

One thing I noticed is that including all the information in the output results in a pretty wide output schema

Screenshot 2024-12-19 at 6 14 52 AM

However, I don't really have any good suggestion to avoid this

@@ -1222,6 +1237,12 @@ impl InformationSchemaParameters {
Field::new("data_type", DataType::Utf8, false),
Field::new("parameter_default", DataType::Utf8, true),
Field::new("is_variadic", DataType::Boolean, false),
// `rid` (short for `routine id`) is used to differentiate parameters from different signatures

💯 for the documentation

goldmedal commented Dec 19, 2024

One thing I noticed is that including all the information in the output results in a pretty wide output schema

Screenshot 2024-12-19 at 6 14 52 AM

However, I don't really have any good suggestion to avoid this

Thanks, @alamb, for the review. Indeed, I think there are two directions for improvement:

  • How the CLI renders wide columns.
  • SHOW FUNCTIONS can't select only specific columns.

As for displaying wide columns, our behavior is similar to Postgres psql.

test=# select '11111111111111111111111111111199999999999999999999999999999991111111111111111111111111111119999999999999999999999999999999', 
'1111111111111111111111111111119999999999999999999999999999999';
                                                          ?column?                                                          |                           ?column?                            
----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
 11111111111111111111111111111199999999999999999999999999999991111111111111111111111111111119999999999999999999999999999999 | 1111111111111111111111111111119999999999999999999999999999999
(1 row)

However, the way DuckDB does it is more readable.

D select '11111111111111111111111111111199999999999999999999999999999991111111111111111111111111111119999999999999999999999999999999', 
  '1111111111111111111111111111119999999999999999999999999999999';
┌──────────────────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────┐
│ '1111111111111111111111111111119999999999999999999999999999999111111111111111111111111…  │ '1111111111111111111111111111119999999999999999999999999999999' │
│                                         varchar                                          │                             varchar                             │
├──────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ 11111111111111111111111111111199999999999999999999999999999991111111111111111111111111…  │ 1111111111111111111111111111119999999999999999999999999999999   │
└──────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────┘
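
A minimal Python sketch of the DuckDB-style truncation shown above, assuming a fixed per-cell width limit. The names truncate_cell and format_row and the max_width parameter are hypothetical; how DuckDB actually chooses widths is not covered in this thread.

```python
def truncate_cell(value: str, max_width: int) -> str:
    """Shorten a cell to at most max_width characters, ending with an ellipsis."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    if len(value) <= max_width:
        return value
    # Reserve one character for the ellipsis marker.
    return value[: max_width - 1] + "…"


def format_row(cells, max_width: int) -> str:
    """Join the (possibly truncated) cells of one row with a column separator."""
    return " | ".join(truncate_cell(cell, max_width) for cell in cells)


wide_value = "1" * 30 + "9" * 31
print(format_row([wide_value, "short"], max_width=20))
```

Short values pass through unchanged, so only the columns that would overflow pay the readability cost of the ellipsis.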

About the second issue, I have no idea how to improve it 🤔. It's a limitation of the SHOW FUNCTIONS syntax. DuckDB provides the duckdb_functions() table function instead of a SHOW FUNCTIONS syntax, so the user can select only the columns they want.
If the number of columns is large, DuckDB folds the result like:

D select * from duckdb_functions();
┌───────────────┬──────────────┬─────────────┬──────────────────────┬───────────────┬───┬──────────────────┬──────────┬──────────────┬─────────┬───────────┐
│ database_name │ database_oid │ schema_name │    function_name     │ function_type │ … │ has_side_effects │ internal │ function_oid │ example │ stability │
│    varchar    │   varchar    │   varchar   │       varchar        │    varchar    │   │     boolean      │ boolean  │    int64     │ varchar │  varchar  │
├───────────────┼──────────────┼─────────────┼──────────────────────┼───────────────┼───┼──────────────────┼──────────┼──────────────┼─────────┼───────────┤
│ system        │ 0            │ main        │ read_csv_auto        │ table         │ … │                  │ true     │           70 │         │           │
│ system        │ 0            │ main        │ read_csv_auto        │ table         │ … │                  │ true     │           70 │         │           │
│ system        │ 0            │ main        │ arrow_scan           │ table         │ … │                  │ true     │           96 │         │           │
│ system        │ 0            │ main        │ arrow_scan_dumb      │ table         │ … │                  │ true     │           98 │         │           │
│ system        │ 0            │ main        │ checkpoint           │ table         │ … │                  │ true     │           72 │         │           │

or

D select function_name, tags, parameters from duckdb_functions();
┌──────────────────────┬──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│    function_name     │         tags         │                                                  parameters                                                  │
│       varchar        │ map(varchar, varch…  │                                                  varchar[]                                                   │
├──────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ read_csv_auto        │ {}                   │ [col0, hive_types_autocast, hive_types, union_by_name, filename, dtypes, null_padding, parallel, decimal_s…  │
│ read_csv_auto        │ {}                   │ [col0, delim, dateformat, column_names, sep, hive_partitioning, header, escape, allow_quoted_nulls, maximu…  │
│ arrow_scan           │ {}                   │ [col0, col1, col2]                                                                                           │
│ arrow_scan_dumb      │ {}                   │ [col0, col1, col2]                                                                                           │

Maybe we can consider improving the UX of the CLI.
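
One way a CLI could fold a wide result, similar in spirit to the DuckDB output above, is to keep the leading and trailing columns and replace the middle ones with a placeholder. A rough Python sketch, where fold_columns and max_columns are hypothetical names and the split between left and right columns is an assumption:

```python
def fold_columns(headers, rows, max_columns: int):
    """Keep at most max_columns real columns, folding the middle into a '…' column."""
    if max_columns < 2:
        raise ValueError("max_columns must be at least 2")
    if len(headers) <= max_columns:
        # Nothing to fold: return copies of the inputs unchanged.
        return list(headers), [list(row) for row in rows]

    left = (max_columns + 1) // 2  # columns kept from the start
    right = max_columns - left  # columns kept from the end (at least 1)

    def fold(values):
        values = list(values)
        return values[:left] + ["…"] + values[len(values) - right:]

    return fold(headers), [fold(row) for row in rows]


headers = ["function_name", "return_type", "parameters",
           "parameter_types", "function_type", "description"]
rows = [["datetrunc", "Timestamp", "[precision, expression]",
         "[Utf8, Timestamp]", "SCALAR", "Truncates a timestamp value."]]
folded_headers, folded_rows = fold_columns(headers, rows, max_columns=4)
print(folded_headers)
print(folded_rows[0])
```

Folding happens only at display time, so the underlying result keeps every column and a user could still select the hidden ones explicitly.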

@comphead

Thanks @goldmedal it would be really nice to update the user documentation

alamb commented Dec 20, 2024

Maybe we can consider improving the UX of the CLI.

That is an excellent idea -- I think it is one of @matthewmturner's goals with https://github.com/datafusion-contrib/datafusion-dft

I am not sure how complicated we should make datafusion-cli -- there is a tension between a great CLI experience and keeping the code focused and maintainable

alamb commented Dec 21, 2024

Thanks @goldmedal it would be really nice to update the user documentation

In #13868

@alamb alamb merged commit ade14e7 into apache:main Dec 21, 2024
25 of 26 checks passed

alamb commented Dec 21, 2024

Thank you @goldmedal and @comphead -- very nice

@matthewmturner

Indeed, I am looking to incorporate something to improve the experience of looking at function help in dft. I have some ideas but haven't gotten around to it yet

Labels
core Core DataFusion crate sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

List available functions (SHOW FUNCTIONS)
4 participants