From 893fc429a6367b8f6de135c77d65a4d9ad9eb777 Mon Sep 17 00:00:00 2001 From: Manfred Moser Date: Mon, 16 Dec 2024 09:36:50 -0800 Subject: [PATCH] Add docs for Python UDFs --- docs/src/main/sphinx/udf.md | 1 + docs/src/main/sphinx/udf/function.md | 29 +++- docs/src/main/sphinx/udf/introduction.md | 5 +- docs/src/main/sphinx/udf/python.md | 181 ++++++++++++++++++++ docs/src/main/sphinx/udf/python/examples.md | 67 ++++++++ 5 files changed, 273 insertions(+), 10 deletions(-) create mode 100644 docs/src/main/sphinx/udf/python.md create mode 100644 docs/src/main/sphinx/udf/python/examples.md diff --git a/docs/src/main/sphinx/udf.md b/docs/src/main/sphinx/udf.md index ac715cc029a..4e8c54a6142 100644 --- a/docs/src/main/sphinx/udf.md +++ b/docs/src/main/sphinx/udf.md @@ -13,4 +13,5 @@ More details are available in the following sections: udf/introduction udf/function udf/sql +udf/python ``` diff --git a/docs/src/main/sphinx/udf/function.md b/docs/src/main/sphinx/udf/function.md index a8d5b6425dc..f5564ab574a 100644 --- a/docs/src/main/sphinx/udf/function.md +++ b/docs/src/main/sphinx/udf/function.md @@ -11,7 +11,8 @@ FUNCTION name ( [ parameter_name data_type [, ...] ] ) [ CALLED ON NULL INPUT ] [ SECURITY { DEFINER | INVOKER } ] [ COMMENT description] - statements + [ WITH ( property_name = expression [, ...] ) ] + { statements | AS definition } ``` ## Description @@ -31,7 +32,9 @@ The `type` value after the `RETURNS` keyword identifies the [data type](/language/types) of the UDF output. The optional `LANGUAGE` characteristic identifies the language used for the UDF -definition with `language`. Only `SQL` is supported. +definition with `language`. The `SQL` and `PYTHON` languages are supported by +default. Additional languages may be supported via a language engine plugin. +If not specified, the default language is `SQL`. The optional `DETERMINISTIC` or `NOT DETERMINISTIC` characteristic declares that the UDF is deterministic. 
This means that repeated UDF calls with identical @@ -58,10 +61,18 @@ The `COMMENT` characteristic can be used to provide information about the function to other users as `description`. The information is accessible with [](/sql/show-functions). -The body of the UDF can either be a simple single `RETURN` statement with an -expression, or compound list of `statements` in a `BEGIN` block. UDF must -contain a `RETURN` statement at the end of the top-level block, even if it's -unreachable. +The optional `WITH` clause can be used to specify properties for the function. +The available properties vary based on the function language. For +[](/udf/python), the `handler` property specifies the name of the Python +function to invoke. + +For SQL UDFs, the body of the UDF can either be a simple single `RETURN` +statement with an expression, or a compound list of `statements` in a `BEGIN` +block. A UDF must contain a `RETURN` statement at the end of the top-level block, +even if it's unreachable. + +For UDFs in other languages, the `definition` is enclosed in a `$$`-quoted +string. ## Examples @@ -89,12 +100,14 @@ SELECT meaning_of_life(); ``` Further examples of varying complexity that cover usage of the `FUNCTION` -statement in combination with other statements are available in the [SQL -UDF examples documentation](/udf/sql/examples). +statement in combination with other statements are available in the [SQL UDF +documentation](/udf/sql/examples) and the [Python UDF +documentation](/udf/python). ## See also * [](/udf) * [](/udf/sql) +* [](/udf/python) * [](/sql/create-function) diff --git a/docs/src/main/sphinx/udf/introduction.md b/docs/src/main/sphinx/udf/introduction.md index 88189ccb3f5..da479dc491f 100644 --- a/docs/src/main/sphinx/udf/introduction.md +++ b/docs/src/main/sphinx/udf/introduction.md @@ -4,8 +4,6 @@ A user-defined function (UDF) is a custom function authored by a user of Trino in a client application. 
UDFs are scalar functions that return a single output value, similar to [built-in functions](/functions). -UDFs are defined and written using the [SQL routine language](/udf/sql). - :::{note} Custom functions can alternatively be written in Java and deployed as a plugin. Details are available in the [developer guide](/develop/functions). @@ -14,6 +12,9 @@ plugin. Details are available in the [developer guide](/develop/functions). (udf-declaration)= ## UDF declaration +Declare the UDF with the SQL [](/udf/function) keyword and the supported +statements for [](/udf/sql) or [](/udf/python). + A UDF can be declared as an [inline UDF](udf-inline) to be used in the current query, or declared as a [catalog UDF](udf-catalog) to be used in any future query. diff --git a/docs/src/main/sphinx/udf/python.md b/docs/src/main/sphinx/udf/python.md new file mode 100644 index 00000000000..32e7dd229e6 --- /dev/null +++ b/docs/src/main/sphinx/udf/python.md @@ -0,0 +1,181 @@ +# Python user-defined functions + +A Python user-defined function is a [user-defined function](/udf) that uses the +[Python programming language and statements](python-udf-lang) for the definition +of the function. + +:::{warning} +Python user-defined functions are an experimental feature. +::: + +## Python UDF declaration + +Declare a Python UDF as [inline](udf-inline) or [catalog UDF](udf-catalog) with +the following steps: + +* Use the [](/udf/function) keyword to declare the UDF name and parameters. +* Add the `RETURNS` declaration to specify the data type of the result. +* Set the `LANGUAGE` to `PYTHON`. +* Declare the name of the Python function to call with the `handler` property in + the `WITH` block. +* Use `$$` to enclose the Python code after the `AS` keyword. +* Add the function from the handler property and ensure it returns the declared + data type. +* Expand your Python code section to implement the function using the available + [Python language](python-udf-lang). 
+ +The following snippet shows pseudo-code: + +```text + FUNCTION python_udf_name(input_parameter data_type) + RETURNS result_data_type + LANGUAGE PYTHON + WITH (handler = 'python_function') + AS $$ + ... + def python_function(input): + return ... + ... + $$ +``` + +A minimal example declares the UDF `doubleup` that returns the input integer +value `x` multiplied by two. The example shows declaration as [](udf-inline) and +invocation with the value `21` to yield the result `42`. + +Set the language to `PYTHON` to override the default `SQL` for [](/udf/sql). +The Python code is enclosed with `$$` and must use valid formatting. + +```text +WITH + FUNCTION doubleup(x integer) + RETURNS integer + LANGUAGE PYTHON + WITH (handler = 'twice') + AS $$ + def twice(a): + return a * 2 + $$ +SELECT doubleup(21); +-- 42 +``` + +The same UDF can also be declared as [](udf-catalog). + +Refer to the [](/udf/python/examples) for more complex use cases and examples. + +```{toctree} +:titlesonly: true +:hidden: + +/udf/python/examples +``` + +(python-udf-lang)= +## Python language details + +The Trino Python UDF integration uses Python 3.13.0 in a sandboxed environment. +Python code runs within a WebAssembly (WASM) runtime within the Java virtual +machine running Trino. + +Python language rules including indents must be observed. + +Python UDFs therefore only have access to the Python language and core libraries +included in the sandboxed runtime. Access to external resources with network or +file system operations is not supported. Usage of other Python libraries as well +as command line tools or package managers is not supported. 
+ +The following libraries are explicitly removed from the runtime and therefore +not available within a Python UDF: + +* `bdb` +* `concurrent` +* `curses` +* `ensurepip` +* `doctest` +* `idlelib` +* `multiprocessing` +* `pdb` +* `pydoc` +* `socketserver*` +* `sqlite3` +* `ssl` +* `subprocess*` +* `tkinter` +* `turtle*` +* `unittest` +* `venv` +* `webbrowser*` +* `wsgiref` +* `xmlrpc` + +## Type mapping + +The following table shows supported Trino types and their corresponding Python +types for input and output values of a Python UDF: + +:::{list-table} Trino to Python type mapping +:widths: 50, 50 +:header-rows: 1 + +* - Trino type + - Python type +* - row + - tuple +* - array + - list +* - map + - dict +* - boolean + - bool +* - tinyint + - int +* - smallint + - int +* - integer + - int +* - bigint + - int +* - real + - float +* - double + - float +* - decimal + - decimal.Decimal +* - varchar + - str +* - varbinary + - bytes +* - date + - datetime.date +* - time + - datetime.time +* - time with time zone + - datetime.time with datetime.tzinfo +* - timestamp + - datetime.datetime +* - timestamp with time zone + - datetime.datetime with datetime.tzinfo +* - interval year to month + - int as the number of months +* - interval day to second + - datetime.timedelta +* - json + - str +* - uuid + - uuid.UUID +* - ipaddress + - ipaddress.IPv4Address or ipaddress.IPv6Address + +::: + +### Date and time + +Python datetime objects only support microsecond precision. Trino argument +values with greater precision are rounded when converted to Python values, and +Python return values are rounded if the Trino return type has less than +microsecond precision. + +Only fixed offset time zones are supported. Timestamps with political time zones +have the zone converted to the zone's offset for the timestamp's instant. 
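The fixed-offset behavior described above can be illustrated with a small sketch in plain Python. It assumes, per the type mapping table, that a `timestamp with time zone` argument reaches the handler as a timezone-aware `datetime.datetime`. The handler name `offset_minutes` and the sample value are hypothetical and not part of this patch, and the snippet runs outside Trino purely to show what a handler could observe.

```python
from datetime import datetime, timedelta, timezone


def offset_minutes(ts):
    # Hypothetical handler: ts is assumed to be a timezone-aware
    # datetime.datetime carrying a fixed UTC offset.
    offset = ts.utcoffset()
    return int(offset.total_seconds() // 60)


# Local stand-in for a value Trino might pass: a timestamp whose
# zone has already been reduced to the fixed offset -08:00.
sample = datetime(2024, 12, 16, 9, 36, 50, tzinfo=timezone(timedelta(hours=-8)))

print(offset_minutes(sample))  # -480
print(datetime.resolution)     # smallest representable step: one microsecond
```

To use such a handler in a UDF, the function body would go between the `$$` markers with `handler = 'offset_minutes'` and a matching `RETURNS integer` declaration. Whether a particular zone resolves to the offset shown here depends on the timestamp's instant.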
+ diff --git a/docs/src/main/sphinx/udf/python/examples.md b/docs/src/main/sphinx/udf/python/examples.md new file mode 100644 index 00000000000..f0b8d7584b1 --- /dev/null +++ b/docs/src/main/sphinx/udf/python/examples.md @@ -0,0 +1,67 @@ +# Example Python UDFs + +After learning about [](/udf/python), the following sections show examples +of valid Python UDFs. + +## XOR + +The following example implements a `xor` function for a logical Exclusive OR +operation on two boolean input parameters and tests it with two invocations: + +```text +WITH FUNCTION xor(a boolean, b boolean) +RETURNS boolean +LANGUAGE PYTHON +WITH (handler = 'bool_xor') +AS $$ +import operator +def bool_xor(a, b): + return operator.xor(a, b) +$$ +SELECT xor(true, false), xor(false, true); +``` + +Result of the query: + +``` + true | true +``` + +## reverse_words + +The following example uses a more elaborate Python script to reverse the +characters in each word of the input string `s` of type `varchar` and tests the +function. + +```text +WITH FUNCTION reverse_words(s varchar) +RETURNS varchar +LANGUAGE PYTHON +WITH (handler = 'reverse_words') +AS $$ +import re + +def reverse(s): + str = "" + for i in s: + str = i + str + return str + +pattern = re.compile(r"\w+[.,'!?\"]\w*") + +def process_word(word): + # Reverse only words without non-letter signs + return word if pattern.match(word) else reverse(word) + +def reverse_words(payload): + text_words = payload.split(' ') + return ' '.join([process_word(w) for w in text_words]) +$$ +SELECT reverse_words('Civic, level, dna racecar era semordnilap'); +``` + +Result of the query: + +``` +Civic, level, and racecar are palindromes +``` \ No newline at end of file