Simple and Vector Function metadata #8696
aditi-pandit started this conversation in General
@mbasmanova: Would be great to hear your thoughts on this.
Co-author : @pramodsatya
A goal of Prestissimo SPIV2 is to make the Presto co-ordinator aware of the functions supported in the Prestissimo worker. The Presto co-ordinator needs JSON conforming to JsonBasedUdfFunctionMetadata (https://github.com/prestodb/presto/blob/master/presto-function-namespace-managers/src/main/java/com/facebook/presto/functionNamespace/json/JsonBasedUdfFunctionMetadata.java) to register all Presto worker functions in its catalogs. So we are trying (in presto_cpp logic) to build this metadata from each function in the Velox function registries and expose it to the Presto co-ordinator.
In doing so, we have run into multiple road-blocks:
Metadata in JsonBasedUdfFunctionMetadata, viz. determinism and nullCallClause, is not available in a uniform manner from Velox. Some of the information is available (if at all) from VectorFunctionMetadata, SimpleFunctionMetadata or UDFHolder (see relevant background below). We feel these fields should be exposed as part of the FunctionSignature (and AggregateFunctionSignature or WindowFunctionSignature respectively).
Prestissimo shouldn’t have to resolve functions to the actual function object and retrieve metadata from it (as is the case for VectorFunctionMetadata).
Prestissimo shouldn’t have to access UDFHolder/UDFRegistration logic to get these fields. Those are part of the Velox SimpleFunctionAdapter framework used to construct functions.
This metadata should be part of the Velox runtime.
Ideally, there should be a single source of truth for this metadata for both SimpleFunctions and VectorFunctions. VectorFunctionMetadata seems like a good place for this. SimpleFunctionAdapterFactory should populate this structure for Simple functions as well.
Aggregate functions do not have metadata for nullCallClause. Can this be inferred somehow?
The rest of this document provides more details about each of the above items.
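As a rough illustration of the first two items, the sketch below models a signature that carries determinism and null-call behavior itself, with registration stamping one metadata value onto every signature. A consumer can then describe each function by walking the signatures alone. Everything here is a stand-in: SketchMetadata, SketchSignature, registerSketchFunction, describeSignature and the JSON key names are hypothetical, and the output does not reproduce the exact JsonBasedUdfFunctionMetadata schema or Velox's FunctionSignature.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-function metadata; not a Velox type.
struct SketchMetadata {
  bool deterministic{true};
  bool defaultNullBehavior{true};
};

// Hypothetical signature that carries the metadata alongside the types.
struct SketchSignature {
  std::string returnType;
  std::vector<std::string> argumentTypes;
  SketchMetadata metadata;
};

using SketchRegistry = std::map<std::string, std::vector<SketchSignature>>;

// Registration copies one metadata value onto every signature of the
// function, so the metadata has a single source of truth.
void registerSketchFunction(
    SketchRegistry& registry,
    const std::string& name,
    std::vector<SketchSignature> signatures,
    const SketchMetadata& metadata) {
  for (auto& signature : signatures) {
    signature.metadata = metadata;
  }
  registry[name] = std::move(signatures);
}

// Describes one signature as JSON using only registry data; no function
// object is resolved. The key names are illustrative assumptions.
std::string describeSignature(
    const std::string& name,
    const SketchSignature& signature) {
  std::string json = "{\"name\":\"" + name + "\",\"outputType\":\"" +
      signature.returnType + "\",\"paramTypes\":[";
  for (std::size_t i = 0; i < signature.argumentTypes.size(); ++i) {
    if (i > 0) {
      json += ",";
    }
    json += "\"" + signature.argumentTypes[i] + "\"";
  }
  json += "],\"deterministic\":";
  json += signature.metadata.deterministic ? "true" : "false";
  json += ",\"nullCallClause\":\"";
  json += signature.metadata.defaultNullBehavior
      ? "RETURNS_NULL_ON_NULL_INPUT"
      : "CALLED_ON_NULL_INPUT";
  json += "\"}";
  return json;
}

// Usage example: an illustrative non-deterministic function with one
// zero-argument signature.
std::string describeRandSketch() {
  SketchRegistry registry;
  registerSketchFunction(
      registry,
      "rand",
      {SketchSignature{"double", {}, {}}},
      SketchMetadata{false, true});
  assert(registry.at("rand").size() == 1);
  return describeSignature("rand", registry.at("rand").front());
}
```

Calling describeRandSketch() from a main function returns the description without ever constructing or resolving a function object, which is the property the list above asks for.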
Background
Scalar functions in Velox
Scalar functions in Velox can be written using either the simple function API or as a vector function. Simple functions abstract the engine details and are easier to write.
The simple function framework allows developers to write the function logic by taking a single row of values as the input. For example, the logic of the simple function year, which takes a single argument of type timestamp and returns an int64_t value, is defined by the call method in the class YearFunction.
Functions are registered using registerFunction, e.g. registerFunction<YearFunction, int64_t, Timestamp>({prefix + "year"});
The registration makes these functions and their signatures queryable from a FunctionRegistry structure.
Vector functions
Vector functions on the other hand are written to run on whole vectors.
Vector functions derive from the base class, VectorFunction, which consists of a pure virtual function, apply, that defines the function logic.
Vector functions are registered against a set of supported signatures, which are constructed using the FunctionSignatureBuilder and bound to the function at registration.
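To make the two styles concrete, here is a minimal self-contained sketch that does not use Velox headers. SketchTimestamp, SketchYearFunction, SketchVectorFunction, SketchEntry and the use of std::vector as a column are all assumptions made for illustration; the real call and apply methods in Velox have different signatures, and real signatures are built with FunctionSignatureBuilder rather than written as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical timestamp: seconds since the Unix epoch (UTC).
struct SketchTimestamp {
  std::int64_t seconds;
};

// Converts a timestamp to its calendar year; returns 0 if conversion fails.
std::int64_t yearOf(const SketchTimestamp& ts) {
  const std::time_t seconds = static_cast<std::time_t>(ts.seconds);
  const std::tm* utc = std::gmtime(&seconds);
  return utc != nullptr ? utc->tm_year + 1900 : 0;
}

// Simple-function style: the author writes logic for a single row.
struct SketchYearFunction {
  void call(std::int64_t& result, const SketchTimestamp& ts) const {
    result = yearOf(ts);
  }
};

// Vector-function style: the author writes logic for a whole column.
class SketchVectorFunction {
 public:
  virtual ~SketchVectorFunction() = default;
  virtual void apply(
      const std::vector<SketchTimestamp>& input,
      std::vector<std::int64_t>& result) const = 0;
};

class SketchYearVectorFunction : public SketchVectorFunction {
 public:
  void apply(
      const std::vector<SketchTimestamp>& input,
      std::vector<std::int64_t>& result) const override {
    result.resize(input.size());
    for (std::size_t row = 0; row < input.size(); ++row) {
      result[row] = yearOf(input[row]);
    }
  }
};

// Hypothetical registry entry: supported signatures bound to one function.
struct SketchEntry {
  std::vector<std::string> signatures;
  std::shared_ptr<SketchVectorFunction> function;
};

using SketchVectorRegistry = std::map<std::string, SketchEntry>;

// Usage example: register the vector function and run it on two rows.
std::vector<std::int64_t> runYearSketch() {
  SketchVectorRegistry registry;
  registry["year"] = SketchEntry{
      {"(timestamp) -> bigint"}, std::make_shared<SketchYearVectorFunction>()};
  assert(registry.count("year") == 1);

  const std::vector<SketchTimestamp> input{
      SketchTimestamp{0}, SketchTimestamp{31622400}};
  std::vector<std::int64_t> result;
  registry.at("year").function->apply(input, result);
  return result;
}
```

The row-wise version never sees the column, which is why simple functions are easier to write; the vector version owns the loop and therefore also owns any per-column optimizations.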
Simple function metadata
Internally, all simple functions are converted to an instance of VectorFunction for vectorized execution. This is accomplished by the utility class SimpleFunctionAdapter, which derives from VectorFunction and constructs the apply function from the simple function definition.
Mechanics of Simple function
Simple functions are registered using a helper function in Velox that takes the function’s class, return type, and argument types as template variables.
The function registerSimpleFunction is templated on the class UDFHolder. mutableSimpleFunctions returns an instance of class SimpleFunctionRegistry, which keeps track of all the registered simple functions. The helper function registerFunction in SimpleFunctionRegistry registers the function name in the function registry, mapping it to the function metadata and function entry.
The template variable UDF in this helper corresponds to the class SimpleFunctionAdapterFactoryImpl, which exposes the metadata from the class UDFHolder and provides a helper function to construct an instance of VectorFunction using the utility class SimpleFunctionAdapter.
The simple function metadata is obtained from the class UDFHolder and is represented by the class SimpleFunctionMetadata.
Current logic to obtain ‘is_deterministic’ and ‘default_null_behavior’:
The function signature for simple functions is constructed in SimpleFunctionMetadata::buildSignature. The fields is_deterministic and default_null_behavior, once added to the function signature, can be set in buildSignature.
SimpleFunctionMetadata is currently missing information about the default null behavior. This information is present in the class UDFHolder in the variable default_null_behavior, which uses a method resolver to check if a callNullable method is defined in the function’s class. The default_null_behavior variable can be passed from UDFHolder to SimpleFunctionMetadata when the singleton instance of function metadata is created in SimpleFunctionRegistry::registerFunction. It would also require adding a variable in SimpleFunctionAdapterFactoryImpl to represent default_null_behavior so its value can be obtained in SimpleFunctionRegistry.
Open questions:
We rely on using the simple function framework to obtain the metadata values. Is this an ideal way to add new fields to simple function metadata? For example, ‘default_null_behavior’ is inferred from the UDFHolder with
static constexpr bool is_default_null_behavior = !udf_has_callNullable;
Should this field be defined by the UDF author in the registration instead?
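The inference described above can be sketched with a standard C++17 detection idiom. This is not Velox's actual method resolver: has_call_nullable, declared_deterministic, SketchHolder, SketchSimpleMetadata and the three example functions are hypothetical simplifications, only one-argument functions are handled, and treating a missing is_deterministic declaration as "deterministic" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Detects whether Fn declares callNullable(Out&, const In*).
template <typename Fn, typename Out, typename In, typename = void>
struct has_call_nullable : std::false_type {};

template <typename Fn, typename Out, typename In>
struct has_call_nullable<
    Fn,
    Out,
    In,
    std::void_t<decltype(std::declval<Fn&>().callNullable(
        std::declval<Out&>(),
        std::declval<const In*>()))>> : std::true_type {};

// Reads Fn::is_deterministic when declared; assumed true otherwise.
template <typename Fn, typename = void>
struct declared_deterministic : std::true_type {};

template <typename Fn>
struct declared_deterministic<Fn, std::void_t<decltype(Fn::is_deterministic)>>
    : std::bool_constant<Fn::is_deterministic> {};

// Hypothetical metadata record exposed to the registry.
struct SketchSimpleMetadata {
  bool deterministic;
  bool defaultNullBehavior;
};

// Hypothetical holder: infers both fields from the function's class.
template <typename Fn, typename Out, typename In>
struct SketchHolder {
  static constexpr bool udf_has_callNullable =
      has_call_nullable<Fn, Out, In>::value;
  static constexpr bool is_default_null_behavior = !udf_has_callNullable;

  static SketchSimpleMetadata metadata() {
    return {declared_deterministic<Fn>::value, is_default_null_behavior};
  }
};

// Defines only call, so nulls are left to the framework.
struct SketchPlusOne {
  void call(std::int64_t& out, const std::int64_t& in) {
    out = in + 1;
  }
};

// Defines callNullable, so it handles null inputs itself.
struct SketchIsNull {
  void callNullable(bool& out, const std::int64_t* in) {
    out = (in == nullptr);
  }
};

// Declares itself non-deterministic.
struct SketchCounter {
  static constexpr bool is_deterministic = false;
  void call(std::int64_t& out, const std::int64_t& in) {
    static std::int64_t calls = 0;
    out = in + (++calls);
  }
};

// Usage example: metadata is read without constructing or calling anything.
bool plusOneUsesDefaultNullBehavior() {
  const auto metadata =
      SketchHolder<SketchPlusOne, std::int64_t, std::int64_t>::metadata();
  assert(metadata.deterministic);
  return metadata.defaultNullBehavior;
}
```

With inference, the UDF author cannot state an intent that differs from the methods they happen to define. Letting the author declare the field at registration would make the intent explicit, at the cost of allowing a declaration that disagrees with the implementation.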
Vector function metadata
Velox provides a macro and a helper function to register vector functions with metadata (represented by an instance of class VectorFunctionMetadata).
Currently, the metadata about default null behavior and determinism is obtained from the functions isDeterministic and isDefaultNullBehavior in the class VectorFunction.
Proposed change and open questions
Regarding is_deterministic and default_null_behavior: it would be ideal to obtain this metadata from the function signature without resolving the function.
For is_deterministic, this can be accomplished by adding a field in both VectorFunctionMetadata and the function signature to represent this metadata. In registerVectorFunction, the value of this metadata can be obtained from VectorFunctionMetadata, and then set in all the function signatures.
Should a field be added to VectorFunctionMetadata to represent the default null behavior metadata? We need this to be consistent across Simple and VectorFunctions.
Should isDeterministic and isDefaultNullBehavior be removed from VectorFunction if they are moved to VectorFunctionMetadata?
Aggregate function metadata
For aggregate functions written using the recently added simple aggregate framework, which uses SimpleAggregateAdapter (e.g. GeometricMeanAggregate), the metadata default_null_behavior can be obtained from the field SimpleAggregateAdapter::aggregate_default_null_behavior. However, the metadata is_deterministic is still missing. Furthermore, most of the aggregates are not written using the simple aggregate framework, so the metadata aggregate_default_null_behavior is not present. This metadata could be exposed through AggregateFunctionSignature by adding fields for it, but it is not clear how those fields would be populated since this information is not present.