UDFs Background Information
Drill provides documentation about how to create a User Defined Function (UDF). The information is procedural and walks you through the steps. While this is a great start, some people would like to know what is happening "behind the scenes." That is the topic of this page.
If there is only one message you take away from this page, let it be:
Drill UDFs are NOT Java
Instead, they use a Drill-specific domain-specific language (DSL) that happens to be expressed in a subset of Java. Use only those Java constructs that Drill specifically allows.
The material here describes the theory of Drill's UDF support so you know what is going on behind the scenes. We then present a simple framework to make UDFs easier to develop and suggest debugging strategies. Finally, we present a troubleshooting guide covering the many things that can go wrong, what they mean, and how to correct them.
To avoid excessive duplication, this page assumes you are familiar with the existing documentation. We'll touch on some sections to offer simpler alternatives, but mostly rely on the Drill documentation for the basics, such as setting up a Maven project.
- UDF Theory
- Structure of a UDF
- UDF Semantics
- Simplified UDF Framework
- Data Types and Holders for UDFs
- Debugging UDFs
- UDF Troubleshooting
- Aggregate UDFs
- Working with VARCHAR Data in UDFs
- Packaging and Deploying UDFs
Next we note that the arguments are something called a Float8Holder rather than a Java double. The reason for this is three-fold (which we will explore deeper later):
- The holder structure is convenient for code generation.
- The holders can store not just the value but also whether the value is null.
- Some types (such as VARCHAR) require more than just a simple value.
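The three points above can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins that only mimic the general shape of Drill's holders; they are not Drill's source, and the names HolderShapes, RequiredDouble, NullableDouble, VarCharRange, and text are invented for this example. The sketch assumes a nullable holder pairs the value with an "is set" flag, and that a VARCHAR holder points at a range of bytes in a buffer (a plain byte[] is used here in place of Drill's own buffer type).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins that mimic the shape of Drill's holders; not Drill's source.
final class HolderShapes {

  // Non-nullable DOUBLE: a plain public field, simple for generated code to read and write.
  static final class RequiredDouble {
    public double value;
  }

  // Nullable DOUBLE: the value plus a flag saying whether it is set (1) or null (0).
  static final class NullableDouble {
    public int isSet;
    public double value;
  }

  // VARCHAR: a byte range [start, end) inside a buffer (byte[] is an assumption here).
  static final class VarCharRange {
    public int start;
    public int end;
    public byte[] buffer;
  }

  // Decode the UTF-8 bytes that a VarCharRange points at.
  static String text(VarCharRange v) {
    return new String(v.buffer, v.start, v.end - v.start, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    NullableDouble n = new NullableDouble();  // isSet defaults to 0, meaning null
    System.out.println(n.isSet == 0 ? "null" : String.valueOf(n.value));

    VarCharRange v = new VarCharRange();
    v.buffer = "hello world".getBytes(StandardCharsets.UTF_8);
    v.start = 6;
    v.end = 11;
    System.out.println(text(v));
  }
}
```

Public fields with no accessors keep the generated code trivial, and the VARCHAR shape shows why a single Java value is not enough for every type.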
Different holder types exist for each Drill data type and cardinality (nullable, non-nullable, or repeated). Here is the (abbreviated) definition of the Float8Holder:
So, this looks pretty simple: we get our input value from x.value and we put our return value into out.value. Not quite as easy as using Java semantics, but not hard.
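As a rough sketch of that read-from-x.value, write-to-out.value pattern, the example below uses a hypothetical DoubleHolder stand-in rather than Drill's Float8Holder, and a log-base-2 calculation chosen purely for illustration. In an actual Drill UDF the holders are annotated fields of the function class rather than method parameters; they are passed as parameters here only so the example stays self-contained and runs without Drill on the classpath.

```java
// Hypothetical sketch of the holder access pattern; not a Drill UDF and not Drill's classes.
final class HolderEvalSketch {

  // Stand-in for a non-nullable DOUBLE holder: just a public value field.
  static final class DoubleHolder {
    public double value;
  }

  // Read the input from x.value and write the result to out.value.
  static void eval(DoubleHolder x, DoubleHolder out) {
    out.value = Math.log(x.value) / Math.log(2.0);  // log base 2, an illustrative choice
  }

  public static void main(String[] args) {
    DoubleHolder x = new DoubleHolder();
    DoubleHolder out = new DoubleHolder();
    x.value = 8.0;
    eval(x, out);
    System.out.println(out.value);
  }
}
```

Note that the function returns nothing: the result travels back through the output holder, which is what lets generated code wire inputs and outputs together without boxing values.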