UDFs Background Information

Paul Rogers edited this page Jan 6, 2018 · 31 revisions

Introduction

Drill supports User Defined Functions (UDFs), which are SQL functions added to an existing Drill installation. UDFs use the same mechanism as Drill's built-in SQL functions. To develop UDFs effectively, you need a strong knowledge of Drill internals, gained from experience, because the only detailed documentation of those internals is the code itself.

Drill provides documentation about how to create a UDF. The information is procedural and walks you through the steps, assuming that you already know enough about Drill internals to fill in the gaps. The purpose of this page is to explain a bit of that background information for UDF authors who are not yet Drill internals experts.

If there is only one message you take away from this page, let it be:


Drill UDFs are NOT Java


Instead, they use a Drill-specific domain-specific language (DSL) that happens to be expressed in a subset of Java. Use only those Java constructs that Drill specifically allows.

The material here describes the theory behind Drill's UDF support so you know what is going on behind the scenes. We then present a simple framework to make UDFs easier to develop and suggest debugging strategies by walking through the process of developing and testing a simple (row-by-row) UDF. We then dive into the undocumented details of the mechanisms your code will use. Finally, we present a troubleshooting guide covering the many things that can go wrong, what they mean, and how to correct them.

To avoid excessive duplication, this page assumes you are familiar with the existing documentation. We'll touch on some sections to offer simpler alternatives, but mostly rely on the Drill documentation for the basics, such as setting up a Maven project.

Topics

Holders

Next, we note that the arguments are something called a Float8Holder rather than a Java double. There are three reasons for this, each of which we explore in more depth later:

  • The holder structure is convenient for code generation.
  • The holders can store not just the value but also whether the value is null.
  • Some types (such as VARCHAR) require more than just a simple value.
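The second and third points can be pictured with a pair of simplified stand-in classes. This is only a sketch: the class names are hypothetical, the field names (`isSet`, `start`, `end`, `buffer`) are assumptions modeled loosely on Drill's holder conventions, and a plain `java.nio.ByteBuffer` stands in for whatever buffer type Drill actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a nullable holder: the value plus a null flag.
final class NullableFloat8Sketch {
    public int isSet;     // assumed convention: 1 = value present, 0 = SQL NULL
    public double value;  // meaningful only when isSet == 1
}

// Hypothetical stand-in for a variable-width holder: a VARCHAR is a
// byte range [start, end) inside a shared buffer, not a single value.
final class VarCharSketch {
    public int start;
    public int end;
    public ByteBuffer buffer;
}

final class HolderShapes {
    // Reports whether the nullable holder represents SQL NULL.
    static boolean isNull(NullableFloat8Sketch h) {
        return h.isSet == 0;
    }

    // Copies the holder's byte range out of the buffer and decodes it as UTF-8.
    static String read(VarCharSketch h) {
        byte[] bytes = new byte[h.end - h.start];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = h.buffer.get(h.start + i);  // absolute read, buffer position unchanged
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The point of the sketch is the shape of the data, not the exact API: a nullable value needs a flag alongside it, and a string needs a location within a buffer rather than a single primitive.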

Different holder types exist for each Drill data type and cardinality (nullable, non-nullable, or repeated). Here is the (abbreviated) definition of the Float8Holder:

So, this looks pretty simple: we get our input value from x.value and we put our return value into out.value. Not quite as easy as using Java semantics, but not hard.
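As a minimal sketch of that read-from-`x.value`, write-to-`out.value` pattern, assume a holder is essentially a class with a public `value` field. The class names `Float8HolderSketch` and `DoubleItSketch` and the doubling logic are hypothetical stand-ins for illustration; they are not Drill's actual classes, and the code is ordinary Java rather than a real UDF.

```java
// Hypothetical stand-in for a non-nullable FLOAT8 holder: just the value.
final class Float8HolderSketch {
    public double value;
}

// Hypothetical function body in the holder style: inputs and outputs are
// both holders, and the result is written into `out` instead of returned.
final class DoubleItSketch {
    static void eval(Float8HolderSketch x, Float8HolderSketch out) {
        out.value = x.value * 2.0;
    }
}

final class HolderDemo {
    public static void main(String[] args) {
        Float8HolderSketch x = new Float8HolderSketch();
        x.value = 2.5;
        Float8HolderSketch out = new Float8HolderSketch();
        DoubleItSketch.eval(x, out);
        System.out.println(out.value);
    }
}
```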
