
UDFs Background Information

Paul Rogers edited this page Jan 6, 2018 · 31 revisions

Introduction

Drill provides documentation about how to create a User Defined Function (UDF). The information is procedural and walks you through the steps. While this is a great start, some people would like to know what is happening "behind the scenes." That is the topic of this page.

If there is only one message you take away from this page, let it be:


Drill UDFs are NOT Java


Instead, they are written in a Drill-specific domain-specific language (DSL) that happens to be expressed in a subset of Java. Use only those Java constructs that Drill specifically allows.

The material here describes the theory of Drill's UDF support so you know what is going on behind the scenes. We then present a simple framework to make UDFs easier to develop and suggest debugging strategies. Finally, we present a troubleshooting guide to the many things that can go wrong, what they mean, and how to correct them.

To avoid excessive duplication, this page assumes you are familiar with the existing documentation. We'll touch on some sections to offer simpler alternatives, but mostly rely on the Drill documentation for the basics, such as setting up a Maven project.

Topics

Holders

Next, note that the arguments are of type Float8Holder rather than Java double. There are three reasons for this, which we explore in more depth later:

  • The holder structure is convenient for code generation.
  • The holders can store not just the value but also whether the value is null.
  • Some types (such as VARCHAR) require more than just a simple value.

Different holder types exist for each Drill data type and cardinality (nullable, non-nullable, or repeated). Here is the (abbreviated) definition of the Float8Holder:
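As a rough illustration of that shape, the sketch below shows simplified stand-ins for a non-nullable and a nullable holder. These are not Drill's actual classes, which carry additional type metadata. The class names `Float8HolderSketch` and `NullableFloat8HolderSketch` are hypothetical, and the meaning given to `isSet` (1 for present, 0 for null) is an assumption made for this example.

```java
// Simplified stand-ins for illustration only; not Drill's actual classes.

/** Non-nullable holder: carries just the value. */
final class Float8HolderSketch {
  public double value;
}

/** Nullable holder: carries the value plus a flag recording whether it is set. */
final class NullableFloat8HolderSketch {
  public int isSet;     // assumed convention: 1 = value present, 0 = null
  public double value;  // meaningful only when isSet == 1
}
```

The fields are public and there are no accessors, so a holder is little more than a named bundle of fields. That simplicity is what makes holders convenient for code generation.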

So, this looks pretty simple: we get our input value from x.value and we put our return value into out.value. Not quite as easy as using Java semantics, but not hard.
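The access pattern can be sketched in plain Java as follows. This is only a model of the data flow, not a real Drill UDF: `Float8Stub` and `HolderAccessSketch` are hypothetical names, the doubling operation is an arbitrary placeholder, and none of Drill's annotations or DSL restrictions are represented.

```java
// Hypothetical stand-in for a Float8 holder; not Drill's actual class.
final class Float8Stub {
  public double value;
}

final class HolderAccessSketch {
  // Mimics the body of a UDF: read the input from x.value and
  // write the result into out.value rather than returning it.
  static void eval(Float8Stub x, Float8Stub out) {
    out.value = x.value * 2.0;  // placeholder computation
  }
}
```

Note that the method returns nothing. The result travels back through the `out` holder, which is the main difference from ordinary Java calling conventions.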
