EVF Tutorial Overview
This tutorial shows how to use the Extended Vector Framework (EVF) to create a simple format plugin. The EVF has also been called the "row set framework" and the "new scan framework". Here we focus on using the framework via a real-world use case: a specific format plugin based on Drill's "Easy" framework. We'll walk through the steps to convert the plugin from the traditional way of creating vectors to an implementation based on the EVF.
Once you've understood the basics, you can explore additional features and read the background information.
The Drill log plugin is the focus of this tutorial. A simplified version of this plugin is explained in the Learning Apache Drill book. The version used here is the one which ships with Drill.
The focus here is on the conversion to EVF, rather than the details of the plugin. Each plugin has its own internal structure, so we leave it to the reader to map from the log reader to some other plugin.
Most format plugins are based on the "Easy" framework. EVF extends the "Easy" framework with a simpler way to implement a plugin. The Easy framework supports both the legacy and the EVF styles; we select one or the other (or even both) based on a few lines of code.
"Legacy" plugins are based on the idea of a "record reader" (a concept borrowed from Hive). Unlike the Hive record readers, Drill's never read a single record: they all read a batch of records. In EVF, the reader becomes a "row batch reader", which implements a new batch-focused interface.
In Drill 1.16 and earlier, the LogRecordReader uses a typical method to write to value vectors using the associated Mutator class. Other readers tried to be more clever. For example, the "V2" text reader (Drill 1.16 and earlier) worked with direct memory itself, handling its own buffer allocation, offset vector calculations and so on.
With the EVF, we'll replace the Mutator (or direct access to vectors) with a ColumnWriter. We'll first do the simplest possible conversion, then look at how to use advanced features, such as type conversions, schema and table properties.
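To make the batch-focused idea concrete before touching the real plugin, here is a minimal, self-contained sketch. It does not use Drill's classes: ToyColumnWriter, ToyBatchReader and readAll are hypothetical stand-ins invented for illustration, and the "level message" line format is an assumption. The point is only the shape of the loop: each call to next() fills one batch, and the reader sets column values through a writer rather than touching vectors or buffers itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy stand-ins, NOT Drill classes: every name and signature here is hypothetical.
public class BatchReaderSketch {

  /** Hypothetical per-column writer: the reader sets one value per row. */
  static class ToyColumnWriter {
    final String name;
    final List<String> values = new ArrayList<>();

    ToyColumnWriter(String name) { this.name = name; }

    void setString(String value) { values.add(value); }
  }

  /** Hypothetical batch-focused reader: each call to next() fills one batch. */
  static class ToyBatchReader {
    private final Iterator<String> lines;
    private final int batchSize;

    ToyBatchReader(List<String> lines, int batchSize) {
      this.lines = lines.iterator();
      this.batchSize = batchSize;
    }

    /** Writes up to batchSize rows; returns false once the input is exhausted. */
    boolean next(ToyColumnWriter level, ToyColumnWriter message) {
      int rows = 0;
      while (rows < batchSize && lines.hasNext()) {
        // Assumed line format: "<level> <message>".
        String[] parts = lines.next().split(" ", 2);
        level.setString(parts[0]);
        message.setString(parts.length > 1 ? parts[1] : "");
        rows++;
      }
      return rows > 0;
    }
  }

  /** Drives the reader to completion; returns {batchCount, rowCount}. */
  static int[] readAll(List<String> input, int batchSize) {
    ToyBatchReader reader = new ToyBatchReader(input, batchSize);
    int batches = 0;
    int rows = 0;
    while (true) {
      // Fresh writers per batch, standing in for a fresh set of vectors.
      ToyColumnWriter level = new ToyColumnWriter("level");
      ToyColumnWriter message = new ToyColumnWriter("message");
      if (!reader.next(level, message)) {
        break;
      }
      batches++;
      rows += level.values.size();
    }
    return new int[] {batches, rows};
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("INFO started", "WARN low disk", "ERROR failed");
    int[] result = readAll(input, 2);
    System.out.println("batches=" + result[0] + " rows=" + result[1]);
  }
}
```

In this sketch the reader never allocates buffers or computes offsets; it only hands values to the writers. That division of labor is the part that carries over to the real conversion, even though the actual EVF interfaces differ from these toy classes.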
Let's work through the needed changes one by one.
Next: Plugin Revisions