Storage Plugin Configuration
This section discusses how Drill learns of the storage plugin and instantiates it and its related classes. Later sections will discuss the operations performed on it.
The key elements of a storage plugin include:
- The storage plugin configuration: the JSON format and the corresponding Jackson-serialized Java class.
- The storage plugin class.
- Planning nodes and physical operators.
- Execution time operator implementations.
A storage plugin starts with the plugin configuration and the plugin class. Given the elements described below, Drill will find, load, and configure the storage plugin, and you will see the default plugin configuration in Drill's web UI when you first start Drill. (Open question: must the configuration be created manually if a plugin registration already exists?)
Drill locates a storage plugin using the following process:
- Add a `drill-module.conf` file that Drill finds at start-up.
- In the `drill-module.conf` file, add the Java package to Drill's class path scan.
- Drill scans the class path looking for classes that derive from `StoragePlugin`.
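The discovery step can be sketched with plain Java reflection. This is a minimal model, not Drill's class path scanner: the class path is represented as a hypothetical map from fully qualified class names to already-loaded stand-in classes, `StoragePlugin` is a local stand-in interface, and the package list plays the role of the `classpath.scanning.packages` setting. Skipping interfaces and abstract classes (such as a base class in the style of `AbstractStoragePlugin`) is an assumption of the sketch.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of plugin discovery; not Drill's real scanner.
class PluginScanSketch {

  // Stand-in for Drill's StoragePlugin interface.
  interface StoragePlugin {}

  // Stand-ins for classes that might be found on the class path.
  abstract static class BasePlugin implements StoragePlugin {}
  static class MyStoragePlugin extends BasePlugin {}
  static class MyHelper {}

  /**
   * Returns the names of classes that live in a scanned package (or a subpackage)
   * and are concrete implementations of StoragePlugin.
   */
  static List<String> findPlugins(List<String> scannedPackages,
                                  Map<String, Class<?>> classPath) {
    List<String> found = new ArrayList<>();
    for (Map.Entry<String, Class<?>> entry : classPath.entrySet()) {
      String className = entry.getKey();
      Class<?> type = entry.getValue();
      boolean inScannedPackage = scannedPackages.stream()
          .anyMatch(pkg -> className.startsWith(pkg + "."));
      // Assumption: interfaces and abstract classes are not plugins themselves.
      boolean concretePlugin = StoragePlugin.class.isAssignableFrom(type)
          && !type.isInterface()
          && !Modifier.isAbstract(type.getModifiers());
      if (inScannedPackage && concretePlugin) {
        found.add(className);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Hypothetical class path: fully qualified names mapped to stand-in classes.
    Map<String, Class<?>> classPath = new LinkedHashMap<>();
    classPath.put("com.example.myplugin.MyStoragePlugin", MyStoragePlugin.class);
    classPath.put("com.example.myplugin.BasePlugin", BasePlugin.class);
    classPath.put("com.example.myplugin.MyHelper", MyHelper.class);
    classPath.put("com.example.unscanned.OtherPlugin", MyStoragePlugin.class);

    // The package list plays the role of classpath.scanning.packages.
    System.out.println(findPlugins(List.of("com.example.myplugin"), classPath));
    // prints [com.example.myplugin.MyStoragePlugin]
  }
}
```

Note that a plugin class outside the scanned packages is ignored even though it implements the interface, which is why an extension must add its package to the scan.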
Storage plugins are of two types: intrinsic and extension. An intrinsic plugin (such as `FileSystemPlugin`) is defined within the Drill source tree. An extension lives outside the Drill source tree, is packaged into a jar, and is placed in one of Drill's class path folders, often `$DRILL_HOME/jars/3rdparty` or `$DRILL_SITE/jars`.
Intrinsic plugins tend to put their configuration into the `java-exec/src/main/resources/drill-module.conf` file, which is packaged into the `drill-java-exec-x.y.z.jar` file.
Extension plugins are packaged into a jar, and that jar must contain a `drill-module.conf` file in standard HOCON format. At a minimum, this file adds the extension package to Drill's class path scan. Here is an example from the Kudu plugin:
```hocon
drill: {
  classpath.scanning: {
    packages += "org.apache.drill.exec.store.kudu"
  }
}
```
The file can, of course, add other configuration as desired.
The storage plugin class must implement `StoragePlugin` (often via the `AbstractStoragePlugin` class). Implementing this interface is what marks the class as a storage plugin: any class that implements `StoragePlugin` is presumed to be one. To disable a class that is not really a plugin, create a configuration for it but mark the plugin as disabled in that configuration.
The class must implement a three-argument constructor:

```java
public MockStorageEngine(MockStorageEngineConfig configuration,
                         DrillbitContext context, String name) {
  // ...
}
```
The first argument is "magic". Drill uses Java reflection to find this constructor and inspect its first argument. The type of the first argument identifies the configuration class for this plugin. That is, it is this constructor that associates a storage plugin configuration class with the corresponding storage plugin class.
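The constructor lookup can be sketched with standard Java reflection. The types below are local stand-ins named after the document's mock plugin, not Drill's real classes, and the `register` helper is hypothetical; Drill's own registry code may perform this search differently. The point is that the configuration class is never named explicitly: it is read off the first parameter of the constructor.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch using stand-in types, not Drill's classes.
class PluginConstructorSketch {

  interface StoragePluginConfig {}
  interface StoragePlugin {}
  static class DrillbitContext {}

  static class MockStorageEngineConfig implements StoragePluginConfig {}

  static class MockStorageEngine implements StoragePlugin {
    public MockStorageEngine(MockStorageEngineConfig configuration,
                             DrillbitContext context, String name) {
      // A real plugin would keep these for later use.
    }
  }

  /**
   * Looks for a public (config, DrillbitContext, String) constructor and, if found,
   * records it under the type of its first parameter: the configuration class.
   */
  static void register(Class<? extends StoragePlugin> pluginClass,
                       Map<Class<?>, Constructor<?>> table) {
    for (Constructor<?> ctor : pluginClass.getConstructors()) {
      Class<?>[] params = ctor.getParameterTypes();
      if (params.length == 3
          && StoragePluginConfig.class.isAssignableFrom(params[0])
          && params[1] == DrillbitContext.class
          && params[2] == String.class) {
        table.put(params[0], ctor);
      }
    }
  }

  public static void main(String[] args) {
    Map<Class<?>, Constructor<?>> table = new HashMap<>();
    register(MockStorageEngine.class, table);
    // The key is the configuration class, discovered only from the constructor signature.
    System.out.println(table.keySet());
  }
}
```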
It appears that Drill attempts to create a single instance of the storage plugin per Drillbit, and some of the logic in `StoragePluginRegistryImpl` seems to suggest one-per-Drillbit semantics. However, empirical tests suggest that the plugin instance is created multiple times. (Need to clarify.)
The plugin configuration class must:
- Be Jackson serializable.
- Implement the `StoragePluginConfig` interface (often via the `StoragePluginConfigBase` class).
- Implement the `equals()` and `hashCode()` methods. Evidently, plugin configurations are compared by content, and Drill must sometimes determine whether two plugin configurations are identical.
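A configuration class with content-based equality might look like the following. The class and its fields are hypothetical, and using a configuration as a map key is only one plausible reason for comparing configurations, assumed here for illustration. The essential rule is standard Java: `equals()` and `hashCode()` must agree, so two separately deserialized but identical configurations behave as the same value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical configuration class showing content-based equality.
 * The fields are illustrative and do not come from any real Drill plugin.
 */
class ExampleStoragePluginConfig {
  private final String connection;
  private final int timeoutMs;

  ExampleStoragePluginConfig(String connection, int timeoutMs) {
    this.connection = connection;
    this.timeoutMs = timeoutMs;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    ExampleStoragePluginConfig other = (ExampleStoragePluginConfig) o;
    return timeoutMs == other.timeoutMs && Objects.equals(connection, other.connection);
  }

  @Override
  public int hashCode() {
    // Must agree with equals(): equal configurations produce equal hash codes.
    return Objects.hash(connection, timeoutMs);
  }

  public static void main(String[] args) {
    ExampleStoragePluginConfig a = new ExampleStoragePluginConfig("db.example.com:1234", 30_000);
    ExampleStoragePluginConfig b = new ExampleStoragePluginConfig("db.example.com:1234", 30_000);

    // Two separately built (e.g. separately deserialized) configs are "the same".
    Map<ExampleStoragePluginConfig, String> instances = new HashMap<>();
    instances.put(a, "existing plugin instance");
    System.out.println(instances.get(b)); // prints: existing plugin instance
  }
}
```

Without a matching `hashCode()`, the lookup with `b` would usually miss even though `a.equals(b)` is true.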
Somewhere on the class path there must be a file called `bootstrap-storage-plugins.json` that contains at least one serialized form of the storage plugin configuration. Without such a file, the plugin is invisible to Drill. (That is, the plugin exists only via a configuration.) Again, intrinsic plugins use the file in Drill's `java-exec` module; an extension must provide its own file.
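The rule that a plugin exists only via a configuration, together with the disabled flag and the registry load order (stored configurations if any exist, the bootstrap file otherwise), can be modeled in a few lines. This is a hypothetical sketch: `PluginConfig`, `chooseConfigs`, and `activePlugins` are illustrative names, the configurations are assumed to be already deserialized from JSON, and treating a disabled configuration as "not instantiated" is a simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of configuration-driven plugin creation; not Drill's registry code.
class ConfigDrivenRegistrySketch {

  /** Minimal stand-in for a deserialized plugin configuration. */
  static class PluginConfig {
    final String type;      // the JSON "type" value
    final boolean enabled;  // the "enabled" flag

    PluginConfig(String type, boolean enabled) {
      this.type = type;
      this.enabled = enabled;
    }
  }

  /** Stored configurations win; the bootstrap file is used only when none are stored. */
  static Map<String, PluginConfig> chooseConfigs(Map<String, PluginConfig> stored,
                                                 Map<String, PluginConfig> bootstrap) {
    return stored.isEmpty() ? bootstrap : stored;
  }

  /**
   * Returns the names of the plugins that would be instantiated. Only configurations
   * are consulted: a plugin class that no configuration names is never created,
   * and a disabled configuration is skipped.
   */
  static List<String> activePlugins(Map<String, PluginConfig> configs) {
    List<String> active = new ArrayList<>();
    for (Map.Entry<String, PluginConfig> entry : configs.entrySet()) {
      if (entry.getValue().enabled) {
        active.add(entry.getKey());
      }
    }
    return active;
  }

  public static void main(String[] args) {
    // Hypothetical contents of bootstrap-storage-plugins.json, keyed by plugin name.
    Map<String, PluginConfig> bootstrap = new LinkedHashMap<>();
    bootstrap.put("mock", new PluginConfig("mock", true));
    bootstrap.put("not-a-plugin", new PluginConfig("helper", false));

    Map<String, PluginConfig> stored = new LinkedHashMap<>(); // nothing stored yet
    System.out.println(activePlugins(chooseConfigs(stored, bootstrap))); // prints [mock]
  }
}
```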
At one point, each operator also needed a unique value in the `CoreOperatorType` enum in `UserBitShared.proto`, but this use is now deprecated. The enum is now used primarily to identify the operator IDs used in the query profile.
Drill maintains the storage plugin registry in ZooKeeper (normally) or on disk (for an embedded Drillbit). When Drill starts, it scans the class path for storage plugins as described above. Drill then reads either the stored plugin configurations or, if no stored configurations exist, the `bootstrap-storage-plugins.json` file. Each plugin configuration is deserialized using Jackson, which maps from the JSON form to the Java class using the `type` field. For the mock storage configuration:

```json
"type" : "mock",
```

This must map to an annotation on the configuration class:

```java
@JsonTypeName("mock")
public class MockStorageEngineConfig extends StoragePluginConfigBase {
  // ...
}
```
Given this, Jackson can match the `type` value to the configuration class that carries the corresponding annotation, among the configuration classes Drill found during its class path scan. Jackson then deserializes the JSON form to produce the storage plugin configuration instance. (This is, by the way, why Drill fails if a storage plugin implementation is no longer available: Jackson deserialization causes a fatal error if the required class does not exist on the class path.)
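The type-name lookup can be illustrated without Jackson. This sketch substitutes a home-grown runtime annotation, `TypeName`, for Jackson's `@JsonTypeName`; Jackson's actual polymorphic type handling (annotations plus registered subtypes) is not reproduced. It shows the two behaviors described above: a `type` value resolves to the annotated configuration class, and a value with no matching class on the class path is an error.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; TypeName stands in for Jackson's @JsonTypeName.
class TypeNameResolverSketch {

  /** Must be retained at run time so reflection can read it. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface TypeName {
    String value();
  }

  @TypeName("mock")
  static class MockStorageEngineConfig {}

  private final Map<String, Class<?>> byTypeName = new HashMap<>();

  /** Registers configuration classes, e.g. those found by the class path scan. */
  TypeNameResolverSketch(List<Class<?>> configClasses) {
    for (Class<?> c : configClasses) {
      TypeName name = c.getAnnotation(TypeName.class);
      if (name != null) {
        byTypeName.put(name.value(), c);
      }
    }
  }

  /** Maps a JSON "type" value to its configuration class, failing if none is registered. */
  Class<?> resolve(String type) {
    Class<?> c = byTypeName.get(type);
    if (c == null) {
      // Models the fatal error when a plugin's classes are no longer on the class path.
      throw new IllegalArgumentException("Unknown storage plugin type: " + type);
    }
    return c;
  }

  public static void main(String[] args) {
    TypeNameResolverSketch resolver =
        new TypeNameResolverSketch(List.of(MockStorageEngineConfig.class));
    System.out.println(resolver.resolve("mock").getSimpleName()); // prints MockStorageEngineConfig
    try {
      resolver.resolve("removed-plugin");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```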
Next, the configuration instance is mapped to the storage plugin class using a mapping from storage plugin configuration class to storage plugin class. Recall the special three-argument constructor mentioned above: that gives Drill the information to associate the two classes. (Actually, Drill maintains a table from storage plugin configuration class to storage plugin constructor so that Drill can create a new instance on each reference.)
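The configuration-class-to-constructor table and per-reference instance creation can be sketched as follows. The types are local stand-ins rather than Drill's classes, and `register`/`create` are hypothetical names. To keep the sketch short, the table is filled in from an explicitly supplied configuration class; in Drill, that class comes from inspecting the plugin's three-argument constructor.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch using stand-in types, not Drill's classes.
class PluginFactorySketch {

  interface StoragePluginConfig {}
  interface StoragePlugin {}
  static class DrillbitContext {}

  static class MockStorageEngineConfig implements StoragePluginConfig {}

  static class MockStorageEngine implements StoragePlugin {
    final String name;

    public MockStorageEngine(MockStorageEngineConfig config, DrillbitContext context, String name) {
      this.name = name;
    }
  }

  // Table from configuration class to plugin constructor.
  private final Map<Class<?>, Constructor<? extends StoragePlugin>> constructors = new HashMap<>();

  void register(Class<? extends StoragePluginConfig> configClass,
                Class<? extends StoragePlugin> pluginClass) {
    try {
      constructors.put(configClass,
          pluginClass.getConstructor(configClass, DrillbitContext.class, String.class));
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(
          pluginClass.getName() + " lacks a (config, context, name) constructor", e);
    }
  }

  /** Creates a new plugin instance on every call, keyed by the config's runtime class. */
  StoragePlugin create(StoragePluginConfig config, DrillbitContext context, String name) {
    Constructor<? extends StoragePlugin> ctor = constructors.get(config.getClass());
    if (ctor == null) {
      throw new IllegalStateException("No plugin registered for " + config.getClass().getName());
    }
    try {
      return ctor.newInstance(config, context, name);
    } catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException("Cannot create plugin " + name, e);
    }
  }

  public static void main(String[] args) {
    PluginFactorySketch factory = new PluginFactorySketch();
    factory.register(MockStorageEngineConfig.class, MockStorageEngine.class);

    StoragePluginConfig config = new MockStorageEngineConfig(); // e.g. produced by Jackson
    StoragePlugin first = factory.create(config, new DrillbitContext(), "mock");
    StoragePlugin second = factory.create(config, new DrillbitContext(), "mock");
    System.out.println(first != second); // prints true: a new instance per call
  }
}
```

Storing the constructor rather than an instance is what allows a fresh plugin object on each reference, which may be related to the multiple instantiations observed earlier.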