Batch Handling Upgrades
The batch handling framework consists of a number of layers that combine to enable Drill to control the size of each record batch, which in turn allows Drill to implement effective memory management and admission control.
The material here starts with concepts, then provides a tour of the various components. Each component is heavily commented, so after reading this material, you should be able to get the details from the code itself.
- Vector readers. Object categories. Vector indexes. Vector accessors. Array accessors. Generated code. Array wrappers for nullable, arrays.
- Row-set writers. Top-level writers. Structure. Writing to arrays. Events. Offset vector updates.
- Row set loader. Concept of overflow. Column states. Vector states. Overflow processing. Vector allocation. Vector cache and multi-reader model.
- Operator framework. Split of concerns. Protocol adapter. Schema change detection.
- Projection framework. Concepts. Project lists. Null columns. Implicit columns. Assembling the output batch. Column information in projection list. Recursive projection in maps. Schema smoothing and persistence.
- Mock reader. CSV reader. Easy format plugin. Concept of Parquet support.
- JSON concepts. JSON issues. Revised JSON parser. JSON semantics. Open issues. Possible opportunities.
- Future opportunities. Code generation. Plugin APIs. Reader retrofits. Fixed-size buffers.
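To make the central idea concrete (limiting the size of each record batch, with a row that does not fit "overflowing" into the next batch), here is a minimal sketch. It is an illustration only and does not use Drill's actual classes: `SizeLimitedBatcher`, `write`, `flush`, and the byte-count limit are all hypothetical names and simplifications. It treats rows as plain strings and measures size in UTF-8 bytes, whereas the real framework works on value vectors.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of size-limited batching with overflow.
 * Not part of Drill; names and structure are assumptions.
 */
public class SizeLimitedBatcher {
    private final int maxBatchBytes;
    private final List<List<String>> batches = new ArrayList<>();
    private List<String> current = new ArrayList<>();
    private int currentBytes = 0;

    public SizeLimitedBatcher(int maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    /** Adds a row, closing the current batch first if the row would exceed the limit. */
    public void write(String row) {
        int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
        // The row that does not fit becomes the first row of the next
        // batch. A single oversized row still gets a batch of its own.
        if (!current.isEmpty() && currentBytes + rowBytes > maxBatchBytes) {
            flush();
        }
        current.add(row);
        currentBytes += rowBytes;
    }

    /** Closes the current batch, if it holds any rows. */
    public void flush() {
        if (!current.isEmpty()) {
            batches.add(current);
            current = new ArrayList<>();
            currentBytes = 0;
        }
    }

    /** Returns the batches completed so far. */
    public List<List<String>> batches() {
        return batches;
    }
}
```

With a 10-byte limit, writing three 4-byte rows and then calling `flush()` yields two batches: the first holds two rows, and the third row overflows into the second batch. Because no batch exceeds a known bound (apart from a single oversized row), a caller can budget memory per batch in advance.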