
Vectorized, push-based query engine using Arrow.


clflushopt/holocene


Holocene

Holocene is a follow-up to eocene in which we implement a vectorized, push-based query engine that uses Arrow as its data format.

Push-based Vectorized Execution

In the context of database workloads, vectorized execution means processing records in batches. Most often the term describes something along the lines of the Volcano model, except that instead of next() returning a single record, next() returns multiple records.
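A minimal sketch of this batched, Volcano-style (pull-based) model in Python. The `Scan`, `Filter` and `collect` names are hypothetical and not taken from holocene, and plain lists of integers stand in for Arrow record batches. The consumer at the top drives execution by calling `next()` on the root operator, and each call returns a whole batch rather than a single record.

```python
from typing import Callable, List, Optional

Batch = List[int]  # hypothetical stand-in for an Arrow RecordBatch


class Scan:
    """Leaf operator: hands out the input in fixed-size batches."""

    def __init__(self, data: List[int], batch_size: int) -> None:
        self.data = data
        self.batch_size = batch_size
        self.pos = 0

    def next(self) -> Optional[Batch]:
        if self.pos >= len(self.data):
            return None  # end of stream
        batch = self.data[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return batch


class Filter:
    """Pulls a batch from its child and keeps only the rows matching the predicate."""

    def __init__(self, child, predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[Batch]:
        while True:
            batch = self.child.next()
            if batch is None:
                return None
            kept = [row for row in batch if self.predicate(row)]
            if kept:  # skip batches that were filtered out entirely
                return kept


def collect(root) -> List[Batch]:
    """Drive the plan from the top by repeatedly calling next() on the root."""
    out = []
    while (batch := root.next()) is not None:
        out.append(batch)
    return out


plan = Filter(Scan(list(range(10)), batch_size=4), lambda x: x % 2 == 0)
print(collect(plan))  # [[0, 2], [4, 6], [8]]
```

Each `next()` call amortizes the per-call overhead over a whole batch, which is the main reason to batch a Volcano-style plan.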

Vectorization in the hardware sense, i.e. SIMD instructions, is sometimes used to implement faster compute kernels, but using such kernels does not make the entire query plan vectorized. The plan can, however, still be executed in parallel.

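In this context, push-based describes a paradigm that differs from the Volcano model. In the Volcano model each operator pulls input from its child by calling next(); in a push-based model, operators instead push their results down the pipeline to the next operator. The benefit of this approach is that the query plan becomes a DAG that can be executed in parallel, except at pipeline breakers, which can be seen as join points where the parallel pipelines must all finish before execution continues.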
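A minimal push-based sketch in Python, assuming hypothetical operator names (`Sink`, `Filter`, `Project`, `run_source`) that are not holocene's API, with plain lists standing in for Arrow record batches. The source owns the loop and each operator calls `push()` on the operator downstream of it, so no operator ever asks its child for data.

```python
from typing import Callable, List

Batch = List[int]  # hypothetical stand-in for an Arrow RecordBatch


class Sink:
    """Terminal operator: collects every batch pushed into it."""

    def __init__(self) -> None:
        self.batches: List[Batch] = []

    def push(self, batch: Batch) -> None:
        self.batches.append(batch)


class Filter:
    def __init__(self, predicate: Callable[[int], bool], downstream) -> None:
        self.predicate = predicate
        self.downstream = downstream

    def push(self, batch: Batch) -> None:
        kept = [row for row in batch if self.predicate(row)]
        if kept:
            self.downstream.push(kept)  # push the result to the next operator


class Project:
    def __init__(self, fn: Callable[[int], int], downstream) -> None:
        self.fn = fn
        self.downstream = downstream

    def push(self, batch: Batch) -> None:
        self.downstream.push([self.fn(row) for row in batch])


def run_source(data: List[int], batch_size: int, downstream) -> None:
    """The source drives execution: it slices the input and pushes each batch."""
    for i in range(0, len(data), batch_size):
        downstream.push(data[i:i + batch_size])


sink = Sink()
pipeline = Filter(lambda x: x % 2 == 0, Project(lambda x: x * 10, sink))
run_source(list(range(10)), batch_size=4, downstream=pipeline)
print(sink.batches)  # [[0, 20], [40, 60], [80]]
```

Because control lives in the source, separate copies of the operator chain can be driven by separate sources over different inputs at the same time, which is what makes this part of the plan parallelizable.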

Vectorized, push-based models combine the two ideas above, batched processing and push-based execution, and this combination is particularly well suited to OLAP workloads.

Parallelizable part of the pipeline:
each step pushes multiple records
down the pipeline
                                                       Pipeline breaking, since LIMIT
                                                       will be applied over all records
  +--------+                                                             |
  | Batch  |    +--------+    +------+    +--------+    +------------+   |    +-------+
  +--------+--->| Source |--->| Scan |--->| Filter |--->| Projection |   |    |       |
                +--------+    +------+    +--------+    +------------+   |    |       |
  +--------+                                                             |    |       |
  | Batch  |    +--------+    +------+    +--------+    +------------+   |    |       |
  +--------+--->| Source |--->| Scan |--->| Filter |--->| Projection |---+--->| Limit |
                +--------+    +------+    +--------+    +------------+   |    |       |
  +--------+                                                             |    |       |
  | Batch  |    +--------+    +------+    +--------+    +------------+   |    |       |
  +--------+--->| Source |--->| Scan |--->| Filter |--->| Projection |   |    |       |
                +--------+    +------+    +--------+    +------------+   |    +-------+
                                                                         |
  +--------+                                                             |
  | Batch  |                                                             |
  +--------+                                                             |
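The diagram can be approximated with a thread pool: each worker runs one Source -> Scan -> Filter -> Projection pipeline over its own batch, and LIMIT runs only after every pipeline has finished. This is a Python sketch under stated assumptions: `run_pipeline`, `limit` and `execute` are hypothetical names, lists stand in for Arrow record batches, and the pipelines hand their output back through futures instead of pushing into a shared Limit operator, which keeps the synchronization trivial.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Batch = List[int]  # hypothetical stand-in for an Arrow RecordBatch


def run_pipeline(batch: Batch) -> Batch:
    """One parallelizable pipeline: filter (keep even rows), then project (x * 10)."""
    filtered = [row for row in batch if row % 2 == 0]
    return [row * 10 for row in filtered]


def limit(batches: List[Batch], n: int) -> Batch:
    """Pipeline breaker: LIMIT is applied over the records of all pipelines."""
    rows = [row for batch in batches for row in batch]
    return rows[:n]


def execute(partitions: List[Batch], n: int) -> Batch:
    # One worker per input batch, as in the diagram; the pool needs at least one.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(run_pipeline, p) for p in partitions]
        # Join point: block until every pipeline has produced its output.
        # Reading futures in submission order keeps the LIMIT result deterministic.
        results = [f.result() for f in futures]
    return limit(results, n)


print(execute([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], n=4))  # [0, 20, 40, 60]
```

With CPython's GIL, pure-Python pipelines like these interleave rather than run truly in parallel; the sketch shows the shape of the plan (independent pipelines feeding a single join point), not the speedup.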
