Velox is a C++ database acceleration library which provides reusable, extensible, and high-performance data processing components. These components can be reused to build compute engines focused on different analytical workloads, including batch, interactive, stream processing, and AI/ML. Velox was created by Facebook and it is currently developed in partnership with Intel, ByteDance, and Ahana.
In common usage scenarios, Velox takes a fully optimized query plan as input and performs the described computation. Considering Velox does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end-users; rather, it is mostly used by developers integrating and optimizing their compute engines.
Velox provides the following high-level components:
- Type: a generic typing system that supports scalar, complex, and nested types, such as structs, maps, arrays, tensors, etc.
- Vector: an Arrow-compatible columnar memory layout module, which provides multiple encodings, such as Flat, Dictionary, Constant, Sequence/RLE, and Bias, in addition to a lazy materialization pattern and support for out-of-order writes.
- Expression Eval: a fully vectorized expression evaluation engine that allows expressions to be efficiently executed on top of Vector/Arrow encoded data.
- Function Packages: sets of vectorized function implementations following the Presto and Spark semantic.
- Operators: implementation of common data processing operators such as scans, projection, filtering, groupBy, orderBy, shuffle, hash join, unnest, and more.
- I/O: a generic connector interface that allows different file formats (ORC/DWRF and Parquet) and storage adapters (S3, HDFS, local files) to be used.
- Network Serializers: an interface where different wire protocols can be implemented, used for network communication, supporting PrestoPage and Spark’s UnsafeRow.
- Resource Management: a collection of primitives for handling computational resources, such as memory arenas and buffer management, tasks, drivers, and thread pools for CPU and thread execution, spilling, and caching.
Velox is extensible and allows developers to define their own engine-specific specializations, including:
- Custom types
- Simple and vectorized functions
- Aggregate functions
- Operators
- File formats
- Storage adapters
- Network serializers
Síguenos en Linkedin para estar al tanto de todas las novedades.