A library that builds on parquet-dotnet to provide feature abstractions over ParquetReader. The library offers the following features:
- Ability to scale page reads across Service Fabric nodes
- Compute a correlation matrix across a large Parquet DataSet
- Summarize column statistics across millions or billions of rows
- Linear regression at Service Fabric scale with stochastic gradient descent (SGD)
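
To illustrate why page reads can be spread across nodes, here is a minimal sketch of a correlation matrix built from per-page partial sums. It is written in Python for brevity and is not this library's API: `PagePartial`, `summarize_page`, `merge`, and `correlation_matrix` are hypothetical names, and the real implementation may use a different (and more numerically stable) method. The point is that each page produces a small accumulator, and accumulators merge by simple addition, so pages can be processed in any order on any node.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass
class PagePartial:
    """Hypothetical per-page accumulator (not a type from the library)."""
    n: int        # number of rows seen
    sums: list    # sums[i] = sum of column i
    cross: list   # cross[i][j] = sum of column i * column j


def summarize_page(rows):
    """Reduce one page of numeric rows to a PagePartial."""
    width = len(rows[0])
    sums = [0.0] * width
    cross = [[0.0] * width for _ in range(width)]
    for row in rows:
        for i in range(width):
            sums[i] += row[i]
            for j in range(width):
                cross[i][j] += row[i] * row[j]
    return PagePartial(len(rows), sums, cross)


def merge(a, b):
    """Combine two partials; addition is associative and commutative."""
    width = len(a.sums)
    return PagePartial(
        a.n + b.n,
        [a.sums[i] + b.sums[i] for i in range(width)],
        [[a.cross[i][j] + b.cross[i][j] for j in range(width)]
         for i in range(width)],
    )


def correlation_matrix(p):
    """Pearson correlation from merged sums.

    Assumes no column is constant (a constant column gives zero variance
    and a division by zero).
    """
    width = len(p.sums)
    # Scaled covariance: n * sum(xy) - sum(x) * sum(y)
    cov = [[p.n * p.cross[i][j] - p.sums[i] * p.sums[j]
            for j in range(width)] for i in range(width)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
             for j in range(width)] for i in range(width)]


# Two "pages" of a three-column data set: x, 2x, and 5 - x.
pages = [
    [(1.0, 2.0, 4.0), (2.0, 4.0, 3.0)],
    [(3.0, 6.0, 2.0), (4.0, 8.0, 1.0)],
]
merged = reduce(merge, (summarize_page(page) for page in pages))
corr = correlation_matrix(merged)
```

In this sketch `corr[0][1]` comes out as a perfect positive correlation and `corr[0][2]` as a perfect negative one. The raw-sum formula is easy to merge but can lose precision on very large or poorly scaled columns, so treat it as an illustration of the merge step only.
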
This project exposes a few packages:
A set of LINQ operators to calculate the following functions on a stream of numbers:
- Kurtosis
- Mean
- Quantile25, Quantile75
- Median
- Skewness
- Standard Deviation
- Variance
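
As a rough illustration of how several of these statistics can be computed over a stream in a single pass, the sketch below updates the running mean and the second, third, and fourth central moments one value at a time. It is written in Python and is not the library's LINQ API: `StreamMoments` is a hypothetical name, and the choice of population (divide by n) variance and excess kurtosis is an assumption, since the conventions used by the operators are not stated here. Median and quantiles are left out because they cannot be derived from moments and need sorting or a quantile sketch.

```python
import math


class StreamMoments:
    """Hypothetical single-pass accumulator for mean and central moments."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of (x - mean)^2
        self.m3 = 0.0  # sum of (x - mean)^3
        self.m4 = 0.0  # sum of (x - mean)^4

    def add(self, x):
        """Fold one value into the running moments."""
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Update order matters: m4 uses the old m2 and m3, m3 uses the old m2.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2
                    - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    @property
    def variance(self):
        # Population variance (assumption; a sample variance divides by n - 1).
        return self.m2 / self.n

    @property
    def standard_deviation(self):
        return math.sqrt(self.variance)

    @property
    def skewness(self):
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    @property
    def kurtosis(self):
        # Excess kurtosis (assumption): a normal distribution scores 0.
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0


stats = StreamMoments()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.add(value)
```

For this input the accumulator reports a mean of 5, a variance of 4, and a standard deviation of 2. Because only five numbers of state are kept, the same approach works on a stream that is too large to hold in memory.
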
Common math data structures: Matrix<T>, Series, Frame
Integration with different data formats. Supported formats:
- Apache Parquet
- CSV
Please read the contributing section for general git guidance and the architecture section for code orientation.