0.5.0
[0.5.0] - 2023-11-30
Added
- #852 - Possibility to filter partitions using scalar functions - @norberttech
- #852 - DSL functions for maps/structs/list/types - @norberttech
- #847 - capitalize scalar function - @norberttech
- #844 - ref()->isTrue() - @norberttech
- #844 - ref()->isFalse() - @norberttech
- #832 - Allow changing the hash algorithm in HashIdFactory - @stloyd
- #836 - Writing ChartJS output to a variable - @norberttech
- #830 - DataFrame::collectRefs() - @norberttech
- #825 - pivoting datasets - @norberttech
- #820 - Add testing of PHP 8.3 into the pipeline - @stloyd
- #808 - Added DataFrame::until - @norberttech
- #807 - validator to Parquet Writer - @norberttech
- #805 - Add TypeDetectorBench - @stloyd
- #801 - Add top-level options support for ChartJS - @stloyd
- #780 - Add MapEntry - @stloyd
- #780 - Add EnumType - @stloyd
- #795 - Window function count - @norberttech
- #795 - Window function dense rank - @norberttech
- #791 - Extracted Flysystem dependency to standalone adapter - @norberttech
- #788 - BatchSizeOptimization - @norberttech
- #788 - httpClient option to Meilisearch loader configuration - @norberttech
- d92b51 - Docker installation manual - @norberttech
- #571 - Dockerfile - @norberttech
- #778 - Dremel to properly shred/assemble nested structures with nullable elements - @norberttech
- #772 - Add TypeFactory, ArrayType, NullType, ResourceType & CallableType - @stloyd
- #773 - Added parquet commands to flow.phar - @norberttech
- #765 - Add new logical StructureType - @stloyd
- #763 - Implement recursive type allowance in MapType & ListType - @stloyd
- #764 - Added Flow style guide - @norberttech
- #762 - Implement new MapType logical type - @stloyd
- #759 - CLI App - Parquet viewer - @norberttech
- #749 - Implement new ListType logical type - @stloyd
- #755 - Parquet - DataPageV2 statistics - @norberttech
- #754 - write column chunk statistics - @norberttech
- #744 - Parquet statistics reader - @norberttech
- #736 - Parquet - DataPageV2 support - @norberttech
- #730 - Pipeline Optimizer - @norberttech
- #730 - LimitOptimization - @norberttech
- #729 - LimitTransformer - @norberttech
- #729 - Limit directly to Extractors - @norberttech
- #720 - DataFrame::batchSize(int $size) method - @norberttech
- #716 - batchSize argument to DataFrame::collect method - @norberttech
- #714 - Missing tests for FilesystemProcessor - @norberttech
- #712 - Added support for partitioning in parquet loader - @norberttech
- #711 - Allow appending to parquet files - @norberttech
- #704 - ConvertedType to column definitions in parquet for compatibility with other readers - @norberttech
- #704 - Repetition/definition level encodings in DataPage - @norberttech
- #701 - count() method to dataframe - @norberttech
- #702 - number_format expression - @norberttech
- #700 - StandWithUkraine, StandWithUs - @norberttech
- #693 - Added library version to parquet created_by metadata - @norberttech
- #692 - Allow writing rows in batches to files and streams - @norberttech
- 3bbd6f - signed phar to gitignore - @norberttech
- #680 - Add GPG Signature to "flow-php.phar" artifact - @flavioheleno
- #678 - Parquet - added support for GZIP and SNAPPY compressions - @norberttech
- #677 - Parquet - writing & reading nullable structures with nullable fields - @norberttech
- #668 - Added support for writing simple types nullable columns into parquet - @norberttech
- #661 - Loaders benchmarks - @norberttech
- #660 - Create a comment on each PR with phpbench results - @norberttech
- #654 - Parquet - implement logic deciding when to apply dictionary encoding - @norberttech
- #653 - Blackfire PHP SDK to tools - @norberttech
- #652 - Parquet - Calculate row group/page size on the fly in order to decide when to flush data to disk - @norberttech
- #648 - JSON/UUID/ENUM - to dictionary encoding - Parquet - @norberttech
- #646 - Parquet - dictionary encoded pages - @norberttech
- #644 - Possibility to iterate through all parquet file column chunk page headers - @norberttech
- #642 - Extracted data conversion between parquet/php types to a standalone class - @norberttech
- #638 - date and datetime support for parquet writer - @norberttech
- #637 - first draft of parquet writer - @norberttech
- #632 - Add missing tools to the dependabot config - @stloyd
- #627 - Added possibility to easily identify schema root in Parquet Schema - @norberttech
- #622 - Possibility to build schema through static factories at Schema, FlatColumn, NestedColumn - @norberttech
- #621 - static factories to Flat and Nested columns in parquet schema - @norberttech
- #621 - Decimal column support to parquet - @norberttech
- #611 - RLEBigPackingHybrid encode function - @norberttech
- #613 - Pure PHP implementation of Google Snappy library - @norberttech
- #601 - output of benchmark tests into github step summary - @norberttech
- #601 - Benchmarks for NativeEntryFactory - @norberttech
- #601 - Benchmark groups - @norberttech
- #602 - allow manually triggering the benchmark tag workflow - @norberttech
- #581 - Add PHPBench tool and first benchmark example - @stloyd
- #587 - NativeEntryFactory structures detection - @norberttech
- #587 - Metadata to StructureEntry Definition - @norberttech
Changed
- #854 - All documentation pages have been moved to the monorepo - @norberttech
- #853 - Updated dependencies - @norberttech
- #852 - Deprecated all DSL static classes in favor of functions - @norberttech
- #852 - Moved whole DSL to ETL functions.php - @norberttech
- #851 - Allow usage of Symfony 7 - @stloyd
- #849 - Rework Doctrine Bulk tests to not use deprecated code - @stloyd
- #842 - Default cache path - @norberttech
- #840 - reorganized data frame tests - @norberttech
- #824 - Improve performance of ArrayContentDetector - @stloyd
- #835 - Simplified how charts handle references - @norberttech
- #834 - Added php ~8.3 constraint to composer.json files - @norberttech
- #833 - Use references instead of array of strings in Charts - @norberttech
- #822 - Rework Psalm configuration & adjust codebase - @stloyd
- #831 - Updated dependencies - @norberttech
- #823 - Improve Dbal Bulk coverage - @stloyd
- #821 - Simplify PHPUnit configuration - @stloyd
- #819 - Simplify ScalarType::isValid() method - @stloyd
- #813 - Improve performance of ArrayContentDetector - @stloyd
- #810 - Moved SaveMode handling to FilesystemStreams - @norberttech
- #812 - Improve performance of Types - @stloyd
- #811 - Improve performance of TypeDetector - @stloyd
- #809 - Changed default value for format in toDate scalar function - @norberttech
- #799 - Expanded Parquet schema converter in order to support deeply nested data types - @norberttech
- 700dc7 - Docs fixes - @norberttech
- #797 - Expression is now ScalarFunction - @norberttech
- #797 - Aggregator is now AggregationFunction - @norberttech
- #780 - Rework StructureEntry to use types - @stloyd
- #780 - Rework NativeEntryFactory to use types - @stloyd
- #788 - Optimizer can now be configured through ConfigBuilder - @norberttech
- #788 - Execution Plan Processors can now be configured through ConfigBuilder - @norberttech
- #786 - Reduce docker image size - @norberttech
- #778 - Dremel algorithms are no longer working as Generators - @norberttech
- #777 - Make types implementation serializable, mark native types as nullable - @stloyd
- #773 - renamed phar file to flow.phar - @norberttech
- #770 - Rework ScalarType to hold optional value - @stloyd
- #761 - Move PHAR runtime into bin - @stloyd
- #728 - Extract entry types into new namespace for further reuse - @stloyd
- #732 - Renamed threadSafe to appendSafe - @norberttech
- #730 - Renamed LogicalPlan to ExecutionPlan - @norberttech
- #726 - GoogleSheet rows_in_batch was renamed to rows_per_page - @norberttech
- #716 - Closure was moved to Loader namespace as it applies to Loaders, not Pipelines - @norberttech
- #708 - Rename docker-compose.yml to compose.yml to match the specification - @stloyd
- #710 - Improved parquet writer performance - @norberttech
- #706 - Adjusted time related columns to be compatible with other parquet libraries - @norberttech
- #705 - Add missing flag for GPG 2.2+ - @stloyd
- #704 - Boolean columns can no longer be dictionary packed, for compatibility with Spark - @norberttech
- 1e756d - CS Fixes - @norberttech
- #698 - Updated readme - @norberttech
- #694 - Replaced codename parquet with flow parquet library in parquet adapter - @norberttech
- #690 - Slightly reduce randomness in NativeEntryFactoryBench - @stloyd
- #683 - Improve adapters benchmark stability - @stloyd
- #676 - Improve performance for Rows: dropRight, partitionBy & sortBy - @stloyd
- #675 - Reduce number of runs for benchmark testing - @stloyd
- #674 - Reduce number of runs for infection testing - @stloyd
- #671 - Improve performance for a few Rows methods - @stloyd
- #671 - Rework benchmark GH action to use artifact for baseline - @stloyd
- #673 - Generate baseline benchmark in one run not per group - @stloyd
- #670 - Cover most methods of the Rows class with benchmarks - @stloyd
- #666 - Simplify Rows::drop() method - @stloyd
- #666 - Rework benchmark scripts in composer.json - @stloyd
- 91d6e8 - key and restore-keys for benchmark cache - @norberttech
- #662 - Rework benchmarks to set up test data in constructors - @stloyd
- #656 - Adjust Rows::chunk() to work on generators instead of arrays - @stloyd
- f17d62 - Save Benchmark baseline in cache - @norberttech
- 05d251 - Unified steps in pr-comment workflow - @norberttech
- 524fce - Use different action to download artifacts based on workflow_id - @norberttech
- #664 - Reverted pull-requests to issues write permissions - @norberttech
- #663 - benchmark workflow trigger and permissions - @norberttech
- #661 - Github benchmark comment template - @norberttech
- #655 - Rework GroupBy::result() method to not recreate entries & rows in loop - @stloyd
- #655 - Use FlowContext in GroupBy - @stloyd
- #651 - Update .php-cs-fixer.php in order to use predefined rule-sets - @stloyd
- #651 - Run new cs-fixer configuration against codebase - @stloyd
- #650 - Run cs-fixer against codebase - @stloyd
- #649 - Cleanup of parquet PageBuilders - @norberttech
- #645 - Rework SnappyCompressor::compressFragment() method - @stloyd
- #645 - Add Snappy code to static analysis - @stloyd
- #644 - PHPUnit code coverage thresholds - @norberttech
- #642 - Parquet flat path is now cached inside of the column to reduce number of iterations through schema - @norberttech
- #643 - Infection will no longer format output for GitHub - @norberttech
- #643 - PHPStan and Psalm will now report in GitHub format - @norberttech
- #636 - Reduced analysis strictness - @norberttech
- #635 - Rework NativeEntryFactory to be stateless - @stloyd
- #630 - Unify LICENSE files - @stloyd
- #629 - Adjust SheetRange & GoogleSheetExtractor with too low values - @stloyd
- #624 - Remove array_merge() when adding input into rows - @stloyd
- #627 - Simplified Schema cache - @norberttech
- #626 - Columns in GoogleSheet Adapter cannot contain unicode characters - @stloyd
- #623 - Adjust CI setup to ignore changes in a changelog file - @stloyd
- #621 - Calculation of max definitions/repetitions level to column - @norberttech
- #621 - Python parquet file generators compression from gzip to snappy - @norberttech
- #619 - Move implementation of entry structure creation into the NativeEntryFactory - @stloyd
- #620 - Reduce CI load by cancelling previous runs with every new commit in PR - @stloyd
- #617 - Invalid schema has no fallback in NativeEntryFactory - @stloyd
- #616 - Entry factory moved from extractors to FlowContext - @stloyd
- #615 - Improve quality of snappy implementation - @stloyd
- #612 - Improve performance of array comparison & sorting - @stloyd
- #608 - Rework FlysystemFS::scan() to improve performance - @stloyd
- #609 - Move Context::shouldPutInputIntoRows() out of loop - @stloyd
- #605 - Improve performance of Path class - @stloyd
- #607 - Rework Entries::toArray() to improve performance - @stloyd
- #604 - Adjust benchmark report to be more useful - @stloyd
- #600 - Unify adapter benchmark datasets - @stloyd
- #598 - Rework GH testing actions to run only after code changes - @stloyd
- #597 - Reduce the amount of test data in ElasticSearch tests - @stloyd
- #596 - Simplify CacheSpy test double class - @stloyd
- #595 - Clean up tests before starting tests - @norberttech
- #594 - Switch hashing algorithm from sha256 to xxh128 - @stloyd
- #591 - Skip JSON & XML checks for strings that don't look like this type of data - @stloyd
- #588 - Update minimum required version of Flow in all packages - @stloyd
- #586 - Rework adapter tests namespaces to be consistent - @stloyd
- #587 - Schema Formatter - support for structures - @norberttech
- #587 - Parquet Loader - simplified with support for Structure Entry - @norberttech
- #587 - Avro Loader - simplified with support for Structure Entry - @norberttech
- #587 - Parquet Extractor - default options - @norberttech
- #585 - Rework text adapter test fixtures to reduce memory load - @stloyd
- #584 - Adjust phpunit.xml to be more efficient - @stloyd
Fixed
- #848 - Use platform for column escaping in bulk insert - @stloyd
- #840 - multiple group by execution in single pipeline - @norberttech
- #840 - moved elasticsearch HTTP Spy test double under Test namespace - @norberttech
- #838 - Fix Definition::isEqual() with random class order - @stloyd
- #826 - overwriting pivot columns - @norberttech
- #817 - Fixed some warnings in parquet library - @norberttech
- #816 - Prevent reading multiple times from the same partitioning cache - @norberttech
- #815 - Add missing EnumType detection in NativeEntryFactory - @stloyd
- #814 - Fix wrong return type on entry reference DSL functions - @stloyd
- #798 - Deprecated notice for using Connection::getSchemaManager() - @tomaszhanc
- #795 - Window function rank - @norberttech
- #789 - Fix hardcoded entry name for enum entry with schema - @stloyd
- cc4bd8 - incorrect tags in docker building workflows - @norberttech
- #779 - Covered additional parquet edge cases - @norberttech
- 523741 - broken phar builds - @norberttech
- #766 - Fix wrong namespace in PHAR runtime file - @stloyd
- eb7cc8 - Reverted snappy compressor if statements order - @norberttech
- #759 - missing dependencies in parquet lib - @norberttech
- #760 - Fixed snappy warnings - @norberttech
- #751 - Fixed reading varInt - @norberttech
- #750 - CSV loader working on remote streams - @norberttech
- #743 - Prevent uninitialized string check in NativeEntryFactory - @stloyd
- #731 - bitpacking zero values - @norberttech
- #726 - JsonLoader when writing empty Rows - @norberttech
- #714 - Added missing FileExtractor interface to all file based extractors - @norberttech
- #707 - Small typo in build workflow - @norberttech
- #704 - Bug in RLEBitPackHybrid algorithm that was always bitpacking values - @norberttech
- #703 - Fix for signing PHAR in CI - @stloyd
- #672 - Fix for wrong cache data & key in benchmark baseline action - @stloyd
- #667 - Cleanup old summaries before creating new artifacts - @norberttech
- 35a55d - baseline cache restoring - @norberttech
- fdcaa3 - composer cache in benchmark baseline workflow - @norberttech
- 1f172c - pr-comment workflow job name - @norberttech
- a3df35 - restoring cache during test-benchmark workflow - @norberttech
- bfec21 - paths to downloaded artifacts in pr comment workflow - @norberttech
- #661 - NativeEntryFactoryBench - @norberttech
- #658 - Prevent fatal error when no values are set in ColumnChunkStatistics - @stloyd
- #657 - parquet writer performance degradation - @norberttech
- #653 - Optimized BinaryBufferWriter - @norberttech
- #647 - bug in calculating number of values when pages are encoded with dictionary - @norberttech
- #641 - Fix extracting data from empty Google Sheet. - @scyzoryck
- #640 - bug in RLE/bitpacking hybrid algorithm - @norberttech
- #606 - Fix wrong truncation in ASCIIValue class - @stloyd
- #601 - benchmarks not executing extractors - @norberttech
- 61e035 - github markdown syntax in benchmark tag workflow - @norberttech
- 4f820f - default PHPBENCH_TAG value for push events - @norberttech
- 14c092 - benchmark-tag workflow - @norberttech
- #603 - Fix Avro adapter handling nullable values - @stloyd
- f512e1 - benchmark-tag - @norberttech
- #599 - Reduced total available memory to make sure that MemorySort is doing a fallback to cache - @norberttech
- #593 - Change behavior of xml adapter when xml->deep > 1 and previousDeep > xml->deep - @norberttech
- #587 - JsonLoader closing not only json streams - @norberttech
- #583 - Fix warning when bytes are missing in Parquet BinaryBufferReader - @stloyd
Updated
- 8216ed - minimum required version of flow components - @norberttech
- a1db35 - README.md - @norberttech
- 6e0dd4 - README.md - @norberttech
- db45c3 - list of adapters - @norberttech
- a32cc4 - pr-check.yml - @norberttech
- 02e1c4 - style_guide.md - @norberttech
- e641e7 - README.md - @norberttech
- c9fb26 - test-benchmark.yml - @norberttech
Removed
- 749bb1 - specific version from composer require instructions - @norberttech
- #841 - removed class reference not available in the scope from README example in CSV adapter - @rzarno
- #832 - Remove Sha1IdFactory - @stloyd
- #810 - Execution Plan and Processors - @norberttech
- #804 - Remove from* methods from scalar entry classes - @stloyd
- #797 - struct reference - @norberttech
- #796 - Remove Faker library from benchmark - @stloyd
- #776 - Remove CollectionEntry - @stloyd
- #794 - Removed async processing - @norberttech
- #794 - DataFrame::pipeline method - @norberttech
- #787 - Removed DSL functions: datetime_string(), json_string() - @stloyd
- #750 - BufferExtractor - @norberttech
- #750 - Batch Size parameter from MemoryExtractor - @norberttech
- #733 - Loaders no longer allow setting chunk size - @norberttech
- #729 - LimitPipeline - @norberttech
- #726 - batching logic from extractors - @norberttech
- #716 - Rows from Closure::closure method - @norberttech
- #716 - BufferLoader - @norberttech
- #709 - Removed logger from parquet - @norberttech
- #694 - Codename Parquet library dependency - @norberttech
- 8d4882 - additional char in cache restore-keys parameters for phpbench - @norberttech
- #661 - EntryExpressionEvalTransformerBench - @norberttech
- #634 - Remove Rector tool - @stloyd
- #618 - Remove dead ArrayRowsFactory class - @stloyd
Deprecated
- #732 - DataFrame::threadSafe() - @norberttech
- #720 - DataFrame::parallelize() - @norberttech
Contributors
Generated by Automation