Changelog

rust-v0.20.1 (2024-09-27)

Full Changelog

Implemented enhancements:

  • Allow to specify Azurite hostname and service port as backend #2900
  • docs section usage/Managing a table is out of date w.r.t. optimizing tables #2891
  • generate more sensible row group size #2545

Fixed bugs:

  • Cannot write to Minio with deltalake.write_deltalake or Polars #2894
  • Schema Mismatch Error When appending Parquet Files with Metadata using Rust Engine #2888
  • Assume role support has been broken since 2022 🤣 #2879
  • z-order fails on table that is partitioned by value with space #2834
  • "builder error for url" when creating an instance of a DeltaTable which is located in an azurite blob storage #2815

Closed issues:

  • delta-rs can't write to a table if datafusion is not enabled #2910

rust-v0.20.0 (2024-09-18)

Full Changelog

Fixed bugs:

  • DeltaTableBuilder flags ignored #2808
  • Require files in config is no longer used to skip reading add actions #2796

Merged pull requests:

rust-v0.19.1 (2024-09-11)

Full Changelog

Implemented enhancements:

  • question: deletionVectors support #2829
  • [Minor] Make Add::get_json_stats public #2821
  • expose target_file_size in python side for WriterProperties #2810
  • expose default_column_properties, column_properties of parquet WriterProperties in python #2785
  • CDC support in deltalog when writing delta table #2720
  • Function behaving similarly to SHOW PARTITIONS in the Python API #2671
  • Expose set_statistics_truncate_length via Python WriterProperties #2630

Fixed bugs:

  • write_deltalake with predicate throw index out of bounds #2867
  • writing to blobfuse has stopped working in 0.19.2 #2860
  • cannot read from public GCS bucket if non logged in #2859
  • Stats missing for dataSkippingStatsColumns when escaping column name #2849
  • 0.19.2 install error when using poetry, pdm on Ubuntu #2848
  • deltalake-* crates use different version than specified in Cargo.toml, leading to unexpected behavior #2847
  • Databricks fails integrity check after compacting with delta-rs #2839
  • "failed to load region from IMDS" back in 0.19 despite AWS_EC2_METADATA_DISABLED=true #2819
  • min/max_row_groups not respected #2814
  • Large Memory Spike on Merge #2802
  • Deleting large number of records fails with no error message #2798
  • max_spill_size incorrect default value #2794
  • Delta-RS Saved Delta Table not properly ingested into Databricks #2779
  • Missing Linux binary releases and source tarball for Python release v0.19.0 #2777
  • Transaction log parsing performance regression #2760
  • RecordBatchWriter only creates stats for the first 32 columns; this prevents calling create_checkpoint. #2745
  • DeltaScanBuilder does not respect datafusion context's datafusion.execution.parquet.pushdown_filters #2739
  • IN (...) clauses appear to be ignored in merge commands with S3 - extra partitions scanned #2726
  • Trailing slash on AWS_ENDPOINT raises S3 Error #2656
  • AsyncChunkReader::get_bytes error: Generic MicrosoftAzure error: error decoding response body #2592

rust-v0.19.0 (2024-08-14)

Full Changelog

Implemented enhancements:

  • Only allow squash merge #2542

Fixed bugs:

  • Write also insert change types in writer CDC #2750
  • Regression in Python multiprocessing support #2744
  • SchemaError occurs during table optimisation after upgrade to v0.18.1 #2731
  • AWS WebIdentityToken exposure in log files #2719
  • Write performance degrades with multiple writers #2683
  • Write monotonic sequence, but read is non monotonic #2659
  • Python write_deltalake with schema_mode="merge" casts types #2642
  • Newest docs (potentially) not released #2587
  • CDC is not generated for Structs and Lists #2568

Closed issues:

Merged pull requests:

rust-v0.18.2 (2024-08-07)

Full Changelog

Implemented enhancements:

  • Choose which columns to store min/max values for #2709
  • Projection pushdown for load_cdf #2681
  • Way to check if Delta table exists at specified path #2662
  • Support HDFS via hdfs-native package #2611
  • Deletion _change_type does not appear in change data feed #2579
  • Could you please explain in the README what "Deltalake" is for the uninitiated? #2523
  • Discuss: Allow protocol change during write actions #2444
  • Support for Arrow PyCapsule interface #2376

Fixed bugs:

  • Slow add_actions.to_pydict for tables with large number of columns, impacting read performance #2733
  • append is deleting records #2716
  • segmentation fault - Python 3.10 on Mac M3 #2706
  • Failure to delete dir and files #2703
  • DeltaTable.from_data_catalog not working #2699
  • Project should use the same version of ruff in the lint stage of python_build.yml as in pyproject.toml #2678
  • un-tracked columns are giving json error when pyarrow schema has field with nullable=False and create_checkpoint is triggered #2675
  • [BUG]write_delta({'custom_metadata':str}) cannot be converted. str to pyDict error (0.18.2_DeltaPython/Windows10) #2697
  • Pyarrow engine not supporting schema overwrite with Append mode #2654
  • deltalake-core version re-exported by deltalake different than versions used by deltalake-azure and deltalake-gcp #2647
  • i32 limit in JSON stats #2646
  • Rust writer not encoding correct URL for partitions in delta table #2634
  • Large Types breaks merge predicate pruning #2632
  • Getting error when converting a partitioned parquet table to delta table #2626
  • Arrow: Parquet does not support writing empty structs when creating checkpoint #2622
  • InvalidTableLocation("Unknown scheme: gs") on 0.18.0 #2610
  • Unable to read delta table created using Uniform #2578
  • schema merging doesn't work when overwriting with a predicate #2567
  • Not working in AWS Lambda (0.16.2 - 0.17.4) OSError: Generic S3 error #2511
  • DataFusion filter on partition column doesn't work. (when the physical schema ordering is different to logical one) #2494
  • Creating checkpoints for tables with missing column stats results in Err #2493
  • Cannot merge to a table with a timestamp column after upgrading delta-rs #2478
  • Azure AD Auth fails on ARM64 #2475
  • Generic S3 error: Error after 0 retries ... Broken pipe (os error 32) #2403
  • write_deltalake identifies large_string as datatype even though string is set in schema #2374
  • Inconsistent arrow timestamp type breaks datafusion query #2341

Closed issues:

  • Unable to write new partitions with type timestamp on tables created with delta-rs 0.10.0 #2631

Merged pull requests:

rust-v0.18.0 (2024-06-12)

Full Changelog

Implemented enhancements:

  • documentation: concurrent writes for non-S3 backends #2556
  • pyarrow options for write_delta #2515
  • [deltalake_aws] Allow configuring separate endpoints for S3 and DynamoDB clients. #2498
  • Include file stats when converting a parquet directory to a Delta table #2490
  • Adopt the delta kernel types #2489

Fixed bugs:

  • raise_if_not_exists for properties not configurable on CreateBuilder #2564
  • write_deltalake with rust engine fails when mode is append and overwrite schema is enabled #2553
  • Running the basic_operations examples fails with Error: Transaction { source: WriterFeaturesRequired(TimestampWithoutTimezone) } #2552
  • invalid peer certificate: BadSignature when connecting to s3 from arm64/aarch64 #2551
  • load_cdf() issue : Generic S3 error: request or response body error: operation timed out #2549
  • write_deltalake fails on Databricks volume #2540
  • Getting "Microsoft Azure Error: Operation timed out" when trying to retrieve big files #2537
  • Impossible to append to a DeltaTable with float data type on RHEL #2520
  • Creating DeltaTable object slow #2518
  • write_deltalake throws parser error when using rust engine and big decimals #2510
  • TypeError: Object of type int64 is not JSON serializable when writing using a Pandas dataframe #2501
  • unable to read delta table when table contains both null and non-null add stats #2477
  • Commits on WriteMode::MergeSchema cause table metadata corruption #2468
  • S3 object store always returns IMDS warnings #2460
  • File skipping according to documentation #2427
  • LockClientError #2379
  • get_app_transaction_version() returns wrong result #2340
  • Property setting in create is not handled correctly #2247
  • Handling of decimals in scientific notation #2221
  • Unable to append to delta table without datafusion feature #2204
  • Decimal Column with Value 0 Causes Failure in Python Binding #2193

Merged pull requests:

rust-v0.17.3 (2024-05-01)

Full Changelog

Implemented enhancements:

  • Limit concurrent ObjectStore access to avoid resource limitations in constrained environments #2457
  • How to get a DataFrame in Rust? #2404
  • Allow checkpoint creation when partition column is "timestampNtz " #2381
  • is there a way to make writing timestamp_ntz optional #2339
  • Update arrow dependency #2328
  • Release GIL in deltalake.write_deltalake #2234
  • Unable to retrieve custom metadata from tables in rust #2153
  • Refactor commit interface to be a Builder #2131

Fixed bugs:

  • Handle rate limiting during write contention #2451
  • regression: delta.logRetentionDuration doesn't seem to be respected #2447
  • Issue writing to mounted storage in AKS using delta-rs library #2445
  • TableMerger - when_matched_delete() fails when Column names contain special characters #2438
  • Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type - while merge data in to delta table #2423
  • Merge on predicate throws error on date column: Unable to convert expression to string #2420
  • Writing Tables with Append mode errors if the schema metadata is different #2419
  • Logstore issues on AWS Lambda #2410
  • Datafusion timestamp type doesn't respect delta lake schema #2408
  • Compacting produces smaller row groups than expected #2386
  • ValueError: Partition value cannot be parsed from string. #2380
  • Very slow s3 connection after 0.16.1 #2377
  • Merge update+insert truncates a delta table if the table is big enough #2362
  • Do not add readerFeatures or writerFeatures keys under checkpoint files if minReaderVersion or minWriterVersion do not satisfy the requirements #2360
  • Create empty table failed on rust engine #2354
  • Getting error message when running in lambda: message: "Too many open files" #2353
  • Temporary files filling up _delta_log folder - increasing table load time #2351
  • compact fails with merged schemas #2347
  • Cannot merge into table partitioned by date type column on 0.16.3 #2344
  • Merge breaks using logical datatype decimal128 #2343
  • Decimal types are not checked against max precision/scale at table creation #2331
  • Merge update+insert truncates a delta table #2320
  • Extract add.stats_parsed with wrong type #2312
  • Process fails without error message when executing merge #2310
  • delta_rs doesn't seem to respect the row group size #2309
  • Auth error when running inside VS Code #2306
  • Unable to read deltatables with binary columns: Binary is not supported by JSON #2302
  • Schema evolution not coercing with Large arrow types #2298
  • Panic in deltalake_core::kernel::snapshot::log_segment::list_log_files_with_checkpoint::{{closure}} #2290
  • Checkpoint does not preserve reader and writer features for the table protocol. #2288
  • Z-Order with larger dataset resulting in memory error #2284
  • Successful writes return error when using concurrent writers #2279
  • Rust writer should raise when decimal types are incompatible (currently writes and puts table in invalid state) #2275
  • Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262
  • DeltaTable is not resilient to corrupted checkpoint state #2258
  • Inconsistent units of time #2256
  • Partition column comparison is an assertion rather than if block with raise exception #2242
  • Unable to merge column names starting from numbers #2230
  • Merging to a table with multiple distinct partitions in parallel fails #2227
  • cleanup_metadata not respecting custom logRetentionDuration #2180
  • Merge predicate fails with a field with a space #2167
  • When_matched_update causes records to be lost with explicit predicate #2158
  • Merge execution time grows exponentially with the number of columns #2107
  • _internal.DeltaError when merging #2084

rust-v0.17.1 (2024-03-06)

Full Changelog

Implemented enhancements:

  • Get statistics metadata #2233
  • add option to append only a subsets of columns #2212
  • add documentation how to configure delta.logRetentionDuration #2072
  • Add drop constraint #2070
  • Add 0.16 deprecation warnings for DynamoDB lock #2049

Fixed bugs:

  • cleanup_metadata not respecting custom logRetentionDuration #2180
  • Rust writer panics on empty record batches #2253
  • DeltaLake executed Rust: write method not found in DeltaOps #2244
  • DELTA_FILE_PATTERN regex is incorrectly matching tmp commit files #2201
  • Failed to create checkpoint with "Parquet does not support writing empty structs" #2189
  • Error when parsing delete expressions #2187
  • terminate called without an active exception #2184
  • Now conda-installable on M1 #2178
  • Add error message for partition_by check #2177
  • deltalake 0.15.2 prints partitions_values and paths which is not desired #2176
  • cleanup_metadata can potentially delete most recent checkpoint, corrupting table #2174
  • Broken filter for newly created delta table #2169
  • Hash for StructField should consider more than the name #2045
  • Schema comparison in writer #1853
  • fix(python): sort before schema comparison #2209 (ion-elgreco)
  • fix: prevent writing checkpoints with a version that does not exist in table state #1863 (rtyler)

Closed issues:

  • Bug/Question: arrow's FixedSizeList is not roundtrippable #2162

Merged pull requests:

rust-v0.17.0 (2024-02-06)

⚠️ The release of 0.17.0 removes the legacy dynamodb lock functionality. AWS users must read these release notes! ⚠️

File handlers

The 0.17.0 release moves storage implementations into their own crates, such as deltalake-aws. A consequence of that refactoring is that custom storage and file scheme handlers must be registered/initialized at runtime. Storage subcrates conventionally define a register_handlers function which performs that task. Users who have not registered the handlers for their storage scheme may see errors such as:

thread 'main' panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/deltalake-core-0.17.0/src/table/builder.rs:189:48:
The specified table_uri is not valid: InvalidTableLocation("Unknown scheme: s3")

  • Users of the meta-crate (deltalake) can call the storage crate via: deltalake::aws::register_handlers(None); at the entrypoint for their code.
  • Users who adopt core and storage crates independently (e.g. deltalake-aws) can register via deltalake_aws::register_handlers(None);

The AWS, Azure, and GCP crates must all have their custom file schemes registered in this fashion.
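
As a rough illustration of why the error appears, the following self-contained Rust sketch models a runtime scheme registry. It is a toy stand-in, not the deltalake-core implementation: `Registry`, `register`, `open`, and `s3_factory` are hypothetical names invented for this example, and only the error text mirrors the message quoted above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a storage backend factory (not a delta-rs type).
type Factory = fn(&str) -> String;

/// Toy registry mapping URI schemes to factories.
#[derive(Default)]
struct Registry {
    factories: HashMap<String, Factory>,
}

impl Registry {
    /// Plays the role a `register_handlers` call plays for a storage crate.
    fn register(&mut self, scheme: &str, factory: Factory) {
        self.factories.insert(scheme.to_string(), factory);
    }

    /// Resolves a table URI; fails if no handler was registered for its scheme.
    fn open(&self, table_uri: &str) -> Result<String, String> {
        // Treat URIs without "://" as local paths for the purposes of this sketch.
        let scheme = table_uri
            .split_once("://")
            .map(|(scheme, _)| scheme)
            .unwrap_or("file");
        match self.factories.get(scheme) {
            Some(factory) => Ok(factory(table_uri)),
            None => Err(format!("Unknown scheme: {scheme}")),
        }
    }
}

/// Hypothetical factory that would build an S3-backed store.
fn s3_factory(uri: &str) -> String {
    format!("s3 store for {uri}")
}

fn main() {
    let mut registry = Registry::default();

    // Without registration the scheme cannot be resolved.
    println!("{:?}", registry.open("s3://bucket/table"));

    // After registration the same URI resolves.
    registry.register("s3", s3_factory);
    println!("{:?}", registry.open("s3://bucket/table"));
}
```

In a real program the registration step corresponds to the deltalake::aws::register_handlers(None); call described above, made once before any table is opened.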

dynamodblock to S3DynamoDbLogStore

The locking mechanism is fundamentally different between deltalake v0.16.x and v0.17.0. Starting with this release, the deltalake and deltalake-aws crates rely on the same protocol for concurrent writes on AWS as the Delta Lake/Spark implementation.

Fundamentally, the DynamoDB table structure changes, which is documented here. The configuration of a Rust process should continue to set the AWS_S3_LOCKING_PROVIDER environment variable to dynamodb. The new table must be specified with the DELTA_DYNAMO_TABLE_NAME environment or configuration variable, which should name the new S3DynamoDbLogStore-compatible DynamoDB table.
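
A deployment might verify these two settings before starting a writer. The sketch below is a preflight check written under assumptions and is not part of delta-rs: the function name `check_locking_config` is hypothetical, and it only inspects environment variables even though the table name can also be supplied through configuration.

```rust
use std::env;

/// Checks the two settings named in the release notes.
/// The lookup is passed in as a closure so the check can be exercised
/// without modifying the process environment.
fn check_locking_config<F>(lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    // The locking provider is expected to be exactly "dynamodb".
    match lookup("AWS_S3_LOCKING_PROVIDER").as_deref() {
        Some("dynamodb") => {}
        Some(other) => {
            return Err(format!(
                "AWS_S3_LOCKING_PROVIDER is '{other}', expected 'dynamodb'"
            ))
        }
        None => return Err("AWS_S3_LOCKING_PROVIDER is not set".to_string()),
    }

    // The table name must be present and non-empty; it is returned on success.
    match lookup("DELTA_DYNAMO_TABLE_NAME") {
        Some(name) if !name.trim().is_empty() => Ok(name),
        _ => Err("DELTA_DYNAMO_TABLE_NAME is not set".to_string()),
    }
}

fn main() {
    match check_locking_config(|key| env::var(key).ok()) {
        Ok(table) => println!("locking config looks plausible; DynamoDB table: {table}"),
        Err(problem) => eprintln!("locking config problem: {problem}"),
    }
}
```

The check says nothing about whether the named table actually has the S3DynamoDbLogStore-compatible structure; that still has to be confirmed against the table itself.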

Because locking is required to ensure safe, consistent writes, there is no iterative migration: 0.16 and 0.17 writers cannot safely coexist. The following steps should be taken when upgrading:

  1. Stop all 0.16.x writers
  2. Ensure all writes are completed and the lock table is empty.
  3. Deploy 0.17.0 writers

Full Changelog

Implemented enhancements:

  • Expose the ability to compile DataFusion with SIMD #2118
  • Updating Table log retention configuration with write_deltalake silently changes nothing #2108
  • ALTER table, ALTER Column, Add/Modify Comment, Add/remove/rename partitions, Set Tags, Set location, Set TBLProperties #2088
  • Docs: Update docs for check constraints #2063
  • Don't ensure_table_uri when creating a table with_log_store #2036
  • Exposing custom_metadata in merge operation #2031
  • Support custom table properties via TableAlterer and write/merge #2022
  • Remove parquet2 crate support #2004
  • Merge operation that only touches necessary partitions #1991
  • store userMetadata on write operations #1990
  • Create Dask integration page #1956
  • Merge: Filtering on partitions #1918
  • Rethink the load_version and load_with_datetime interfaces #1910
  • docs: Delta Lake + Arrow Integration #1908
  • docs: Delta Lake + Polars integration #1906
  • Rethink decision to expose the public interface in namespaces #1900
  • Add documentation on how to build and run documentation locally #1893
  • Add API to create an empty Delta Lake table #1892
  • Implementing CHECK constraints #1881
  • Check Invariants are respecting table features for write paths #1880
  • Organize docs with single lefthand sidebar #1873
  • Make sure invariants are handled properly throughout the codebase #1870
  • Unable to use deltalake Schema in write_deltalake #1862
  • Add a Rust-backed engine for write_deltalake #1861
  • Run doctest in CI for Python API examples #1783
  • [RFC] Use arrow for checkpoint reading and state handling #1776
  • Expose Python exceptions in public module #1771
  • Expose cleanup_metadata or create_checkpoint_from_table_uri_and_cleanup to the Python API #1768
  • Expose convert_to_delta to Python API #1767
  • Add high-level checking for append-only tables #1759

Fixed bugs:

  • Row order no longer preserved after merge operation #2165
  • Error when reading delta table with IDENTITY column #2152
  • Merge on IS NULL condition doesn't work for empty table #2148
  • JsonWriter converts structured parsing error into plain string #2143
  • Pandas import error when merging tables #2112
  • test_repair_on_update broken in main #2109
  • WriteBuilder::with_input_execution_plan does not apply the schema to the log's metadata fields #2105
  • MERGE logical plan vs execution plan schema mismatch #2104
  • Partitions not pushed down #2090
  • Can't create empty table with write_deltalake #2086
  • Unexpected high costs on Google Cloud Storage #2085
  • Unable to read s3 table: Unknown scheme: s3 #2065
  • write_deltalake not respecting writer_properties #2064
  • Unable to read/write tables with the "gs" schema in the table_uri in 0.15.1 #2060
  • LockClient required error for S3 backend in 0.15.1 python #2057
  • Error while writing Pandas DataFrame to Delta Lake (S3) #2051
  • Error with dynamo locking provider on 0.15 #2034
  • Conda version 0.15.0 is missing files #2021
  • Rust panicking through Python library when a delete predicate uses a nullable field #2019
  • No snapshot or version 0 found, perhaps /Users/watsy0007/resources/test_table/ is an empty dir? #2016
  • Generic DeltaTable error: type_coercion in Struct column in merge operation #1998
  • Constraint expr not formatted during commit action #1971
  • .load_with_datetime() is incorrectly rounding to nearest second #1967
  • vacuuming log files #1965
  • Unable to merge uppercase column names #1960
  • Schema error: Invalid data type for Delta Lake: Null #1946
  • Python v0.14 wheel files not up to date #1945
  • python Release 0.14 is missing Windows wheels #1942
  • CI integration test fails randomly: test_restore_by_datetime #1925
  • Merge data freezes indefinitely #1920
  • Load DeltaTable from non-existing folder causing empty folder creation #1916
  • Reoptimizes merge bins with only 1 file, even though they have no effect. #1901
  • The Python Docs link in README.MD points to old docs #1898
  • optimize.compact() fails with bad schema after updating to pyarrow 8.0 #1889
  • Python build is broken on main #1856
  • Checkpoint error with Azure Synapse #1847
  • merge very slow compared to delete + append on larger dataset #1846
  • get_add_actions fails with deltalake 0.13 #1835
  • Handle PyArrow CVE-2023-47248 #1834
  • Delta-rs writer hangs with too many file handles open (Azure) #1832
  • Encountering NotATable("No snapshot or version 0 found, perhaps xxx is an empty dir?") #1831
  • write_deltalake is not creating checkpoints #1815
  • Problem writing tables in directory named with char ~ #1806
  • DeltaTable Merge throws in merging if there are uppercase in Schema. #1797
  • rust merge error - datafusion panics #1790
  • expose use_dictionary=False when writing Delta Table and running optimize #1772

Closed issues:

  • Is this print necessary? Can we remove this. #2110
  • Azure concurrent writes #2069
  • Fix docs deployment #1867
  • Add a header in old docs and direct users to new docs #1865

rust-v0.16.5 (2023-11-15)

Full Changelog

Implemented enhancements:

  • When will upgrade object_store to 0.8? #1858
  • No Official Help #1849
  • Auto assign GitHub issues with a "take" message #1791

Fixed bugs:

  • cargo clippy fails on core in main #1843

rust-v0.16.4 (2023-11-12)

Full Changelog

Implemented enhancements:

  • Unable to add deltalake git dependency to cargo.toml #1821

rust-v0.16.3 (2023-11-08)

Full Changelog

Implemented enhancements:

  • Docs: add release GitHub action #1799
  • Use bulk deletes where possible #1761

Fixed bugs:

  • Code Owners no longer valid #1794
  • MERGE works incorrectly with partitioned table if the data column order is not same as table column order #1787
  • errors when using pyarrow dataset as a source #1779
  • Write to Microsoft OneLake failed. #1764

rust-v0.16.2 (2023-10-21)

Full Changelog

rust-v0.16.1 (2023-10-21)

Full Changelog

rust-v0.16.0 (2023-09-27)

Full Changelog

Implemented enhancements:

  • Expose Optimize option min_commit_interval in Python #1640
  • Expose create_checkpoint_for #1513
  • integration tests regularly fail for HDFS #1428
  • Add Support for Microsoft OneLake #1418
  • add support for atomic rename in R2 #1356

Fixed bugs:

  • Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
  • [python] Different stringification of partition values in reader and writer #1653
  • Unable to interface with data written from Spark Databricks #1651
  • get_last_checkpoint does some unnecessary listing #1643
  • PartitionWriter's buffer_len doesn't include incomplete row groups #1637
  • Slack community invite link has expired #1636
  • delta-rs does not appear to support tables with liquid clustering #1626
  • Internal Parquet panic when using a Map type. #1619
  • partition_by with "$" on local filesystem #1591
  • ProtocolChanged error when performing append write #1585
  • Unable to cargo update using git tag or rev on Rust 1.70 #1580
  • NoMetadata error when reading delta table #1562
  • Cannot read delta table: Delta protocol violation #1557
  • Update the CODEOWNERS to capture the current reviewers and contributors #1553
  • [Python] Incorrect file URIs when partition values contain escape character #1533
  • add documentation how to Query Delta natively from datafusion #1485
  • Python: write_deltalake to ADLS Gen2 issue #1456
  • Partition values that have been url encoded cannot be read when using deltalake #1446
  • Error optimizing large table #1419
  • Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
  • ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
  • Invalid JSON in log record missing field schemaString for DLT tables #1302
  • Special characters in partition path not handled locally #1299

Merged pull requests:

  • chore: bump rust crate version #1675 (rtyler)
  • fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
  • feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
  • docs: small consistency update in guide and readme #1666 (ion-elgreco)
  • fix: exception string in writer.py #1665 (sebdiem)
  • chore: increment python library version #1664 (wjones127)
  • docs: fix some typos #1662 (ion-elgreco)
  • fix: more consistent handling of partition values and file paths #1661 (roeap)
  • docs: add docstring to protocol method #1660 (MrPowers)
  • docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
  • fix: enable offset listing for s3 #1654 (eeroel)
  • chore: fix the incorrect Slack link in our readme #1649 (rtyler)
  • fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
  • chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
  • feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
  • fix: avoid excess listing of log files #1644 (eeroel)
  • fix: introduce support for Microsoft OneLake #1642 (rtyler)
  • fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
  • fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
  • chore: relax chrono pin to 0.4 #1635 (houqp)
  • chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
  • docs: update Readme #1633 (dennyglee)
  • chore: pin the chrono dependency #1631 (rtyler)
  • feat: pass known file sizes to filesystem in Python #1630 (eeroel)
  • feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
  • ci: fix python release #1624 (wjones127)
  • ci: extend azure timeout #1622 (wjones127)
  • feat: allow multiple incremental commits in optimize #1621 (kvap)
  • fix: change map nullable value to false #1620 (cmackenzie1)
  • Introduce the changelog for the last couple releases #1617 (rtyler)
  • chore: bump python version to 0.10.2 #1616 (wjones127)
  • perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
  • fix: don't re-encode paths #1613 (wjones127)
  • feat: use url parsing from object store #1592 (roeap)
  • feat: buffered reading of transaction logs #1549 (eeroel)
  • feat: merge operation #1522 (Blajda)
  • feat: expose create_checkpoint_for to the public #1514 (haruband)
  • docs: update Readme #1440 (roeap)
  • refactor: re-organize top level modules #1434 (roeap)
  • feat: integrate unity catalog with datafusion #1338 (roeap)

rust-v0.15.0 (2023-09-06)

Full Changelog

Implemented enhancements:

  • Configurable number of retries for transaction commit loop #1595

Fixed bugs:

  • Unable to read table using VM Managed Identity on Azure #1462
  • Unable to query by partition column #1445

Merged pull requests:

rust-v0.14.0 (2023-08-01)

Full Changelog

Implemented enhancements:

  • Define common dependencies in Cargo Workspace #1572
  • Make delta_datafusion::find_files public #1559

Fixed bugs:

  • Excessive integration test sizes causing builds to fail #1550
  • Slack invite link is not working #1530

Merged pull requests:

rust-v0.13.1 (2023-07-18)

Fixed bugs:

  • Revert premature merge of an attempted fix for binary column statistics #1544

rust-v0.13.0 (2023-07-15)

Full Changelog

Implemented enhancements:

  • Add nested struct support #1518
  • Support FixedLenByteArray UUID statistics as a logical scalar #1483
  • Exposing create_add in the API #1458
  • Update features table on README #1404
  • docs(python): show data catalog options in Python API reference #1347
  • Add optimization to only list log files starting at a certain name #1252
  • Support configuring parquet compression #1235
  • parallel processing in Optimize command #1171

Fixed bugs:

  • get_add_actions() MAX is not showing complete value #1534
  • Can't get stats's minValues in add actions #1515
  • Pyarrow is_null filter not working as expected after loading using deltalake #1496
  • Can't write to table that uses generated columns #1495
  • Json error: Binary is not supported by JSON when writing checkpoint files #1493
  • _last_checkpoint size field is incorrect #1468
  • Error when Z Ordering a larger dataset #1459
  • Timestamp parsing issue #1455
  • File options are ignored when writing delta #1444
  • Slack Invite Link No Longer Valid #1425
  • cleanup_metadata doesn't remove .checkpoint.parquet files #1420
  • The test of reading the data from the blob storage located in Azurite container failed #1415
  • The test of reading the data from the bucket located in Minio container failed #1408
  • Datafusion: unreachable code reached when parsing statistics with missing columns #1374
  • vacuum is very slow on Cloudflare R2 #1366

Closed issues:

  • Expose Compression Options or WriterProperties for writing to Delta #1469
  • Support out-of-core Z-order using DataFusion #1460
  • Expose Z-order in Python #1442

Merged pull requests:

rust-v0.12.0 (2023-05-30)

Full Changelog

Implemented enhancements:

  • Release delta-rs 0.11.0 (next release after 0.10.0) #1362
  • Support writing statistics for date columns in Rust #1209

Fixed bugs:

  • Rust writer in operations makes a lot of data copies #1394
  • Unable to read timestamp fields from column statistics #1372
  • Unable to write custom metadata via configuration since version 0.9.0 #1353
  • .get_add_actions() returns wrong column statistics when dataSkippingNumIndexedCols property of the table was changed #1223
  • Ensure decimal statistics are written correctly in Rust #1208

Merged pull requests:

  • feat: add list_with_offset to DeltaObjectStore #1410 (ognis1205)
  • chore: type-check friendlier exports #1407 (roeap)
  • chore: remove ancillary crates from the git tree #1406 (rtyler)
  • chore: bump the version for the next release #1405 (rtyler)
  • feat: more efficient parquet writer and more statistics #1397 (wjones127)
  • perf: improve record batch partitioning #1396 (roeap)
  • chore: bump datafusion to 25 #1389 (roeap)
  • refactor!: remove DeltaDataType aliases #1388 (cmackenzie1)
  • feat: vacuum with concurrent requests #1382 (wjones127)
  • feat: add datafusion storage catalog #1381 (roeap)
  • docs: updated schema.rs to use the right signature for decimal data type in documentation #1377 (rahulj51)
  • fix: delete operation when partition and non partition columns are used #1375 (Blajda)
  • fix: add conversion for string for Field::TimestampMicros (#1372) #1373 (cmackenzie1)
  • fix: allow user defined config keys #1365 (roeap)
  • ci: disable full debug symbol generation #1364 (roeap)
  • fix: include stats for all columns (#1223) #1342 (mrjoe7)

rust-v0.11.0 (2023-05-12)

Full Changelog

Implemented enhancements:

  • Implement simple delete case #832

Merged pull requests:

  • chore: update Rust package version #1346 (rtyler)
  • fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
  • feat: delete operation #1176 (Blajda)
  • feat: add wasbs to known schemes #1345 (iajoiner)
  • test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
  • feat: write command improvements #1267 (roeap)
  • feat: added support for Databricks Unity Catalog #1331 (nohajc)
  • fix: double url encode of partition key #1324 (mrjoe7)

rust-v0.10.0 (2023-05-02)

Full Changelog

Implemented enhancements:

  • Support Optimize on non-append-only tables #1125

Fixed bugs:

  • DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
  • Datafusion: SQL projection returns wrong column for partitioned data #1292
  • Unable to query partitioned tables #1291

Merged pull requests:

  • chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
  • fix: handle local paths on windows #1322 (roeap)
  • fix: scan partitioned tables with datafusion #1303 (roeap)
  • fix: allow special characters in storage prefix #1311 (wjones127)
  • feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
  • Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
  • Enable the json feature for the parquet crate #1300 (rtyler)

rust-v0.9.0 (2023-04-14)

Full Changelog

Implemented enhancements:

  • hdfs support #300
  • Add decimal primitive type to document #1280
  • Improve error message when filtering on non-existent partition columns #1218

Fixed bugs:

  • Datafusion table provider: issues with timestamp types #441
  • Not matching column names when creating a RecordBatch from MapArray #1257
  • All stores created using DeltaObjectStore::new have an identical object_store_url #1188

Merged pull requests:

  • Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
  • chore: df / arrow changes after update #1288 (roeap)
  • feat: read schema from parquet files in datafusion scans #1266 (roeap)
  • HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
  • Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
  • Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
  • Simplify the Store Backend Configuration code #1265 (mrjoe7)
  • feat: optimistic transaction protocol #632 (roeap)
  • Write support for additional Arrow datatypes #1044 (chitralverma)
  • Unique delta object store url #1212 (gruuya)
  • improve err msg on use of non-partitioned column #1221 (marijncv)

rust-v0.8.0 (2023-03-10)

Full Changelog

Implemented enhancements:

  • feat(rust): support additional types for partition values #1170

Fixed bugs:

  • File pruning does not occur on partition columns #1175
  • Bug: Error loading Delta table locally #1157
  • Deltalake 0.7.0 with s3 feature compilation error due to rusoto_dynamodb version conflict #1191
  • Writing from a Delta table scan using WriteBuilder fails due to missing object store #1186

Merged pull requests:

rust-v0.7.0 (2023-02-11)

Full Changelog

Implemented enhancements:

  • Support FSCK REPAIR TABLE Operation #1092
  • Expose the Delta Log in a DataFrame that's easy for analysis #1031
  • Provide case-insensitive storage options in backend #999
  • Support local file path in CreateBuilder::with_location() #998
  • Save operational params in the same way with delta io #1054 (ismoshkov)

Fixed bugs:

  • DeltaTable DataFusion TableProvider does not support filter pushdown #1064
  • DeltaTable DataFusion scan does not prune files properly #1063
  • deltalake.DeltaTable constructor hangs in Jupyter #1093
  • Transaction log JSON formatting issue when writing data via Python bindings #1017
  • crates.io entry is missing link to rustdoc documentation #1076
  • URL Registered with ObjectStore registry is different from url in DeltaScan #1018
  • Not able to connect to Azure Storage with client id/secret #977
  • Deltalake 0.5 crate s3 feature dynamodb version mismatch #973
  • Overwrite mode does not work with Azure #939
  • Use Chrono without default features #914
  • cargo test does not run due to tls conflict #985
  • Azure SAS authorization fails with <AuthenticationErrorDetail>Signature fields not well formed. #910

Merged pull requests:

  • Make rustls default across all packages #1097 (wjones127)
  • Implement filesystem check #1103 (Blajda)
  • refactor: move vacuum command to operations module #1045 (roeap)
  • feat: enable passing storage options to Delta table builder via DataFusion's CREATE EXTERNAL TABLE #1043 (gruuya)
  • feat: improve storage location handling #1065 (roeap)
  • Fix to support UTC timezone #1022 (andrei-ionescu)
  • feat: harmonize and simplify storage configuration #1052 (roeap)
  • feat: expose function to get table of add actions #1033 (wjones127)
  • fix: change unexpected field logging level to debug #1112 (houqp)
  • fix: datafusion predicate pushdown and dependencies #1071 (roeap)
  • fix: azure sas key url encoding #1036 (roeap)
  • Add provisional workaround to support CDC #1039 #1042 (Fazzani)
  • improve debuggability of json ser/de errors #1119 (houqp)
  • Add an example of writing to a delta table with a RecordBatch #1085 (rtyler)
  • minor: optimize partition lookup for vacuum loop #1120 (houqp)
  • Add missing documentation metadata to Cargo.toml #1077 (johnbatty)
  • add test for null_count_schema_for_fields #1135 (marijncv)
  • add test for min_max_schema_for_fields #1122 (marijncv)
  • add test for get_boolean_from_metadata #1121 (marijncv)
  • add test for left_larger_than_right #1110 (marijncv)
  • Add test for: to_scalar_value #1086 (marijncv)
  • Fix typo in delta-inspect #1072 (byteink)
  • chore: update datafusion #1114 (roeap)

rust-v0.6.0 (2022-12-16)

Full Changelog

Implemented enhancements:

  • Support Apache Arrow DataFusion 15 #1020
  • Python package: Loosen version requirements for maturin #1004
  • Remove Cargo.lock from library crates and add Cargo.lock to binary ones #1000
  • More frequent Rust releases #969
  • Thoughts on adding read_delta to pandas #869
  • Add the support of the AWS_PROFILE environment variable for S3 #986 (fvaleye)

Fixed bugs:

  • Azure SAS signatures ending in "=" don't work #1003
  • Fail to compile deltalake crate, need to update dynamodb_lock in crates.io #1002
  • error reading delta table to pandas: runtime dropped the dispatch task #975
  • MacOS arm64 wheels are generated incorrectly #972
  • Overwrite creates new file #960
  • The written delta file has corrupted structure #956
  • Write mode doesn't work with Azure storage #955
  • Python: We don't error on reader protocol v2 #886
  • Cannot open a deltatable in S3 using AWS_PROFILE based credentials from a local machine #855

Merged pull requests:

* This Changelog was automatically generated by github_changelog_generator