Releases: delta-io/delta-rs
python-v0.15.3
Bug Fixes
- fix: rm println in python lib by @ion-elgreco in #2166
- fix(python): skip empty row groups during stats gathering by @ion-elgreco in #2172
Other Changes
Full Changelog: python-v0.15.2...python-v0.15.3
python-v0.15.2: predicate overwrite, improved table state replay
New features
- feat: allow merge_execute to release the GIL by @emcake in #2091
- feat: arrow backed log replay and table state by @roeap in #2037
- feat: update table config to contain new config keys by @roeap in #2127
- feat: expose stats schema on Snapshot by @roeap in #2128
- feat: implementation for replaceWhere by @r3stl355 in #1996
- feat: implement clone for DeltaTable struct by @mightyshazam in #2160
- feat: introduce schema evolution on RecordBatchWriter by @rtyler in #2024
Bug Fixes
- fix: properly deserialize percent-encoded file paths of Remove actions, to make sure tombstone and file paths match by @sigorbor in #2035
- fix: reinstate copy-if-not-exists passthrough by @emcake in #2083
- refactor: add deltalake-gcp crate by @ion-elgreco in #2061
- fix: schema issue within writebuilder by @universalmind303 in #2106
- fix: temporarily skip s3 roundtrip test by @roeap in #2124
- fix: set partition values for added files when building compaction plan by @alexwilcoxson-rel in #2119
- fix: clean-up paths created during tests by @roeap in #2126
- fix: add missing pandas import by @Tim-Haarman in #2116
- fix: order logical schema to match physical schema by @Blajda in #2129
- fix: do not write empty parquet file/add on writer close; accurately … by @alexwilcoxson-rel in #2123
- fix: prevent empty stats struct during parquet write by @alexwilcoxson-rel in #2125
- fix(#2143): keep specific error type when writing fails by @abaerptc in #2144
- fix(s3): restore working test for DynamoDb log store repair log on read by @dispanser in #2120
- fix: made generalize_filter less permissive, also added more cases by @emcake in #2149
- fix: allow loading of tables with identity columns by @rtyler in #2155
- fix: replace BTreeMap with IndexMap to preserve insertion order by @roeap in #2150
Other Changes
- chore(deps): update serial_test requirement from 2 to 3 by @dependabot in #2052
- chore: update documentation for S3 / DynamoDb log store configuration by @dispanser in #2041
- docs: make an overview tab visible in docs by @r3stl355 in #2080
- docs: update docs for rust print statement by @skariyania in #2077
- docs: add usage guide for check constraints by @hntd187 in #2079
- docs: add page on why to use delta lake by @MrPowers in #2076
- docs: how delta lake transactions work by @MrPowers in #2089
- docs: move dynamo docs into new docs page by @ion-elgreco in #2093
- docs: delta lake file skipping by @MrPowers in #2096
- chore: removed unnecessary print statement from update method by @LilMonk in #2111
- chore: temporarily ignore the repair on update test by @rtyler in #2114
- chore: bump python by @ion-elgreco in #2092
- docs: add dask page to integration docs by @avriiil in #2122
- docs: fix arg indent by @wchatx in #2103
- docs: delta lake is great for small data by @MrPowers in #2113
- docs: use transparent logo in README by @roeap in #2132
- chore: shorten up the crate folder names in the tree by @rtyler in #2145
- refactor(python): drop custom filesystem in write_deltalake by @ion-elgreco in #2137
- chore: upgrade to DataFusion 35.0 by @philippemnoel in #2121
- chore: cleanup minor clippies and other warns by @rtyler in #2161
- fix: allow checkpoints to contain metadata actions without a createdTime value by @rtyler in #2059
New Contributors
- @skariyania made their first contribution in #2077
- @LilMonk made their first contribution in #2111
- @alexwilcoxson-rel made their first contribution in #2119
- @Tim-Haarman made their first contribution in #2116
- @avriiil made their first contribution in #2122
- @wchatx made their first contribution in #2103
- @abaerptc made their first contribution in #2144
- @philippemnoel made their first contribution in #2121
- @mightyshazam made their first contribution in #2160
Full Changelog: python-v0.15.1...python-v0.15.2
python-v0.15.1
New features
- feat(python, rust): expose custom_metadata for all operations by @ion-elgreco in #2032
- feat: refactor WriterProperties class by @ion-elgreco in #2030
- refactor: increase metadata action usage by @roeap in #2027
- feat(rust): add more commit info to most operations by @ion-elgreco in #2009
- feat(python): add schema conversion of FixedSizeBinaryArray and FixedSizeList by @balbok0 in #2005
- feat: retry with exponential backoff for DynamoDb interaction by @dispanser in #1975
Bug Fixes
- fix: ensure metadata cleanup do not corrupt tables without checkpoints by @Blajda in #2044
- fix: remove casts of structs to record batch by @Blajda in #2033
- fix: use temporary table names during the constraint checks by @r3stl355 in #2017
Other Changes
- chore: refactoring AWS code out of the core crate by @rtyler in #1995
- refactor: move azure integration to dedicated crate by @roeap in #2023
- docs: update docs for merge by @Blajda in #2042
- chore: update datafusion by @roeap in #2029
- fix: github actions for releasing docs by @r3stl355 in #2026
- docs: add alterer by @ion-elgreco in #2014
- docs: add writer properties to docs by @ion-elgreco in #2002
Full Changelog: python-v0.15.0...python-v0.15.1
python-v0.15.0: check constraints operation, and faster MERGE
New features
- feat: merge using partition filters by @emcake in #1958
- feat: omit unmodified files during merge write by @Blajda in #1969
- feat: check constraints by @hntd187 in #1915
- feat(python): expose add constraint operation by @ion-elgreco in #1973
- feat: add kernel ExpressionEvaluator by @roeap in #1829
- feat: implement S3 log store with transactions backed by DynamoDb by @dispanser in #1904
- feat(python): add writer_properties to all operations by @ion-elgreco in #1980
- feat(python): combine load_version/load_with_datetime into load_as_version by @ion-elgreco in #1968
- feat: update to include pyarrow-hotfix by @dennyglee in #1930
- feat: cast list items to default before write with different item names by @JonasDev1 in #1959
- feat(python): expose large_dtype param in merge by @ion-elgreco in #2003
- feat(python): expose custom metadata to writers by @ion-elgreco in #1994
Bug Fixes
- fix: respect case sensitivity on operations by @Blajda in #1954
- fix: case sensitivity for z-order by @Blajda in #1982
- fix: implement consistent formatting for constraint expressions by @Blajda in #1985
- fix: remove the get_data_catalog() function by @rtyler in #1941
- fix: handle empty table response in unity api by @JonasDev1 in #1963
- fix: flakey gcs test by @roeap in #1987
- fix: enable S3 integration tests to be configured via environment vars by @dispanser in #1966
- fix: properly decode percent-encoded file paths coming from parquet checkpoints by @sigorbor in #1970
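The percent-encoding fix in the last item concerns file paths that are stored URL-encoded. A minimal sketch of the underlying idea, using only Python's standard library rather than delta-rs itself; the path value is a made-up example:

```python
from urllib.parse import quote, unquote

# Hypothetical partitioned file path containing a space and colons.
raw_path = "date=2023-01-01 00:00:00/part-0.parquet"

# Percent-encode it as it might be stored (keeping "/" and "=" unescaped).
encoded_path = quote(raw_path, safe="/=")

# Comparing the encoded form against the decoded form fails...
paths_match_without_decoding = encoded_path == raw_path

# ...while decoding first makes the two representations agree.
paths_match_after_decoding = unquote(encoded_path) == raw_path
```

If one side of a comparison is decoded and the other is not, the same file appears under two different names, which is the kind of mismatch this fix addresses.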
Breaking Changes
To control the writer properties in .update, you need to pass a deltalake.WriterProperties instance instead of a dictionary.
Other Changes
- chore: relocate cast_record_batch into its own module to shed the datafusion dependency by @rtyler in #1955
- chore: break Glue support into its own crate without rusoto by @rtyler in #1825
- docs: add polars integration by @MrPowers in #1949
- docs: add better installation instructions by @MrPowers in #1951
- docs: datafusion integration by @MrPowers in #1993
- fix: add arrow page back by @ion-elgreco in #1944
- refactor: trigger metadata retrieval only during DeltaTable.metadata by @ion-elgreco in #1979
- docs: add auto-release when docs are merged to main by @r3stl355 in #1962
- chore: fix CI breaking lint issues by @r3stl355 in #1999
New Contributors
- @JonasDev1 made their first contribution in #1963
Full Changelog: python-v0.14.0...python-v0.15.0
python-v0.14.0
New features
- feat: adopt kernel schemas and improve protocol support by @roeap in #1756
- feat: drop python 3.7 and adopt 3.12 by @roeap in #1859
- feat: expose cleanup_metadata in Python by @r3stl355 in #1826
- feat: handle protocol compatibility by @roeap in #1807
- feat(python): expose convert_to_deltalake by @ion-elgreco in #1842
- feat(python): add pyarrow to delta compatible schema conversion in writer/merge by @ion-elgreco in #1820
- feat(python): expose create to DeltaTable class by @ion-elgreco in #1932
- feat(python): expose rust writer as additional engine v2 by @ion-elgreco in #1891
- feat: extend write_deltalake to accept Deltalake schema by @r3stl355 in #1922
Bug Fixes
- fix: prevent writing checkpoints with a version that does not exist in table state by @rtyler in #1863
- fix: checkpoint error with Azure Synapse by @PierreDubrulle in #1848
- fix: improve catalog failure error message, add missing Glue native-tls feature dependency by @r3stl355 in #1883
- fix: use physical name for column name lookup in partitions by @aersam in #1836
- fix(rust/python): optimize.compact not working with tables with mixed large/normal arrow by @ion-elgreco in #1926
- fix: support os.PathLike for table references by @bolkedebruin in #1809
- fix: add buffer flushing to filesystem writes by @r3stl355 in #1911
- fix: fail fast for opening non-existent path by @dimonchik-suvorov in #1917
- fix: compare timestamp partition values as timestamps instead of strings by @sigorbor in #1895
- fix: add high-level checking for append-only tables by @junjunjd in #1887
- fix: prune each merge bin with only 1 file by @haruband in #1902
- fix: get rid of panic in during table by @dimonchik-suvorov in #1928
Other Changes
- docs: add release action by @r3stl355 in #1801
- chore: upgrade to the latest dynamodb-lock crate by @rtyler in #1816
- refactor: default logstore implementation by @dispanser in #1742
- ci: adopt ruff format for formatting by @roeap in #1841
- chore: upgrade datafusion 33 by @Blajda in #1775
- docs: add docs on small file compaction with optimize by @MrPowers in #1850
- refactor: express log schema in delta types by @roeap in #1876
- refactor: merge to use logical plans by @Blajda in #1720
- chore: create benchmarks for merge by @Blajda in #1857
- docs: on append, overwrite, delete and z-ordering by @MrPowers in #1897
- docs: update python docs link in readme.md by @thomasfrederikhoeck in #1899
- docs: update docs home page and add pandas integration by @MrPowers in #1905
- ci: run doctest in CI for Python API examples by @marijncv in #1840
- docs: delta lake arrow integration page by @MrPowers in #1914
- docs: fix all examples and change overall structure by @ion-elgreco in #1931
- refactor: simplify DeltaTableState by @roeap in #1877
- docs: add logo, dark mode, boost search by @ion-elgreco in #1936
- refactor: prefer usage of metadata and protocol fields by @roeap in #1935
New Contributors
- @PierreDubrulle made their first contribution in #1848
- @thomasfrederikhoeck made their first contribution in #1899
- @dimonchik-suvorov made their first contribution in #1917
- @sigorbor made their first contribution in #1895
- @bolkedebruin made their first contribution in #1809
Full Changelog: python-v0.13.0...python-v0.14.0
rust-v0.16.5
This release includes a number of minor bug fixes, including one for users of create_checkpoint_for(), which previously allowed the caller to specify a version that did not match the loaded table state, leading to incorrect _last_checkpoint files and a broken Delta table.
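The kind of guard this fix implies can be sketched in a few lines. This is an illustration in Python, not the crate's Rust implementation; the function name and error message are hypothetical:

```python
def ensure_checkpoint_version_matches(requested_version: int, loaded_version: int) -> None:
    """Reject a checkpoint request whose version differs from the loaded table state."""
    if requested_version != loaded_version:
        # Hypothetical error; the real crate reports its own error type.
        raise ValueError(
            f"cannot checkpoint version {requested_version}: "
            f"table state is loaded at version {loaded_version}"
        )
```

Failing early like this keeps a checkpoint from being written for a version that the in-memory state does not describe.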
rust-v0.16.4
The v0.16.4 version of the deltalake crate contains one notable and important fix: an upgrade of the dynamodb_lock crate to v0.6.1.
That release changes the expected format of leaseDuration in DynamoDb from String to Number, fixing a long-overlooked bug in the lock code that prevented stale locks from being reaped automatically using DynamoDb's TTL attribute.
Pre-existing locks should be properly respected by this newer version of dynamodb_lock; however, a lock not being respected can result in data corruption of Delta tables. It is therefore recommended that, when upgrading:
- All writers using a given DynamoDb table for locking are stopped.
- DynamoDb is inspected and stale locks are cleared.
- TTL is enabled on the table on the leaseDuration attribute (adjust if the application uses a different attribute name for lease duration).
- Writers are restarted.
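When inspecting the lock table, it can help to see which items still store the lease duration as a String. A minimal sketch, assuming items have been exported in DynamoDB's low-level JSON form (type tags "S" for String and "N" for Number); the key names and values below are hypothetical, and only the leaseDuration attribute name comes from the notes above:

```python
from typing import Optional


def lease_duration_type(item: dict, attribute: str = "leaseDuration") -> Optional[str]:
    """Return the DynamoDB type tag ("S", "N", ...) of the attribute, or None if absent."""
    value = item.get(attribute)
    if not value:
        return None
    # A low-level attribute value is a one-entry mapping such as {"N": "20"}.
    return next(iter(value))


def items_with_string_lease(items: list) -> list:
    """Select the items whose lease duration is still stored as a String."""
    return [item for item in items if lease_duration_type(item) == "S"]


# Hypothetical lock items in the old and new formats.
old_item = {"key": {"S": "delta-rs"}, "leaseDuration": {"S": "20"}}
new_item = {"key": {"S": "delta-rs"}, "leaseDuration": {"N": "20"}}
needs_attention = items_with_string_lease([old_item, new_item])
```

DynamoDB's TTL only acts on attributes of the Number type, so items flagged here would not be reaped automatically.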
python-v0.13.0: Repair operation and PyArrow 13+ support
New features
- feat(python): expose FSCK (repair) operation by @ion-elgreco in #1730
- feat: add VACUUM operation as commit in transaction log by @ion-elgreco in #1728
- fix(python): add support for pyarrow 13+ by @ion-elgreco in #1804
- feat(python): allow python objects to be passed as new values in .update() by @ion-elgreco in #1749
- feat(python): allow for multiple when calls in MERGE operation by @ion-elgreco in #1750
Bug fixes
- fix: ignore binary columns for stats generation by @emcake in #1766
- fix(python): add write support explicitly for pyarrow dataset by @ion-elgreco in #1780
- fix(python): ignore infinity in stats by @wjones127 in #1784
- fix: delta scan partition ordering bug by @Blajda in #1789
Other changes
- fix: relax pyarrow pin by @dhirschfeld in #1743
- fix: remove pandas pin by @dhirschfeld in #1746
- refactor!: update operations to use delta scan by @Blajda in #1639
- chore: update datafusion by @roeap in #1741
- docs: convert docs to use mkdocs by @r3stl355 in #1731
- docs: dynamodb lock configuration by @brayanjuls in #1752
- refactor: perform bulk deletes during metadata cleanup by @cmackenzie1 in #1763
- docs: enhance docs to enable multi-lingual examples by @r3stl355 in #1781
- chore: refactor into the deltalake meta crate and deltalake-core crates by @rtyler in #1774
- feat: add deltalake sql crate by @roeap in #1757
- feat: initial table features implementation by @hntd187 in #1796
- docs: add CI for docs by @r3stl355 in #1798
New Contributors
- @dhirschfeld made their first contribution in #1743
- @brayanjuls made their first contribution in #1752
- @emcake made their first contribution in #1766
- @hntd187 made their first contribution in #1796
Full Changelog: python-v0.12.0...python-v0.13.0
python-v0.12.0: Delete, Update, and Merge
What's Changed
New features
- feat: allow to set large dtypes for the schema check in write_deltalake by @ion-elgreco in #1668
- feat(python): expose delete operation by @guilhem-dvr in #1687
- feat(python): expose UPDATE operation by @ion-elgreco in #1694
- feat(python): expose MERGE operation by @ion-elgreco in #1685
- feat: add version number in .history() and display in reversed chronological order by @ion-elgreco in #1710
Bug fixes
- fix: exception string in writer.py by @sebdiem in #1665
- fix: change partitioning schema from large to normal string for pyarrow<12 by @ion-elgreco in #1671
- fix: use epoch instead of ce for date stats by @universalmind303 in #1672
- fix: unify environment variables referenced by Databricks docs by @rtyler in #1673
- fix!: ensure predicates are parsable by @Blajda in #1690
- fix: merge operation with string predicates by @Blajda in #1705
- fix: reorder encode_partition_value() checks and add tests by @ldacey in #1733
Other contributions
- perf: improve read performance by 7x with prebuffer by @ion-elgreco in #1709
- docs: small consistency update in guide and readme by @ion-elgreco in #1666
- docs: fix typo in readme by @JosiahParry in #1696
- docs: add Python API reference to mkdocs by @wjones127 in #1563
- docs(python): document the delete operation by @guilhem-dvr in #1704
- docs: add a write example to delta.rs by @r3stl355 in #1711
- chore: remove deprecated functions by @wjones127 in #1735
Breaking changes
The DeltaTable.history() method now returns transactions in reverse chronological order. This matches the Spark implementation.
DeltaTable.files_by_partitions() has been removed. It had been deprecated since 0.7.0. Use DeltaTable.file_uris() instead.
DeltaTable.pyarrow_schema() has been removed. It had been deprecated since 0.7.0. Use DeltaTable.schema().to_pyarrow() instead.
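Code that relied on the old oldest-first ordering of DeltaTable.history() can restore it by sorting on the version number that this release adds to each entry. A minimal sketch with hand-written entries standing in for real history output; the key name version and the other fields shown are assumptions for illustration:

```python
def oldest_first(history):
    """Return history entries sorted by ascending version (the pre-0.12.0 order)."""
    return sorted(history, key=lambda entry: entry["version"])


# Hypothetical entries in the new newest-first order.
newest_first = [
    {"version": 2, "operation": "MERGE"},
    {"version": 1, "operation": "WRITE"},
    {"version": 0, "operation": "CREATE TABLE"},
]
restored = oldest_first(newest_first)
```

Sorting by version rather than simply reversing the list keeps the helper correct regardless of the order in which the entries arrive.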
New Contributors
- @sebdiem made their first contribution in #1665
- @universalmind303 made their first contribution in #1672
- @JosiahParry made their first contribution in #1696
- @r3stl355 made their first contribution in #1711
- @ldacey made their first contribution in #1733
Full Changelog: python-v0.11.0...python-v0.12.0
rust-v0.16.0
Implemented enhancements:
- Expose Optimize option min_commit_interval in Python #1640
- Expose create_checkpoint_for #1513
- integration tests regularly fail for HDFS #1428
- Add Support for Microsoft OneLake #1418
- add support for atomic rename in R2 #1356
Fixed bugs:
- Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
- [python] Different stringification of partition values in reader and writer #1653
- Unable to interface with data written from Spark Databricks #1651
- get_last_checkpoint does some unnecessary listing #1643
- PartitionWriter's buffer_len doesn't include incomplete row groups #1637
- Slack community invite link has expired #1636
- delta-rs does not appear to support tables with liquid clustering #1626
- Internal Parquet panic when using a Map type. #1619
- partition_by with "$" on local filesystem #1591
- ProtocolChanged error when perfoming append write #1585
- Unable to cargo update using git tag or rev on Rust 1.70 #1580
- NoMetadata error when reading detlatable #1562
- Cannot read delta table: Delta protocol violation #1557
- Update the CODEOWNERS to capture the current reviewers and contributors #1553
- [Python] Incorrect file URIs when partition values contain escape character #1533
- add documentation how to Query Delta natively from datafusion #1485
- Python: write_deltalake to ADLS Gen2 issue #1456
- Partition values that have been url encoded cannot be read when using deltalake #1446
- Error optimizing large table #1419
- Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
- ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
- Invalid JSON in log record missing field schemaString for DLT tables #1302
- Special characters in partition path not handled locally #1299
Merged pull requests:
- chore: bump rust crate version #1675 (rtyler)
- fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
- feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
- docs: small consistency update in guide and readme #1666 (ion-elgreco)
- fix: exception string in writer.py #1665 (sebdiem)
- chore: increment python library version #1664 (wjones127)
- docs: fix some typos #1662 (ion-elgreco)
- fix: more consistent handling of partition values and file paths #1661 (roeap)
- docs: add docstring to protocol method #1660 (MrPowers)
- docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
- fix: enable offset listing for s3 #1654 (eeroel)
- chore: fix the incorrect Slack link in our readme #1649 (rtyler)
- fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
- chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
- feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
- fix: avoid excess listing of log files #1644 (eeroel)
- fix: introduce support for Microsoft OneLake #1642 (rtyler)
- fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
- fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
- chore: relax chrono pin to 0.4 #1635 (houqp)
- chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
- docs: update Readme #1633 (dennyglee)
- chore: pin the chrono dependency #1631 (rtyler)
- feat: pass known file sizes to filesystem in Python #1630 (eeroel)
- feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
- ci: fix python release #1624 (wjones127)
- ci: extend azure timeout #1622 (wjones127)
- feat: allow multiple incremental commits in optimize #1621 (kvap)
- fix: change map nullable value to false #1620 (cmackenzie1)
- Introduce the changelog for the last couple releases #1617 (rtyler)
- chore: bump python version to 0.10.2 #1616 (wjones127)
- perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
- fix: don't re-encode paths #1613 (wjones127)
- feat: use url parsing from object store #1592 (roeap)
- feat: buffered reading of transaction logs #1549 (eeroel)
- feat: merge operation #1522 (Blajda)
- feat: expose create_checkpoint_for to the public #1514 (haruband)
- docs: update Readme #1440 (roeap)
- refactor: re-organize top level modules #1434 (roeap)
- feat: integrate unity catalog with datafusion #1338 (roeap)