Python Polars 0.20.8
🏆 Highlights
- Implemented tree formatting for LogicalPlan (#14221)
⚠️ Deprecations
- Deprecate positional args in pivot to prepare new functionality (#14428)
🚀 Performance improvements
- Combine small chunks in sinks for streaming pipelines (#14346)
- reduce heap allocs in expression/logical-plan iteration (#14440)
- simplify and speed up cum_sum and cum_prod (#14409)
- simplify negated predicates to improve row groups skipping (#14370)
✨ Enhancements
- Increase verbosity of duplicate column error message (#11899)
- change print to warn in reading csv from python file like object (#14469)
- Raise if pivot would introduce duplicate column names (#14431)
- apply negate in simplify expression pass (#14436)
- restrict more cloud interop to semaphore budget (#14435)
- Implement min/max for categorical dtype (#14112)
- Hide polars.testing.* in pytest stack traces (#14399)
- expose numpy view to integer types (#14405)
- Allow column name input in clip (#14410)
- add boolean rle decoding for parquet (#14403)
- Allow brackets in SQL join conditions (#14263)
- Implemented tree formatting for LogicalPlan (#14221)
- Implement mean_horizontal expression (#14369)
- support decimal comparison (#14338)
- Implements arr.shift (#14298)
- Implements list.n_unique (#14306)
- Do not panic when casting from an empty Series to pl.Decimal (#14330)
- unset WRITEABLE flag in zero-copy output (#14283)
- Support Categorical/Enum in Series.to_numpy (#14275)
- add parametric testing support for the Array dtype (#14265)
🐞 Bug fixes
- don't gc after variadic buffers are written (#14473)
- Increase verbosity of duplicate column error message (#11899)
- Return appropriate data type for duration mean and median (#14376)
- change print to warn in reading csv from python file like object (#14469)
- regression in out-of-core group-by by new string-type (#14464)
- DataFrame.pivot was returning incorrect results when multiple columns were passed to index and one of them was Struct (#14438)
- remove literal Series from projection state (#14437)
- pivot was producing incorrect results when (single) index was Struct (#14308)
- Error on some invalid clip inputs (#14416)
- Series.hist panicking on empty/all-null (#14407)
- rechunk series when apply_lambda (#14406)
- Raise if invalid strategy is passed to map_elements (#14397)
- Require exact checking for Decimals in assertion utils (#14357)
- fix ufunc for unlimited column args (#14328)
- Handle chunked Series in Series.to_numpy (#14341)
- Remove duplicated content in error messages (#8107)
- Fix set_operation if the input is sliced and is broadcast (#14303)
- Wrap par_iter in list.to_struct by POOL.install (#14304)
- Do not panic when casting from an empty Series to pl.Decimal (#14330)
- Preserve name when casting to Enum (#14320)
- list.get does not work on list of decimals (#14276)
- relax precision when up scaling (#14270)
- Allow format object series with registry (#14272)
📖 Documentation
- Update read_database docstring note about getting the connection URI string for sqlalchemy (#14461)
- Fix typo in plugins section (#14402)
- Add debugging section to contributing docs (#10576)
- Define what a 'character' means in slice/len_chars (#14395)
- Clarify behavior of DataFrame.rows_by_key (#14149)
- Fix some typos (#14394)
- Realign file structure of user guide (#14360)
- Rust examples for data structures in user guide (#14339)
- Add deprecation period policy example for post-1.0.0 (#14184)
- Add example for Series.bin.contains (#14297)
- Small clarifications in the contributing guide (#14310)
- Fix capitalization of user guide references (#14291)
- Fix explode docstring mentioning String types (#14285)
- Update deltalake docstrings to new link (#14282)
🛠️ Other improvements
- Ignore unclosed file warnings for now (#14467)
- Raise better error in import timings test (#14441)
- Refactor arg_min/max test case (#14439)
- Skip some OOC tests that fail randomly in the CI (#14434)
- Bump release drafter to v6 (#14429)
- Set specific temp dir for OOC tests (#14420)
- Bump setup-graphviz action to v2 (#14418)
- Minor test refactor (#14404)
- Update make clean command (#14408)
- Internal rename of _or to or_ in PyO3 (same for _xor/_and) (#14393)
- Minor refactor of DataFrame.to_numpy structured code (#14348)
- Update Series.to_numpy to handle Decimal/Time types in Rust (#14296)
- Add test for Series.to_numpy with timezones (#14337)
- Bump ruff version to 0.2.0 (#14294)
- Temporarily fix failing deltalake test (#14288)
- remove dataframe consortium standard api entrypoint (#14279)
Thank you to all our contributors for making this release possible!
@BGR360, @CaselIT, @MarcoGorelli, @migi, @NedJWestern, @Vincenthays, @alexander-beedie, @deanm0000, @dependabot, @dependabot[bot], @engdoreis, @flisky, @grinya007, @itamarst, @janosh, @kalekundert, @lukemanley, @mbuhidar, @mcrumiller, @petrosbar, @r-brink, @rben01, @reswqa, @ritchie46, @stinodego, @taki-mekhalfa and @thomasfrederikhoeck