Releases: pola-rs/polars
Python Polars 1.0.0-rc.1
💥 Breaking changes
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split `replace` functionality into two separate functions (#16921)
- Default to writing binview data to IPC (#17084)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Remove re-export of type aliases (#17032)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Rename `ModuleUpgradeRequired` and `PolarsPanicError` error, remove `InvalidAssert` error (#17033)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Properly apply `strict` parameter in Series constructor (#16939)
- Remove supertype definition of List and non-List types (#16918)
- Consistently convert to given time zone in Series constructor (#16828)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Set `infer_schema_length` as keyword-only argument in `str.json_decode` (#16835)
- Update `set_sorted` to only accept a single column (#16800)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
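To make the `clip` entry (#14413) in the list above more concrete, here is a minimal plain-Python sketch of the element-wise rule as we understand it: a null bound is treated as "no bound", so the original value is kept rather than turned into null. `clip_value` is a hypothetical helper written for illustration; it is not part of the Polars API, and `None` stands in for a Polars null.

```python
def clip_value(value, lower=None, upper=None):
    """Clip one value; None models a null (hypothetical helper, not Polars API)."""
    if value is None:
        return None  # a null input value stays null
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value  # a null bound imposes no limit


values = [0, 1, 2]
lower_bounds = [1, None, 1]  # the middle bound is null
clipped = [clip_value(v, lower=lo) for v, lo in zip(values, lower_bounds)]
print(clipped)  # [1, 1, 2]
```

Under the assumed earlier behavior, the middle element would have become null because its bound was null; code that relied on that would need an explicit null check after upgrading.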
⚠️ Deprecations
- Deprecate `size` parameter in parametric testing strategies in favor of `min_size`/`max_size` (#17128)
- Split `replace` functionality into two separate functions (#16921)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- Remove re-export of exceptions at top-level (#17059)
- Deprecate `dt.mean`/`dt.median` in favor of `mean`/`median` (#16888)
- Deprecate `LazyFrame.with_context` in favor of horizontal concatenation (#16860)
- Rename parameter `descending` to `reverse` in `top_k` methods (#16817)
- Rename `str.concat` to `str.join` and update default delimiter (#16790)
- Deprecate `arctan2d` in favor of `arctan2(...).degrees()` (#16786)
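The `arctan2d` deprecation (#16786) replaces a dedicated degrees variant with the composition "two-argument arctangent, then convert to degrees". The sketch below shows that composition numerically using only the Python standard library; it does not call Polars, and `arctan2_degrees` is a hypothetical name chosen for illustration.

```python
import math


def arctan2_degrees(y, x):
    """Two-argument arctangent, converted from radians to degrees."""
    return math.degrees(math.atan2(y, x))


angle = arctan2_degrees(1.0, 1.0)
print(round(angle, 6))  # 45.0
```

The same two-step shape is what the replacement expression `arctan2(...).degrees()` spells out, which is presumably why the combined function was considered redundant.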
🚀 Performance improvements
- Default to writing binview data to IPC (#17084)
- Parallelize arrow conversion if binview -> large_bin (#17083)
- GC buffers in if_then_else view kernel (#16993)
- Desugar `AND` filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce in `join` if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- Update `DataFrame.pivot` to allow `index=None` when `values` is set (#17126)
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Improve ipython autocomplete for LazyFrame and DataFrame (#17091)
- Split `replace` functionality into two separate functions (#16921)
- Improve schema inference for hive partitions (#17079)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- print row index in explain + dot (#17074)
- Support top-level `pl.col` autocompletion for iPython (#17080)
- Remove re-export of exceptions at top-level (#17059)
- predicate + projection pushdown in NDJson (#17068)
- Allow (non-)coalescing in join_asof (#17066)
- Turn off coalescing and fix mutation of join on expressions (#17061)
- Expand NDJson glob into one SCAN (#17063)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Implement general array equality checks (#17043)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Rename `ModuleUpgradeRequired` and `PolarsPanicError` error, remove `InvalidAssert` error (#17033)
- Add `rechunk` parameter to `read_delta` (#16991)
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Add simple version of `json_normalize` (#17015)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Desugar `AND` filter into multiple nodes (#16992)
- Handle textio even if not correct (#16971)
- Properly apply `strict` parameter in Series constructor (#16939)
- Add SQL support for `INTERSECT` and `EXCEPT` ops (#16960)
- Add `PerformanceWarning` to LazyFrame properties (#16964)
- Add `collect_schema` method to `LazyFrame` and `DataFrame` (#16929)
- Allow setting file cache TTL on a per-file basis (#16891)
- Support Decimal inputs for `lit` (#16950)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- add style namespace (which defers to Great Tables) (#16809)
- Add `Schema` class (#16873)
- Normalize `value_counts` (#16917)
- add `eq`/`ne` for more `FixedSizeList`s (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Consistently convert to given time zone in Series constructor (#16828)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python `Enum` values in `lit` (#16858)
- Convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Update `set_sorted` to only accept a single column (#16800)
- Expose overflowing cast (#16805)
- Update `group_by` iteration and `partition_by` to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Expedited removal of certain deprecated functionality (2) (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Add `check_order` parameter to `assert_series_equal` (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Mark `min_periods` as keyword-only for `rolling` methods (#16738)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- Scheduled removal of deprecated functionality (2) (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Upda...
Rust Polars 0.41.0
💥 Breaking changes
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split `replace` functionality into two separate functions (#16921)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- Default to writing binview data to IPC (#17084)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Remove supertype definition of List and non-List types (#16918)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- move offset_by implementation from polars-plan to polars-time, rename feature from DateOffset to OffsetBy (#16796)
- Rename `str.concat` to `str.join` and update default delimiter (#16790)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Deprecate `str.explode` in favor of `str.split("").explode()` (#16508)
- Deprecate `how="outer"` join type in favour of `how="full"` (left/right are *also* outer joins) (#16417)
- Change `DataFrame.is_empty()` to check `height == 0` instead of `width == 0` (#16351)
🚀 Performance improvements
- Default to writing binview data to IPC (#17084)
- Parallelize arrow conversion if binview -> large_bin (#17083)
- GC buffers in if_then_else view kernel (#16993)
- Desugar `AND` filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce in `join` if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
- make truncate 4x faster in simple cases (#16615)
- Cache arenas (and conversion) in SQL context (#16566)
- Partial schema cache. (#16549)
- improved numeric fill_(forward/backward) (#16475)
- only rechunk once per aggregate (#16469)
- Fix pathological small chunk parquet writing (#16433)
✨ Enhancements
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split `replace` functionality into two separate functions (#16921)
- Improve schema inference for hive partitions (#17079)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- print row index in explain + dot (#17074)
- Support top-level `pl.col` autocompletion for iPython (#17080)
- predicate + projection pushdown in NDJson (#17068)
- Allow (non-)coalescing in join_asof (#17066)
- Turn off coalescing and fix mutation of join on expressions (#17061)
- Expand NDJson glob into one SCAN (#17063)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Implement general array equality checks (#17043)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Desugar `AND` filter into multiple nodes (#16992)
- use min/max metadata on debug builds with `POLARS_METADATA_FLAGS=extensive` (#16963)
- Add SQL support for `INTERSECT` and `EXCEPT` ops (#16960)
- Allow setting file cache TTL on a per-file basis (#16891)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- Normalize `value_counts` (#16917)
- add `eq`/`ne` for more `FixedSizeList`s (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Expose overflowing cast (#16805)
- Expose a few more expression nodes in the expression IR (#16781)
- Support array arithmetic for equally sized shapes (#16791)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- add `env` locked metadata functions (#16719)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- eliminate ProjectionExprs and handle CSE by stacking extra columns (#16682)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for `str.to_datetime` (#16634)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Dedicated `SQLInterface` and `SQLSyntax` errors (#16635)
- Add `DIV` function support to the SQL interface (#16678)
- add additional control to `write_parquet::statistics` parameter (#16575)
- Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
- Support per-column `nulls_last` on sort operations (#16639)
- Add `split_at` method to arrow `Array` (#16620)
- Initial support for SQL `ARRAY` literals and the `UNNEST` table function (#16330)
- Don't allow `struct.with_fields` in grouping (#16629)
- Add SQL support for `TRY_CAST` function (#16589)
- add fuzzer for expressions (#16581)
- handle CSE dtypes in NodeTraverser.get_dtype (#16552)
- check if by column is sorted, rather than just checking sorted flag, in `group_by_dynamic`, `upsample`, and `rolling` (#16494)
- Add general metadata structure to `ChunkedArray` (#16399)
- Add `is_column_selection()` to expression meta, enhance `expand_selector` (#16479)
- NDarray/Tensor support (#16466)
- Allow designation of a custom name for the `value_counts` "count" column (#16434)
- Default rechunk=False for read_parquet (#16427)
- Add `field` expression as selector with a struct scope (#16402)
- Field expansion renaming (#16397)
- add cluster_with_columns plan optimization (#16274)
- Change `DataFrame.is_empty()` to check `height == 0` instead of `width == 0` (#16351)
- add Expr.interpolate_by (#16313)
🐞 Bug fixes
- Expand i128 primitive type match (#17076)
- Fix decompress_impl for csv with n_rows set (#17118)
- adds "polars-ops/timezones" dependency for "timezones" feature (#17115)
- Fix incorrect window std for chunked series (#17110)
- make `GetOutput::get_field` fallible (#17114)
- bubble error when no available bitrepr (#17116)
- Fix melt panic (#17088)
- Exclude index from expansion in rolling/group_by_dynamic (#17086)
- fix #17043 binary compare (#17052)
- Fix oob of join with literals and empty table (#17047)
- Don't silently accept multi-table FROM clauses (implicit JOIN syntax) (#17028)
- fix get categories on multiple row groups (#17041...
Python Polars 1.0.0-beta.1
💥 Breaking changes
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Properly apply `strict` parameter in Series constructor (#16939)
- Remove supertype definition of List and non-List types (#16918)
- Consistently convert to given time zone in Series constructor (#16828)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Set `infer_schema_length` as keyword-only argument in `str.json_decode` (#16835)
- Update `set_sorted` to only accept a single column (#16800)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
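For readers unfamiliar with run-length encoding, the `rle` entry (#15249) in the list above concerns the names of the two fields in each output struct. The plain-Python sketch below models only the shape of that output with the new field names `len` and `value`; the function `rle` here is a hypothetical stand-in built on `itertools.groupby`, not the Polars implementation, and it says nothing about the integer type of `len`.

```python
from itertools import groupby


def rle(values):
    """Run-length encode a sequence into dicts keyed by the new field names."""
    return [
        {"len": sum(1 for _ in run), "value": key}
        for key, run in groupby(values)
    ]


encoded = rle([1, 1, 2, 3, 3, 3])
print(encoded)
# [{'len': 2, 'value': 1}, {'len': 1, 'value': 2}, {'len': 3, 'value': 3}]
```

Code that unpacks the struct by field name is the part most likely to need updating after this rename.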
⚠️ Deprecations
- Deprecate `dt.mean`/`dt.median` in favor of `mean`/`median` (#16888)
- Deprecate `LazyFrame.with_context` in favor of horizontal concatenation (#16860)
- Rename parameter `descending` to `reverse` in `top_k` methods (#16817)
- Rename `str.concat` to `str.join` and update default delimiter (#16790)
- Deprecate `arctan2d` in favor of `arctan2(...).degrees()` (#16786)
🚀 Performance improvements
- GC buffers in if_then_else view kernel (#16993)
- Desugar `AND` filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce in `join` if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Add simple version of `json_normalize` (#17015)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Desugar `AND` filter into multiple nodes (#16992)
- Handle textio even if not correct (#16971)
- Properly apply `strict` parameter in Series constructor (#16939)
- Add SQL support for `INTERSECT` and `EXCEPT` ops (#16960)
- Add `PerformanceWarning` to LazyFrame properties (#16964)
- Add `collect_schema` method to `LazyFrame` and `DataFrame` (#16929)
- Allow setting file cache TTL on a per-file basis (#16891)
- Support Decimal inputs for `lit` (#16950)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- add style namespace (which defers to Great Tables) (#16809)
- Add `Schema` class (#16873)
- Normalize `value_counts` (#16917)
- add `eq`/`ne` for more `FixedSizeList`s (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Consistently convert to given time zone in Series constructor (#16828)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python `Enum` values in `lit` (#16858)
- Convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Update `set_sorted` to only accept a single column (#16800)
- Expose overflowing cast (#16805)
- Update `group_by` iteration and `partition_by` to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Expedited removal of certain deprecated functionality (2) (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Add `check_order` parameter to `assert_series_equal` (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Mark `min_periods` as keyword-only for `rolling` methods (#16738)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- Scheduled removal of deprecated functionality (2) (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for `str.to_datetime` (#16634)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Dedicated `SQLInterface` and `SQLSyntax` errors (#16635)
- Add `DIV` function support to the SQL interface (#16678)
- Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
🐞 Bug fixes
- properly catch not found explode cols (#17020)
- Correctly convert data frames to NumPy for C index order (#17000)
- Raise on invalid arithmetic shapes (#16986)
- Don't pushdown predicates in cross join if they refer to both tables (#16983)
- Fix projection pushdown with literal joins (#16981)
- Fix edge case in DataFrame constructor data orientation inference (#16975)
- Raise on list of objects (#16959)
- Handle strictness for Decimal Series construction (#15309)
- Don't panic in object to anyvalue (#16957)
- properly set `FAST_EXPLODE_LIST` metadata (#16951)
- Raise informative error when writing object to file (#16954)
- Remove supertype definition of List and non-List types (#16918)
- Remove unwrap in `extend()` (#16890)
- Fix `should_rechunk` check (#16852)
- Ensure `read_excel` and `read_ods` return identical frames across all engines when given empty spreadsheet tables (#16802)
- Consistent behaviour when "infer_schema_length=0" for `read_excel` (#16840)
- Standardised additional SQL interface errors (#16829)
- Ensure that splitted ChunkedArray also flattens chunks (#16837)
- Reduce needless panics in comparisons (#16831)
-...
Python Polars 1.0.0-alpha.1
💥 Breaking changes
- Consistently convert to given time zone in Series constructor (#16828)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Set `infer_schema_length` as keyword-only argument in `str.json_decode` (#16835)
- Update `set_sorted` to only accept a single column (#16800)
- Update `group_by` iteration and `partition_by` to always return tuple keys (#16793)
- Default to `coalesce=False` in left outer join (#16769)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as multidimensional `Array` instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
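The `get`/`gather` entry (#16841) in the list above switches the default for out-of-bounds indices from "return null" to "raise". The plain-Python sketch below models that policy on an ordinary list. The helper `gather` and its `null_on_oob` flag are assumptions made for illustration (the flag name is modeled on what we believe the opt-out is called in Polars, so check the API reference before relying on it), and `None` stands in for a Polars null.

```python
def gather(values, indices, null_on_oob=False):
    """Pick items by index; raise on out-of-bounds unless null_on_oob is set."""
    n = len(values)
    result = []
    for i in indices:
        if -n <= i < n:
            result.append(values[i])  # in bounds, negative indices count from the end
        elif null_on_oob:
            result.append(None)  # opt back in to the lenient behavior
        else:
            raise IndexError(f"index {i} is out of bounds for length {n}")
    return result


print(gather([10, 20, 30], [0, -1]))                   # [10, 30]
print(gather([10, 20, 30], [0, 5], null_on_oob=True))  # [10, None]
```

Raising by default surfaces indexing mistakes early; pipelines that intentionally index past the end need to opt in to the null-producing behavior explicitly.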
⚠️ Deprecations
- Deprecate `LazyFrame.with_context` (#16860)
- Rename parameter `descending` to `reverse` in `top_k` methods (#16817)
- Rename `str.concat` to `str.join` (#16790)
- Deprecate `arctan2d` (#16786)
🚀 Performance improvements
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- Consistently convert to given time zone in Series constructor (#16828)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python `Enum` values in `lit` (#16858)
- Convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising for oob on all `get`/`gather` operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Only accept a single column in `set_sorted` (#16800)
- Expose overflowing cast (#16805)
- Update group-by iteration to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Default to `coalesce=False` in left outer join (#16769)
- More removal of deprecated functionality (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Add `check_order` parameter to `assert_series_equal` (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in
scan_csv
(#16674) - Streamline SQL
INTERVAL
handling and improve related error messages, updatesqlparser-rs
lib (#16744) - Support use of ordinal values in SQL
ORDER BY
clause (#16745) - Support executing polars SQL against
pandas
andpyarrow
objects (#16746) - Remove deprecated parameters in
Series.cut/qcut
(#16741) - Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update
date_range
to no longer produce datetime ranges (#16734) - Mark
min_periods
as keyword-only forrolling
methods (#16738) - Remove deprecated
top_k
parameters (#16599) - Support order-by in window functions (#16743)
- Add SQL support for
NULLS FIRST/LAST
ordering (#16711) - Update some error types to more appropriate variants (#15030)
- Initial SQL support for
INTERVAL
strings (#16732) - More scheduled removal of deprecated functionality (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of
offset
arg intruncate
andround
(#16655) - Change default of
offset
in group_by_dynamic from "negativeevery
" to "zero" (#16658) - Constrain access to globals from
df.sql
in favour of top-levelpl.sql
(#16598) - Read 2D numpy arrays as Array[dt, shape] instead of Listst[dt] (#16710)
- Activate decimal by default (#16709)
- Do not propagate nulls in
clip
bounds (#14413) - Change
.str.to_datetime
to default to microsecond precision for format specifiers"%f"
and"%.f"
(#13597) - Remove redundant column name when pivoting by multiple values (#16439)
- Preserve nulls in
ewm_mean
,ewm_std
, andewm_var
(#15503) - Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for
str.to_datetime
(#16634) - Support decimals by default when converting from Arrow (#15324)
- Remove serde functionality from
pl.read_json
andDataFrame.write_json
(#16550) - Update function signature of
nth
to allow positional input of indices, removecolumns
parameter (#16510) - Rename struct fields of
rle
output tolen
/value
and update data type oflen
field (#15249) - Remove default class variable values on DataTypes (#16524)
- Add
check_names
parameter toSeries.equals
and default toFalse
(#16610) - Dedicated
SQLInterface
andSQLSyntax
errors (#16635) - Add
DIV
function support to the SQL interface (#16678) - Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
🐞 Bug fixes
- Fix `should_rechunk` check (#16852)
- Ensure `read_excel` and `read_ods` return identical frames across all engines when given empty spreadsheet tables (#16802)
- Consistent behaviour when "infer_schema_length=0" for `read_excel` (#16840)
- Standardised additional SQL interface errors (#16829)
- Ensure that split ChunkedArray also flattens chunks (#16837)
- Reduce needless panics in comparisons (#16831)
- Reset if next caller clones inner series (#16812)
- Raise on non-positive json schema inference (#16770)
- Rewrite implementation of `top_k/bottom_k` and fix a variety of bugs (#16804)
- Fix comparison of UInt64 with zero (#16799)
- Fix incorrect parquet statistics written for UInt64 values > Int64::MAX (#16766)
- Fix boolean distinct (#16765)
- `DATE_PART` SQL syntax/parsing, improve some error messages (#16761)
- Include `pl.` qualifier for inner dtypes in `to_init_repr` (#16235)
- Column selection wasn't applied when reading CSV with no rows (#16739)
- Panic on empty df / null List(Categorical) (#16730)
- Only flush if operator can flush in streaming outer join (#16723)
- Raise unsupported cat array (#16717)
- Assert SQLInterfaceError is raised (#16713)
- Restrict casting for temporal data types (#14142)
- Handle nested categoricals in `assert_series_equal` when `categorical_as_str=True` (#16700)
- Improve `read_database` check for SQLAlchemy async Session objects (#16680)
- Reduce scope of multi-threaded numpy conversion (#16686)
- Full null on dyn int (#16679)
- Fix filter shape on empty null (#16670)
📖 Documentation
- Update version switcher for 1.0.0 prereleases (#16847)
- Update link from Python API reference to user guide (#16849)
- Update docstring/test/etc usage of `select` and `with_columns` to idiomatic form (#16801)
- Update versioning docs for 1.0.0 (#16757)
- Add docstring example for `DataFrame.limit` (#16753)
- Fix incorrectly stated value of `include_nulls` in `DataFrame.update` docstring (#16701)
- Update deprecation docs in the user guide (#14315)
- Add example for index count in `DataFrame.rolling` (#16600)
- Improve docstring of `Expr/Series.map_elements` (#16079)
- Add missing `polars.sql` docs entry and small docstring update (#16656)
🛠️ Other improvements
- Remove inner `Arc` from `FileCacheEntry` (#16870)
- Do not update stable API reference on prerelease (#16846)
- Update links to API references (#16843)
- Prepare update of API reference URLs (#16816)
- Rename allow_overflow to wrap_numerical (#16807)
- Set `infer_schema_length` as keyword-only for `str.json_decode` (#168...
Python Polars 0.20.31
Important
The decision to change the default coalesce behavior of left join has been reversed.
You can ignore the associated deprecation warning.
⚠️ Deprecations
- Rename `dtypes` parameter to `schema_overrides` for `read_csv`/`scan_csv`/`read_csv_batched` (#16628)
- Deprecate `nulls_last`/`maintain_order`/`multithreaded` parameters for `top_k` methods (#16597)
- Rename `SQLContext` "eager_execution" param to "eager" (#16595)
- Rename `Series.equals` parameter `strict` to `check_dtypes` and rename assertion utils parameter `check_dtype` to `check_dtypes` (#16573)
- Add `DataFrame.serialize/deserialize` (#16545)
- Deprecate `str.explode` in favor of `str.split("").explode()` (#16508)
- Deprecate default coalesce behavior of left join (#16532) - !! Reversed in 1.0.0 - see message above !!
🚀 Performance improvements
- make truncate 4x faster in simple cases (#16615)
- Cache arenas (and conversion) in SQL context (#16566)
- Partial schema cache. (#16549)
✨ Enhancements
- Support per-column `nulls_last` on sort operations (#16639)
- Initial support for SQL `ARRAY` literals and the `UNNEST` table function (#16330)
- Don't allow `struct.with_fields` in grouping (#16629)
- improve support for user-defined functions that return scalars (#16556)
- Add SQL support for `TRY_CAST` function (#16589)
- Add top-level `pl.sql` function (#16528)
- Expose temporal function expression ops to expr ir (#16546)
- Add `DataFrame.serialize/deserialize` (#16545)
- check if by column is sorted, rather than just checking sorted flag, in `group_by_dynamic`, `upsample`, and `rolling` (#16494)
🐞 Bug fixes
- Potentially deal with empty range (#16650)
- Use of SQL `ORDER BY` should not cause reordering of `SELECT` cols (#16579)
- ensure df in empty parquet (#16621)
- Fix Array constructor when inner type is another Array (#16622)
- Fix parsing of `shape` in `Array` constructor and deprecate `width` parameter (#16567)
- Crash using empty `Series` in `LazyFrame.select()` (#16592)
- improve support for user-defined functions that return scalars (#16556)
- Resolve multiple SQL `JOIN` issues (#16507)
- Project last column if count query (#16569)
- Properly split struct columns (#16563)
- Ensure strict chunking in chunked partitioned group by (#16561)
- Error selecting columns after non-coalesced join (multiple join keys) (#16559)
- Don't panic on hashing nested list types (#16555)
- Crash selecting columns after non-coalesced join (#16541)
- Fix group gather of single literal (#16539)
- throw an invalid operation exception on performing a `sum` over a `list` of `str`s (#16521)
- Fix `DataFrame.__getitem__` for empty list input - `df[[]]` (#16520)
- Fix issue in `DataFrame.__getitem__` with 2 column inputs (#16517)
📖 Documentation
- Overview of available SQL functions (#16268)
- Update filter description to clarify that null evaluations are removed (#16632)
- Include warning in docstrings that accessing `LazyFrame` properties may be expensive (#16618)
- Add a few `versionadded` tags, and add `is_column_selection` to the Expr meta docs (#16590)
- Fix bullet points not rendering correctly in `DataFrame.join` docstring (#16576)
- Remove erroneous `implode` reference from the user guide section on window functions (#16544)
📦 Build system
- Run `cargo update` (#16574)
🛠️ Other improvements
- Add test for 16642 (#16646)
- Remove duplicate tag in CODEOWNERS (#16625)
- Update dprint hook versions and enable JSON linting (#16611)
- Fewer `typing.no_type_check` (#16497)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @coastalwhite, @hattajr, @itamarst, @mcrumiller, @nameexhaustion, @r-brink, @ritchie46, @stinodego, @twoertwein and @wence-
Python Polars 0.20.30
⚠️ Deprecations
- Add `Series/Expr.has_nulls` and deprecate `Series.has_validity` (#16488)
- Deprecate `tree_format` parameter for `LazyFrame.explain` in favor of `format` (#16486)
🚀 Performance improvements
✨ Enhancements
- Minor `DataFrame.__getitem__` improvements (#16495)
- Add `is_column_selection()` to expression meta, enhance `expand_selector` (#16479)
- Add `Series/Expr.has_nulls` and deprecate `Series.has_validity` (#16488)
- NDarray/Tensor support (#16466)
🐞 Bug fixes
- Fix df.chunked for struct (#16504)
- Mix of column and field expansion (#16502)
- Fix `split_chunks` for nested dtypes (#16493)
- Fix handling NaT values when creating Series from NumPy ndarray (#16490)
- Fix boolean trap issue in `top_k`/`bottom_k` (#16489)
- Handle struct.fields as special case of alias (#16484)
- Correct schema for list.sum (#16483)
- allow search_sorted directly on multiple chunks, and fix behavior around nulls (#16447)
- Fix use of `COUNT(*)` in SQL `GROUP BY` operations (#16465)
- respect `nan_to_null` when using multi-thread in `pl.from_pandas` (#16459)
- write_delta() apparently does support Categorical columns (#16454)
📖 Documentation
- Update the Overview section of the contributing guide (#15674)
- Use `pl.field` inside `with_fields` examples. (#16451)
- Change ordering of values in example for `cum_max` (#16456)
🛠️ Other improvements
- Refactor `Series/DataFrame.__getitem__` logic (#16482)
Thank you to all our contributors for making this release possible!
@BGR360, @alexander-beedie, @cmdlineluser, @coastalwhite, @itamarst, @marenwestermann, @mdavis-xyz, @messense, @orlp, @ritchie46 and @stinodego
Python Polars 0.20.29
⚠️ Deprecations
- Deprecate `how="outer"` join type in favour of `how="full"` (left/right are *also* outer joins) (#16417)
🚀 Performance improvements
- Fix pathological small chunk parquet writing (#16433)
✨ Enhancements
- Support zero-copy conversion for temporal types in `DataFrame.to_numpy` (#16429)
- Allow designation of a custom name for the `value_counts` "count" column (#16434)
- Default rechunk=False for read_parquet (#16427)
- Add "ignore_spaces" to `alpha` and `alphanumeric` selectors, add "ascii_only" to `digit` (#16362)
- Update `__array__` method for Series and DataFrame to support `copy` parameter (#16401)
🐞 Bug fixes
- add cluster_with_columns optimization toggle in python (#16446)
- Fix struct 'with_fields' schema for update dtypes (#16428)
- Fix error reading lists of CSV files that contain comments (#16426)
- make read_parquet() respect rechunk flag when using pyarrow (#16418)
- Improve `read_excel` dtype inference of "calamine" int/float results that include NaN (#16400)
- Update `apply` call in `str_duration_` util. (#16412)
📖 Documentation
Thank you to all our contributors for making this release possible!
@KDruzhkin, @alexander-beedie, @ankane, @cmdlineluser, @coastalwhite, @itamarst, @nameexhaustion, @ritchie46 and @stinodego
Python Polars 0.20.28
⚠️ Deprecations
- Deprecate `use_pyarrow` parameter for `to_numpy` methods (#16391)
✨ Enhancements
- Add `field` expression as selector with a struct scope (#16402)
- Field expansion renaming (#16397)
- Respect index order in `DataFrame.to_numpy` also for non-numeric frames (#16390)
- add Expr.interpolate_by (#16313)
- Implement Struct support for `Series.to_numpy` (#16383)
🐞 Bug fixes
- Fix struct arithmetic schema (#16396)
- Handle non-Sequence iterables in filter (#16254)
- Don't panic on chunked to_numpy conversion (#16393)
- Don't check nulls before conversion (#16392)
- Add support for generalized ufunc with different size input and output (#16336)
- Improve cursor close behaviour with respect to Oracle "thick mode" connections (#16380)
- Fix `DataFrame.to_numpy` for Array/Struct types (#16386)
- Handle ambiguous/nonexistent datetimes in Series construction (#16342)
- Fix `DataFrame.to_numpy` for Struct columns when `structured=True` (#16358)
- Use strings to expose `ClosedInterval` in expr IR (#16369)
📖 Documentation
- Expand docstrings for `to_numpy` methods (#16394)
- Add a note about index access on struct.field (#16389)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @coastalwhite, @dangotbanned, @itamarst, @ritchie46, @stinodego and @wence-
Rust Polars 0.40.0
💥 Breaking changes
- Remove incremental read based batched CSV reader (#16259)
- separate `rolling_*_by` from `rolling_*(..., by=...)` in Rust (#16102)
- Move CSV read options from `CsvReader` to `CsvReadOptions` (#16126)
- Rename all 'Chunk's to RecordBatch (#16063)
- prepare for join coalescing argument (#15418)
- Rename `CsvParserOptions` to `CsvReaderOptions`, use in `CsvReader` (#15919)
- Add context trace to `LazyFrame` conversion errors (#15761)
- Move schema resolving of file scan to IR phase (#15739)
- Move schema resolving to IR phase. (#15714)
- Rename LogicalPlan and builders to reflect their uses better (#15712)
🚀 Performance improvements
- Use branchless uleb128 decoding for parquet (#16352)
- Reduce error bubbling in parquet hybrid_rle (#16348)
- use is_sorted in ewm_mean_by, deprecate check_sorted (#16335)
- Optimize `is_sorted` for numeric data (#16333)
- do not use pyo3-built (#16309)
- Faster bitpacking for Parquet writer (#16278)
- Avoid importing `ctypes.util` in CPU check script if possible (#16307)
- Don't rechunk when converting DataFrame to numpy/ndarray (#16288)
- use zeroed vec in ewm_mean_by for sorted fastpath (#16265)
- use zeroable_vec in ewm_mean_by (#16166)
- Improve cost of chunk_idx compute (#16154)
- Don't rechunk by default in `concat` (#16128)
- Ensure rechunk is parallel (#16127)
- Don't traverse deep datasets that we repr as union in CSE (#16096)
- Ensure better chunk sizes (#16071)
- Don't rechunk in parallel collection (#15907)
- Improve non-trivial list aggregations (#15888)
- Ensure we hit specialized gather for binary/strings (#15886)
- Limit the cache size for `to_datetime` (#15826)
- skip initial null items and don't recompute `slope` in `interpolate` (#15819)
- Fix quadratic in binview growable same source (#15734)
✨ Enhancements
- Raise when joining on the same keys twice (#16329)
- Don't require data to be sorted by `by` column in `rolling_*_by` operations (#16249)
- Add struct.field expansion (regex, wildcard, columns) (#16320)
- Faster bitpacking for Parquet writer (#16278)
- Add `struct.with_fields` (#16305)
- Handle implicit SQL string → temporal conversion in the `BETWEEN` clause (#16279)
- Add new index/range based selector `cs.by_index`, allow multiple indices for `nth` (#16217)
- Show warning if expressions are very deep (#16233)
- Native CSV file list reading (#16180)
- Register memory mapped files and raise when written to (#16208)
- Raise when encountering invalid supertype in functions during conversion (#16182)
- Add SQL support for `GROUP BY ALL` syntax and fix several issues with aliased group keys (#16179)
- Allow implicit string → temporal conversion in SQL comparisons (#15958)
- separate `rolling_*_by` from `rolling_*(..., by=...)` in Rust (#16102)
- Add run-length encoding to Parquet writer (#16125)
- add date pattern `dd.mm.YYYY` (#16045)
- Add RLE to `RLE_DICTIONARY` encoder (#15959)
- Support non-coalescing joins in default engine (#16036)
- Move diagonal & horizontal concat schema resolving to IR phase (#16034)
- raise more informative error messages in rolling_* aggregations instead of panicking (#15979)
- Convert concat during IR conversion (#16016)
- Improve dynamic supertypes (#16009)
- Additional `uint` datatype support for the SQL interface (#15993)
- Support Decimal read from IPC (#15965)
- Add typed collection from par iterators (#15961)
- Add `by` argument for `Expr.top_k` and `Expr.bottom_k` (#15468)
- Add option to disable globbing in csv (#15930)
- Add option to disable globbing in parquet (#15928)
- Rename `CsvParserOptions` to `CsvReaderOptions`, use in `CsvReader` (#15919)
- Expressify `dt.round` (#15861)
- Improve error messages in context stack (#15881)
- Add dynamic literals to ensure schema correctness (#15832)
- `dt.truncate` supports broadcasting lhs (#15768)
- Expressify `str.json_path_match` (#15764)
- Support decimal float parsing in CSV (#15774)
- Add context trace to `LazyFrame` conversion errors (#15761)
🐞 Bug fixes
- correct AExpr.to_field for bitwise and logical and/or (#16360)
- cargo clippy for uleb128 safety comment (#16368)
- Infer CSV schema as supertype of all files (#16349)
- Address overflow combining u64 hashes in Debug builds (#16323)
- Don't exclude explicitly named columns in group-by context' expr expansion (#16318)
- Harden `Series.reshape` against invalid parameters (#16281)
- Fix list.sum dtype for boolean (#16290)
- Don't stackoverflow on all/any horizontal (#16287)
- compilation error when both lazy and ipc features are enabled (#16284)
- `rolling_*_by` was throwing an incorrect error when the dataframe was sorted but contained multiple chunks (#16247)
- Clippy Error for CPUID (#16241)
- Reading CSV with low_memory gave no data (#16231)
- Empty unique (#16214)
- Fix empty drop nulls (#16213)
- Fix get expression group-by state (#16189)
- Fix rolling empty group OOB (#16186)
- offset=-0i was being treated differently to offset=0i in rolling (#16184)
- Fix panic on empty frame joins (#16181)
- Fix streaming glob slice (#16174)
- Fix CSV skip_rows_after_header for streaming (#16176)
- Flush parquet at end of batches tick (#16073)
- Check CSE name aliases for collisions. (#16149)
- Don't override CSV reader encoding with lossy UTF-8 (#16151)
- Add missing allow macros for windows (#16130)
- Ensure hex and bitstring literals work inside SQL `IN` clauses (#16101)
- Revert "Add RLE to `RLE_DICTIONARY` encoder" (#16113)
- Respect user passed 'reader_schema' in 'scan_csv' (#16080)
- Lazy csv + projection; respect null values arg (#16077)
- Materialize dtypes when converting to arrow (#16074)
- Fix casting decimal to decimal for high precision (#16049)
- Fix printing max scale decimals (#16048)
- Decimal supertype for dyn int (#16046)
- Do not set sorted flag on lexical sorting (#16032)
- properly handle nulls in DictionaryArray::iter_typed (#16013)
- Fix CSE case where upper plan has no projection (#16011)
- Crash/incorrect group_by/n_unique on categoricals created by (q)cut (#16006)
- Ternary supertype dynamics (#15995)
- Treat splitting by empty string as iterating over chars (#15922)
- Fix PartialEq for DataType::Unknown (#15992)
- Do not reverse null indices in descending arg_sort (#15974)
- Finish adding `typed_lit` to help schema determination in SQL "extract" func (#15955)
- do not panic when comparing against categorical with incompatible dtype (#15857)
- Join validation for multiple keys (#15947)
- Set default limit for String column display to 30 and fix edge cases (#15934)
- typo in add_half_life takes ln(negative) (#15932)
- Remove ffspec from parquet reader (#15927)
- avoid WRITE+EXEC for CPUID check (#15912)
- fix inconsistent decimal formatting (#15457)
- Preserve NULLs for `is_not_nan` (#15889)
- double projection check should only take the upstream projections into account (#15901)
- Ensure we don't create invalid frames when combining unit lit + … (#15903)
- Clear cached rename schema (#15902)
- Fix OOB in struct lit/agg aggregation (#15891)
- create (q)cut labels in fixed order (#15843)
- Tag `shrink_dtype` as non-streaming (#15828)
- drop-nulls edge case; remove drop-nulls special case (#15815)
- ewm_mean_by was skipping initial nulls when it was already sorted by "by" column (#15812)
- Consult cgroups to determine free memory (#15798)
- raise if index count like 2i is used when performing rolling, group_by_dynamic, upsample, or other temporal operations (#15751)
- Don't deduplicate sort that has slice pushdown (#15784)
- Fix incorrect `is_between` pushdown to `scan_pyarrow_dataset` (#15769)
- Handle null index correctly for list take (#15737)
- Preserve lexical ordering on concat (#15753)
- Remove incorrect unsafe pointer cast for int -> enum (#15740)
- pass series name to apply for cut/qcut (#15715)
- count of null column shouldn't panic in agg context (#15710)
📖 Documentation
- Clarify arrow usage (#16152)
- Solve inconsistency between code and comment (#16135)
- add filter docstring examples to date and datetime (#15996)
- update the link to R API docs (#15973)
- Fix a typo in categorical section of the user guide (#15777)
- Fix incorrect column name in `LazyFrame.sort` doc example (#15658)
📦 Build system
- Update Rust nightly toolchain version (#16222)
- Don't import jemalloc (#15942)
- Use default allocator for lts-cpu (#15941)
- replace all macos-latest referrals with macos-13 (#15926)
- pin mimalloc and macos-13 (#15925)
- use jemalloc in lts-cpu (#15913)
🛠️ Other improvements
- simplify interpolate code, add test for rolling_*_by with nulls (#16334)
- Move expression expansion to conversion module (#16331)
- Add `polars-expr` README (#16316)
- Move physical expressions to new crate (#16306)
- Use `cls` (not `self`) in classmethods (#16303)
- conditionally print the CSEs (#16292)
- Rename `ChunkedArray.chunk_id` to `chunk_lengths` (#16273)
- Use Scalar instead of Series in some aggregations (#16277)
- Use `CsvReadOptions` in `LazyCsvReader` (#16283)
- Do not hardcode bash path in Makefile (#16263)
- Add IR::Reduce (not yet implemented) (#16216)
- Remove incremental read based batched CSV reader (#16259)
- move all describe, describe_tree and dot-viz code to IR instead of DslPlan (#16237)
- move describe to IR instead of DSL (#16191)
- Use `Duration.is_zero` instead of comparing Duration.duration_ns to 0 (#16195)
- Remove unused code (#16175)
- Don't override CSV reader encoding with lossy UTF-8 (#16151)
- Move CSV read options from `CsvReader` to `CsvReadOptions` (#16126)
- Bump `sccache` action (#16088)
- Fix failures in test coverage workflow (#16083)
- Rename all 'Chunk's to RecordBatch (#16063)
- Use UnionArgs for DSL side (#16017)
- Add some comments (#16008)
- prepare for join coalescing argument (#15418)
- Pin c...
Python Polars 0.20.27
Warning
This release was yanked. Please use the 0.20.28 release instead.
⚠️ Deprecations
- Change parameter `chunked` to `allow_chunks` in parametric testing strategies (#16264)
🚀 Performance improvements
- Use branchless uleb128 decoding for parquet (#16352)
- Reduce error bubbling in parquet hybrid_rle (#16348)
- use is_sorted in ewm_mean_by, deprecate check_sorted (#16335)
- Optimize `is_sorted` for numeric data (#16333)
- do not use pyo3-built (#16309)
- Faster bitpacking for Parquet writer (#16278)
- Improve `Series.to_numpy` performance for chunked Series that would otherwise be zero-copy (#16301)
- Further optimise initial `polars` import (#16308)
- Avoid importing `ctypes.util` in CPU check script if possible (#16307)
- Don't rechunk when converting DataFrame to numpy/ndarray (#16288)
- use zeroed vec in ewm_mean_by for sorted fastpath (#16265)
✨ Enhancements
- expose BooleanFunction in expr IR (#16355)
- Allow `read_excel` to handle bytes/BytesIO directly when using the "calamine" (fastexcel) engine (#16344)
- Raise when joining on the same keys twice (#16329)
- Don't require data to be sorted by `by` column in `rolling_*_by` operations (#16249)
- Support List types in `Series.to_numpy` (#16315)
- Add `to_jax` methods to support Jax Array export from `DataFrame` and `Series` (#16294)
- Enable generating data with time zones in parametric testing (#16298)
- Add struct.field expansion (regex, wildcard, columns) (#16320)
- Add new `alpha`, `alphanumeric` and `digit` selectors (#16310)
- Faster bitpacking for Parquet writer (#16278)
- Add `require_all` parameter to the `by_name` column selector (#15028)
- Start updating `BytecodeParser` for Python 3.13 (#16304)
- Add `struct.with_fields` (#16305)
- Handle implicit SQL string → temporal conversion in the `BETWEEN` clause (#16279)
- Expose string expression nodes to python (#16221)
- Add new index/range based selector `cs.by_index`, allow multiple indices for `nth` (#16217)
- Show warning if expressions are very deep (#16233)
- Fix some issues in parametric testing with nested dtypes (#16211)
🐞 Bug fixes
- pick a consistent order for the sort options in PyIR (#16350)
- Infer CSV schema as supertype of all files (#16349)
- Fix issue in parametric testing where `excluded_dtypes` list would grow indefinitely (#16340)
- Address overflow combining u64 hashes in Debug builds (#16323)
- Don't exclude explicitly named columns in group-by context' expr expansion (#16318)
- Improve `map_elements` typing (#16257)
- Harden `Series.reshape` against invalid parameters (#16281)
- Fix list.sum dtype for boolean (#16290)
- Don't stackoverflow on all/any horizontal (#16287)
- Fix `Series.to_numpy` for Array types with nulls and nested Arrays (#16230)
- `rolling_*_by` was throwing an incorrect error when the dataframe was sorted but contained multiple chunks (#16247)
- Don't allow passing missing data to generalized ufuncs (#16198)
- Address overly-permissive `expand_selectors` function, minor fixes (#16250)
- Add missing support for parsing instantiated Object dtypes `Object()` (#16260)
- Reading CSV with low_memory gave no data (#16231)
- Add missing `read_database` overload (#16229)
- Fix a rounding error in parametric test datetimes generation (#16228)
- Fix some issues in parametric testing with nested dtypes (#16211)
📖 Documentation
- Add missing word in `join` docstring (#16299)
- document that month_start/month_end preserve the current time (#16293)
- Add example for separator parameter in pivot (#15957)
📦 Build system
🛠️ Other improvements
- Move `DataFrame.to_numpy` implementation to Rust side (#16354)
- Organize PyO3 NumPy code into `interop::numpy` module (#16346)
- simplify interpolate code, add test for rolling_*_by with nulls (#16334)
- Very minor refactor of `DataFrame.to_numpy` code (#16325)
- `InterchangeDataFrame.version` should be a `ClassVar` (not a `property`) (#16312)
- Add `polars-expr` README (#16316)
- Raise import timing test threshold (#16302)
- Use `cls` (not `self`) in classmethods (#16303)
- Use Scalar instead of Series in some aggregations (#16277)
- Do not hardcode bash path in Makefile (#16263)
- Add IR::Reduce (not yet implemented) (#16216)
- move all describe, describe_tree and dot-viz code to IR instead of DslPlan (#16237)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @NickCondron, @ShivMunagala, @alexander-beedie, @brandon-b-miller, @coastalwhite, @datenzauberai, @itamarst, @jsarbach, @max-muoto, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stinodego, @thalassemia, @twoertwein and @wence-