Releases · pola-rs/polars

17 Nov 18:50

github-actions

py-1.14.0

34ee4ee

Python Polars 1.14.0 Latest

🚀 Performance improvements

Increase default async thread count for low core count systems (#19829)
Move row group decode off async thread for local streaming parquet scan (#19828)
Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)

✨ Enhancements

Raise informative error on Unknown unnest (#19830)
Support DataFrame init from raw SQLAlchemy rows (#19820)
Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
Add an is_literal method to expression meta namespace (#19773)
A different approach to warning users of fork() issues with Polars (#19197)

🐞 Bug fixes

Fix read_database(…,iter_batches=True) type annotations (#19832)
Validate subnodes in validate IR (#19831)
Raise if merge non-global categoricals in unpivot (#19826)
Type hints for window_size incorrectly included timedelta in some rolling functions (#19827)
Don't panic if column not found (#19824)
Fix gather of Scalar null + idx w/ validity (#19823)
Replace _kwargs in collect method (#19618)
Fix object chunked gather (#19811)
Fix filter scalar nulls (#19786)
Replace spaces with &nbsp; to support showing multiple spaces in HTML repr (#19783)
Altair tooltip was being incorrectly applied to plots which did not accept it (#19789)
Respect schema_overrides in batched csv reader (#19755)
Fix scanning google cloud with service account credentials file (#19782)
Release the GIL in Python APIs, part 2 of 2 (#19762)
Fix incorrect filter after right-join on LazyFrame (#19775)
Fix incorrect lazy schema for explode on array columns (#19776)
Fixed typo in file lazy.py (#19769)
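
The HTML repr fix (#19783) addresses the fact that browsers collapse runs of ordinary spaces, so a value with several consecutive spaces renders as if it had one. A minimal sketch of the idea, using only the standard library, follows. The helper name `cell_to_html` is hypothetical, and this is not Polars' actual rendering code.

```python
import html


def cell_to_html(value: str) -> str:
    """Escape a cell value and keep runs of spaces visible in HTML."""
    # Escape first so that '&', '<' and '>' in the data stay literal,
    # then swap each space for a non-breaking space entity.
    return html.escape(value).replace(" ", "&nbsp;")


print(cell_to_html("a   b"))  # a&nbsp;&nbsp;&nbsp;b
```

Escaping before the replacement matters: doing it the other way round would turn the `&` of each inserted entity into `&amp;`.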

📖 Documentation

Update bokeh to use cdn to avoid Bokeh Error (#19788)
Change dprint config (#19747)
Mention row_by_keys in the to_dict documentation (#19767)
Fix link to Graphviz download (#19791)

🛠️ Other improvements

Add ToField context for common args (#19833)
Use polars parquet reader for delta scan (#19103)
Migrate polars-expr AggregationContext to use Column (#19736)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @YichiZhang0613, @alexander-beedie, @braaannigan, @coastalwhite, @engylemure, @gab23r, @iliya-malecki, @ion-elgreco, @itamarst, @jackxxu, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao and @sn0rkmaiden

13 Nov 21:02

github-actions

py-1.13.1

9f79100

Python Polars 1.13.1

✨ Enhancements

Add IPC source node for new streaming engine (#19454)

🐞 Bug fixes

Release GIL in Python APIs, part 1 (#19705)
Fix incorrect lazy schema for aggregations (#19753)
Address incorrect selector & col expansion (#19742)

📖 Documentation

Fix formatting of nested list (#19746)
Add meta.is_column to API docs (#19744)
Fix join API reference links (#19745)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @etiennebacher, @itamarst, @nameexhaustion, @orlp, @ritchie46 and @rodrigogiraoserrao

12 Nov 12:19

github-actions

py-1.13.0

7f0b3e0

Python Polars 1.13.0

🚀 Performance improvements

Improve DataFrame.sort().limit/top_k performance (#19731)
Improve cloud scan performance (#19728)
Fix quadratic 'with_columns' behavior (#19701)
Improve hive partition pruning with datetime predicates from SQL (#19680)
Allow for arbitrary skips in Parquet Dictionary Decoding (#19649)
Reorder conditions in is_leap_year (#19602)
Rechunk in DataFrame.rows if needed (#19628)
Dispatch Parquet Primitive PLAIN decoding to faster kernels when possible (#19611)
Use faster iteration in 'starts_with'/'ends_with' (#19583)
Branchless Parquet Prefiltering (#19190)
Reduce size of IdxVec from 24 -> 16 bytes (#19550)
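
For the is_leap_year entry (#19602), the release note only says that the conditions were reordered. To illustrate why ordering can matter in such a check, the sketch below tests divisibility by 4 first, so most years are rejected after a single comparison. The specific ordering shown is an assumption for illustration, not a copy of the Polars implementation.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian leap-year rule with the most selective test first."""
    # About three out of four years fail `% 4` and return immediately;
    # the century checks only run for the remaining candidates.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


print([y for y in (1900, 2000, 2023, 2024) if is_leap_year(y)])  # [2000, 2024]
```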

✨ Enhancements

Try to support native SAP HANA driver via read_database (#19733)
Implement max/min methods for dtypes (#19494)
Improve n_chunks typing (#19727)
Improve hive partition pruning with datetime predicates from SQL (#19680)
Identify inefficient use of Python string removeprefix, removesuffix, and zfill in map_elements (#19672)
Automatically use boto3 / google-auth if installed when scanning cloud (#19677)
Identify inefficient use of Python string replace in map_elements (#19668)
Parallel IPC sink for the new streaming engine (#19622)
Add SQL support for RIGHT JOIN, fix an issue with wildcard aliasing (#19626)
Add show_graph to display a GraphViz plot for expressions (#19365)
Streamline use of predicates connected by & with IEJoin (join_where) (#19552)
Support use of is_between range predicate with IEJoin operations (join_where) (#19547)

🐞 Bug fixes

Use cls for to_python (#19726)
Fix validation for inner and left join when join_nulls unflagged (#19698)
SQL ELSE clause should be implicitly NULL when omitted (#19714)
Improve n_chunks typing (#19727)
Ensure NoDataError raised consistently between engines for Excel reads (#19712)
In group_by_dynamic, period and every were getting applied in reverse order for the window upper boundary (#19706)
Only allow list.to_struct to be elementwise when width is fixed (#19688)
Make Array arithmetic ops fully elementwise (#19682)
Address inconsistency with use of Python types in frame-level cast (#19657)
Update line-splitting logic in batched CSV reader (#19508)
Fix incorrect lazy schema for explode() in agg() (#19629)
Fix fill null types (#19656)
Fix filter incorrectly pushed past struct unnest when unnested column name matches upper column name (#19638)
Fix typing for SchemaDefinition (#19647)
Ensure mean_horizontal raises on non-numeric input (#19648)
Reorder conditions in is_leap_year (#19602)
Copy height in .vstack() for empty dataframes (#19641) (#19642)
Correct wildcard and input expansion for some more functions (#19588)
Allow .struct.with_fields inside list.eval (#19617)
Sortedness was incorrectly being preserved in dt.offset_by when offsetting by non-constant durations in the timezone-naive case (#19616)
Fix incorrect scan_parquet().with_row_index() with non-zero slice or with streaming collect (#19609)
Fix mask and validity confusion in Parquet String decoding (#19614)
Parquet decoding of nested dictionary values (#19605)
Do not attempt to load default credentials when credential_provider is given (#19589)
Fix gather len in group-by state (#19586)
Added input validation for explode operation in the array namespace (#19163)
Improve error message (#19546)
Fix predicate pushdown into inequality joins (#19582)
Correct categorical namespace error message (#19558)
Fix performance regression for sort/gather on list/array columns (#19564)
Ignore quoted newlines when skipping lines in CSV (#19543)
Incorrect gather for FixedSizeList with outer validity but no inner validities (#19489)
Make Duration parsing fallible and not panic (#19490)
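
The entry "SQL ELSE clause should be implicitly NULL when omitted" (#19714) refers to standard SQL behaviour for `CASE` expressions. The sketch below demonstrates that behaviour with the standard-library `sqlite3` module as a stand-in. It does not call the Polars SQL interface, and the table and column names are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
rows = con.execute(
    """
    SELECT x, CASE WHEN x > 1 THEN 'big' END AS label
    FROM (SELECT 1 AS x UNION ALL SELECT 2)
    ORDER BY x
    """
).fetchall()
con.close()

# The row that matches no WHEN branch gets NULL (None in Python).
print(rows)  # [(1, None), (2, 'big')]
```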

📖 Documentation

Revise and rework user-guide/expressions (#19360)
Update Excel page of user guide to refer to fastexcel as the default engine (#19691)
Alter examples for round_sig_figs to make behaviour clearer (#19667)
Assorted fixes to Rust API docs (#19664)
Improve replace and replace_all docstring explanation of the "$" character with reference to capture groups (vs use as a literal) (#19529)
Add credential provider section and examples to user guide (#19487)
Fix various instances of repeated words in docs and comments (#19516)

📦 Build system

Bump Rust toolchain to nightly-2024-10-28 (#19492)

🛠️ Other improvements

Remove unused Excel code (#19710)
Use Column for the {try,}_apply_columns{_par,} functions on DataFrame (#19683)
Remove more @scalar-opt (#19666)
Move Series bitops to std::ops::Bit... (#19673)
Mark test_parquet.py test_dict_slices as slow (#19675)
Get Column into polars-expr (#19660)
Streamline internal SQL join condition processing (#19658)
Factor out logic for re-use by new streaming CSV source (#19637)
Configure grouped Dependabot updates (#19604)
Fix PyO3 error in CI (#19545)
Update nightly compiler version (#19590)
Added input validation for explode operation in the array namespace (#19163)
Fix lint (#19584)
Add a Column::Partitioned variant (#19557)
Move to fast-float2 (#19578)
Only run remote bench on rust changes (#19581)
Remove unsafe *_release functions (#19554)
Fix test_rolling_by_integer not using parameterized dtype (#19555)
Add mindebug-dev rust profile (#19524)
Add CI step to process benchmark results (#19530)
Add CI benchmark on merge (#19518)
Skip client check with env var (#19517)
Improve makefile build commands (#19498)

Thank you to all our contributors for making this release possible!
@3tilley, @HansBambel, @MarcoGorelli, @alexander-beedie, @barak1412, @braaannigan, @cmdlineluser, @coastalwhite, @corwinjoy, @dependabot, @dependabot[bot], @eitsupi, @janpipek, @jqnatividad, @letkemann, @max-muoto, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego and @wence-

01 Nov 09:07

github-actions

rs-0.44.2

2dce3d3

Rust Polars 0.44.2

🚀 Performance improvements

Reduce size of IdxVec from 24 -> 16 bytes (#19550)

✨ Enhancements

Streamline use of predicates connected by & with IEJoin (join_where) (#19552)
Support use of is_between range predicate with IEJoin operations (join_where) (#19547)

🐞 Bug fixes

Correct categorical namespace error message (#19558)
Fix performance regression for sort/gather on list/array columns (#19564)
Ignore quoted newlines when skipping lines in CSV (#19543)
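
"Ignore quoted newlines when skipping lines in CSV" (#19543) is about the difference between physical lines and logical records. The standard-library sketch below (not the Polars reader) shows the distinction: a naive line-based skip cuts a quoted field in half, while a record-aware skip does not. The sample data is invented for the example.

```python
import csv
import io

data = 'note,id\n"first\nsecond",1\nplain,2\n'

# Naive: skip two physical lines. The second "line" ends in the middle
# of the quoted field, so what remains starts with a broken record.
naive_rest = data.split("\n", 2)[2]

# Record-aware: let a CSV parser decide where records end, then skip
# the header and the first data record.
records = list(csv.reader(io.StringIO(data)))
aware_rest = records[2:]

print(repr(naive_rest))  # 'second",1\nplain,2\n'
print(aware_rest)        # [['plain', '2']]
```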

🛠️ Other improvements

Remove ad-hoc buffer pool (#19553)
Remove SyncCounter (#19556)
Removed unnecessary flatten function (#19551)
Remove unsafe *_release functions (#19554)
Improve new-streaming groupby performance for high cardinality (#19537)
Add mindebug-dev rust profile (#19524)
Add CI step to process benchmark results (#19530)

Thank you to all our contributors for making this release possible!
@HansBambel, @alexander-beedie, @barak1412, @coastalwhite, @nameexhaustion, @orlp and @ritchie46

29 Oct 18:33

github-actions

rs-0.44.1

18fbf52

Rust Polars 0.44.1

🐞 Bug fixes

Incorrect gather for FixedSizeList with outer validity but no inner validities (#19489)
Make Duration parsing fallible and not panic (#19490)

📖 Documentation

Fix various instances of repeated words in docs and comments (#19516)

📦 Build system

Fix some feature flag issues (#19512)
Bump Rust toolchain to nightly-2024-10-28 (#19492)

🛠️ Other improvements

Add CI benchmark on merge (#19518)
Skip client check with env var (#19517)
Rename ComputeNode::spawn parameters (#19514)
Enable new_streaming feature by default (#19502)
Add groupby partitioning and parallel groupby finishing to new-streaming engine (#19451)
Improve makefile build commands (#19498)
Reduce Vec allocations in new-streaming parquet source (#19493)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @nameexhaustion, @orlp, @ritchie46 and @stinodego

27 Oct 17:02

github-actions

rs-0.44.0

6a7d140

Rust Polars 0.44.0

💥 Breaking changes

Purge arrow-rs support (#19312)

🚀 Performance improvements

Address inadvertent quadratic behaviour in expand_columns (#19469)
Move rolling_corr/cov to an actual implementation on Series (#19466)
Don't split par if cast to categorical (#19462)
Improve var/cov/corr performance (#19381)
Reduce memcopy in parquet (#19350)
Optimize array and list gather (#19327)
Add/fix unordered row decode, change unordered format (#19284)
Fast decision for Parquet dictionary encoding (#19256)
Make date_range / datetime_range ~10x faster for constant durations (#19216)
Batch utf8-validation in csv 18% / 25% on 1.9.0 (#19124)
Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)
Use List's TotalEqKernel (#18984)
Improve rename performance for Lazy API (#18890)
Collapse cross-joins to faster joins (#18633)
Cache register plugin function (#18860)

✨ Enhancements

Implement nested Parquet writing for High-Precision Decimals (#19476)
Improve read_database typing (#19444)
Add IPC sink in new streaming engine (#19431)
Added escape_regex operation to the str namespace and as a global function (#19257)
Add SQL support for bit_count and bitwise &, |, and xor operators (#19114)
Add credential provider utility classes for AWS, GCP (#19297)
Support decoding Float16 in Parquet (#19278)
Experimental credential_provider argument for scan_parquet (#19271)
Allow DeltaTable input to scan_delta and read_delta (#19229)
Make FlightConsumer Send and support compressed data (#19262)
New quantile interpolation method & QUANTILE_DISC function in SQL (#19139)
Conserve Parquet SortingColumns for ints (#19251)
Low level flight interface (#19239)
Improved list arithmetic support (#19162)
Expose LTS CPU in show_versions() (#19193)
Check Python version when deserializing UDFs (#19175)
Quantile function in SQL (#18047)
Improve scalar strict message (#19117)
Add Series::{first, last, approx_n_unique} (#19093)
Allow for rolling_*_by to use index count as window (#19071)
Delay deserialization of python function until physical plan (#19069)
Add cum(_min/_max) for pl.Boolean (#19061)
Bitwise operations / aggregations (#18994)
Improved error message DSL -> IR resolving (#19032)
Add strict param to eager/lazy frame "rename" (#19017)
Support schema arg in read/scan_parquet() (#19013)
Add allow_missing_columns option to read/scan_parquet (#18922)
Use FFI to extract Series from different Polars binaries (#18964)
Allow for zero-width fixed size lists (#18940)
Improve scalar strict message (#18904)
Support arithmetic between Series with dtype list (#17823)
Relaxed schema alignment for parquet file list read (#18803)
Always preserve sorted flag for .dt.date (#18692)
Implement single inequality joins for join_where (#18727)

🐞 Bug fixes

Include Array in to_physical (#19474)
Don't panic in SQL temporal string check; raise suitable ColumnNotFound error (#19473)
Properly raise on mean_horizontal with wrong dtypes (#19472)
Make output dtype known for list.to_struct when fields are passed (#19439)
Address inadvertent quadratic behaviour in expand_columns (#19469)
Ensure sorted flag is unset after Int->String cast (#19470)
Fix row_index of batched reader (#19465)
Fix perfect groupby (#19461)
Correct wildcard expansion for functions (#19449)
Ensure struct eq/ne_missing also compares outer validity (#19443)
Fix incorrect reverse on struct containing NULLs (#19446)
Faulty escape_regex example (#19440)
Capture groups should be ignored in replace when literal=True (#19413)
Fix ColumnNotFound when using pl.element() inside list.eval (#19438)
Updates error message in csv parser to recommend schema_overrides instead of deprecated dtypes argument (#19416)
Incorrect .join(..., how="left").head(N) if N <= left_df.height() and there are duplicate matches (#19422)
Support Array type in more DataType methods (#19427)
Bug in group_tuples_perfect, tail was not processed properly (#19417)
Ensure that ASCII* table formats do not use the UTF8 ellipsis char when truncating rows/cols/values (#19404)
Allow .get(null) in groupby context (#19401)
Fix include_file_paths and with_row_index for streaming CSV scan (#19394)
Flaky parametric parquet test (#19393)
Raise on data mismatch in str.json_decode (#19347)
Fix unsoundness in group_tuples_perfect (#19359)
Ensure Python version matches version used to serialize credential provider (#19375)
Capture groups should be ignored in replace_all when literal=True (#19366)
Ignore Parquet is_{min,max}_value_exact when set to true (#19344)
Projection pushdown was ignored by include_file_paths (#19341)
Don't produce duplicate column names in Series.to_dummies (#19326)
Use of HAVING outside of GROUP BY should raise a suitable SQLSyntaxError (#19320)
Fix empty array gather (#19316)
Merge categorical rev-map in unpivot (#19313)
DataFrame descending sorting by single list element (#19233)
Fix cse union schema (#19305)
Correctly load Parquet statistics for f16 (#19296)
Error on invalid query (#19303)
Fix enum scalar output (#19301)
Fix list gather invalid fast path (#19299)
Fix quoting style of decimal csv output (#19298)
Don't vertically parallelize literal select (#19295)
Fix struct reshape fast path (#19294)
Also split on forward slashes during hive path inference on Windows (#19282)
Don't cse as_struct (#19280)
Only apply string parsing to String dtype (#19222)
Compilation error missing use JsonLineReader (#19244)
Don't remember Parquet statistics if filtered (#19248)
Do not check dtypes of non-projected columns for parquet (#19254)
Parquet predicate pushdown for lit(_) != (#19246)
Use all chunks in Series from arrow struct (#19218)
Implement is_nested_null for Null Array (#19219)
Fix struct literals (#19214)
Plotting was not interacting well with Altair schema wrappers (#19213)
Fixing infer_schema for DataType::Null (#19201)
Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
Don't unwrap() expansion (#19196)
Properly handle non-nullable nested Parquet (#19192)
Fix invalid list collection in expression engine (#19191)
Implement to_arrow functionality properly for Arrays (#19077)
Fix incorrect (eq|ne)_missing on List/Array types (#19155)
Properly broadcast Struct when then validity (#19148)
Allow partial name overlap in join_where resolution (#19128)
Fix floordiv / modulo with scalar 0 on LHS (#19143)
Ensure aligned chunks in OOC sort (#19118)
Recursively align when converting to ArrowArray (#19097)
Raise on invalid shape of shape 1, empty combination (#19113)
Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)
Allow converting DatetimeOwned to ChunkedArray (#19094)
Throw proper error for empty char params in scan_csv (#19100)
Ensure parquet schema arg is propagated to IR (#19084)
Only rewrite numeric ineq joins (#19083)
Check validity of columns of keys/aggs in dsl->ir (#19082)
Bitwise aggregations should ignore null values (#19067)
Remove failing datetime subclass test (#19068)
Fix ser/de PlSmallStr error (#19060)
Remove failing temporal lit tests (#19056)
Divide-by-zero in OOC sort (#19048)
Ensure must_flush flag is not reset (#19046)
Error node should be on top (#19045)
Force nested struct missing equality (#19031)
Fix invalid alias udf (#19021)
Raise invalid predicate join_where (#19020)
Fix nested flag of functions with multiple arguments (#19016)
Fix projection pushdown bug in IEJOINS (#19015)
Separate temporal tests (#19012)
Return the truth values of ne_missing and eq_missing operations for struct instead of null (#18930)
Fix struct broadcasting comparisons (#19003)
Wrong result on when().then().otherwise() on struct when both result are broadcast (#19000)
Improve literals for temporal subclasses (#18998)
Ensure same fmt in Series/AnyValue to string cast (#18982)
Return correct value for when().then().else() on structs when using first()\last() (#18969)
IPC don't write variadic_buffer_counts in blocks, but only dictionaries (#18980)
Respect allow_threading in TernaryExpr (#18977)
Make join test order-agnostic (#18975)
Window function had incorrect output name on ExprIR (#18970)
Fix lit().shrink_dtype() broadcasting (#18958)
Parallel evaluation of cumulative_eval (#18959)
Properly implement AnyValue::Binary into_py (#18960)
Fix Expr.over with order_by did not take effect if group keys were sorted (#18947)
Properly fetch type of full None List Series (#18916)
Incorrect mode for sorted input (#18945)
Properly choose inner physical type for Array (#18942)
Disable very old date in timezone test for CI (#18935)
Infer reshape dims when determining schema (#18923)
Incorrect broadcasting on list-of-string set ops (#18918)
Adding with_row_index() to previously collected lazy scan does not take effect (#18913)
Properly zip struct validities (#18886)
Ensure ListPrimitiveBuilder dtype invariant is asserted (#18889)
Out-of-bounds gather in categorical->int cast (#18897)
AnyValue Series from Categorical/Enum (#18893)
Properly cast AnyValue string (#18888)
Fix SO in json inference (#18887)
Use proper thread pool in cumulative_eval (#18885)
Properly calculate duration units (#18869)
Check values in strict cast Int to Time (#18854)
Fix typo in DuplicateError error message (#18855)
Properly merge live- and dead columns in prefiltered (#18862)
DataFrame plot was raising when some extra keywords were pas...

Contributors

orlp, wolfgang-noichl, and 53 other contributors

27 Oct 12:02

github-actions

py-1.12.0

d51b12c

Python Polars 1.12.0

⚠️ Deprecations

Make some parameters of dt.add_business_days keyword-only (#19428)

🚀 Performance improvements

Address inadvertent quadratic behaviour in expand_columns (#19469)
Move rolling_corr/cov to an actual implementation on Series (#19466)
Don't split par if cast to categorical (#19462)

✨ Enhancements

Implement nested Parquet writing for High-Precision Decimals (#19476)
Improve read_database typing (#19444)
Respect include_index for pandas series (#19453)
Add credential_provider argument to more read functions (#19421)
Add IPC sink in new streaming engine (#19431)
Support querying specific snapshot by id in scan_iceberg (#19388)

🐞 Bug fixes

Include Array in to_physical (#19474)
Don't panic in SQL temporal string check; raise suitable ColumnNotFound error (#19473)
Properly raise on mean_horizontal with wrong dtypes (#19472)
Make output dtype known for list.to_struct when fields are passed (#19439)
Address inadvertent quadratic behaviour in expand_columns (#19469)
Ensure sorted flag is unset after Int->String cast (#19470)
Fix row_index of batched reader (#19465)
Fix perfect groupby (#19461)
Correct wildcard expansion for functions (#19449)
Ensure struct eq/ne_missing also compares outer validity (#19443)
Fix incorrect reverse on struct containing NULLs (#19446)
Faulty escape_regex example (#19440)
Capture groups should be ignored in replace when literal=True (#19413)
Fix ColumnNotFound when using pl.element() inside list.eval (#19438)
Updates error message in csv parser to recommend schema_overrides instead of deprecated dtypes argument (#19416)
Incorrect .join(..., how="left").head(N) if N <= left_df.height() and there are duplicate matches (#19422)
Support Array type in more DataType methods (#19427)
Bug in group_tuples_perfect, tail was not processed properly (#19417)
Ensure that ASCII* table formats do not use the UTF8 ellipsis char when truncating rows/cols/values (#19404)

📖 Documentation

Fix docstrings for ATAN2 and ATAN2D SQL functions (#19351)

🛠️ Other improvements

Undo conflicting fix (#19463)
Simplify rust side of datetime (#19459)
Add tests for data mismatch on read_json (#19425)
Remove code in examples folder in favor of the user guide (#19430)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @cmdlineluser, @coastalwhite, @corleyma, @corwinjoy, @dvillaveces, @eitsupi, @gab23r, @janscholten, @nameexhaustion, @orlp, @ritchie46, @siddharth-vi, @stinodego and @wakabame

23 Oct 17:51

github-actions

py-1.11.0

1d144c8

Python Polars 1.11.0

🚀 Performance improvements

Improve var/cov/corr performance (#19381)
Reduce memcopy in parquet (#19350)
Optimize array and list gather (#19327)

✨ Enhancements

Various Schema improvements (equality/init dtype checks) (#19379)
AssumeRole support for AWS Credential Provider (#19346)
Added escape_regex operation to the str namespace and as a global function (#19257)
Improve read_database_uri typing (#19334)

🐞 Bug fixes

Allow .get(null) in groupby context (#19401)
Fix include_file_paths and with_row_index for streaming CSV scan (#19394)
Flaky parametric parquet test (#19393)
Release GIL in gather_with_series() and friends (#19383)
Raise on data mismatch in str.json_decode (#19347)
Ensure Python version matches version used to serialize credential provider (#19375)
Capture groups should be ignored in replace_all when literal=True (#19366)
Ignore Parquet is_{min,max}_value_exact when set to true (#19344)
Projection pushdown was ignored by include_file_paths (#19341)

📖 Documentation

Spurious import in example (#19398)
Tiny correction post dask-expr (#19354)

📦 Build system

Revert PyO3 version back to 0.21 (#19376)

🛠️ Other improvements

Expose group_by_dynamic in pyir (#19385)
Add AlignedBytes types (#19308)
Remove unused bytes->BytesIO conversion (#19369)
Improve error message for Zero-Field Structs with Parquet (#19370)
Reduce memcopy in parquet (#19350)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @barak1412, @benrutter, @coastalwhite, @corwinjoy, @itamarst, @max-muoto, @nameexhaustion, @orlp, @ritchie46, @stinodego, @wence- and @wolfgang-noichl

20 Oct 11:13

github-actions

py-1.10.0

f3eba22

Python Polars 1.10.0

🚀 Performance improvements

Add/fix unordered row decode, change unordered format (#19284)
Fast decision for Parquet dictionary encoding (#19256)
Make date_range / datetime_range ~10x faster for constant durations (#19216)
Batch utf8-validation in csv 18% / 25% on 1.9.0 (#19124)
Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)

✨ Enhancements

Add SQL support for bit_count and bitwise &, |, and xor operators (#19114)
Add credential provider utility classes for AWS, GCP (#19297)
Support decoding Float16 in Parquet (#19278)
Experimental credential_provider argument for scan_parquet (#19271)
Allow DeltaTable input to scan_delta and read_delta (#19229)
New quantile interpolation method & QUANTILE_DISC function in SQL (#19139)
Conserve Parquet SortingColumns for ints (#19251)
Low level flight interface (#19239)
Improved list arithmetic support (#19162)
Add Expr.struct.unnest() as alias for Expr.struct.field("*") (#19212)
Add 'drop_empty_rows' parameter for read_ods (#19202)
Add 'drop_empty_rows' parameter for read_excel (#18253)
Expose LTS CPU in show_versions() (#19193)
Check Python version when deserializing UDFs (#19175)
Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)
Quantile function in SQL (#18047)
Improve scalar strict message (#19117)
Add Series::{first, last, approx_n_unique} (#19093)
Allow for rolling_*_by to use index count as window (#19071)
Delay deserialization of python function until physical plan (#19069)
Add cum(_min/_max) for pl.Boolean (#19061)
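
The SQL quantile entries above (#18047, #19139) distinguish a continuous from a discrete quantile. The sketch below follows the conventional SQL definitions used for `PERCENTILE_CONT` and `PERCENTILE_DISC`. That Polars' `QUANTILE_CONT` and `QUANTILE_DISC` match these definitions in every edge case is an assumption, and the Python function names are hypothetical.

```python
import math


def quantile_cont(values, q):
    """Linear interpolation between the two nearest ordered values."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_disc(values, q):
    """Smallest actual value whose cumulative share is at least q."""
    xs = sorted(values)
    idx = max(math.ceil(q * len(xs)) - 1, 0)
    return xs[idx]


data = [1, 2, 3, 4]
print(quantile_cont(data, 0.5))  # 2.5
print(quantile_disc(data, 0.5))  # 2
```

The continuous variant can return a value that does not occur in the data, while the discrete variant always returns one of the inputs.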

🐞 Bug fixes

Don't produce duplicate column names in Series.to_dummies (#19326)
Use of HAVING outside of GROUP BY should raise a suitable SQLSyntaxError (#19320)
More accurate from_dicts typing/signature (#19322)
Fix empty array gather (#19316)
Merge categorical rev-map in unpivot (#19313)
DataFrame descending sorting by single list element (#19233)
Fix cse union schema (#19305)
Correctly load Parquet statistics for f16 (#19296)
Error on invalid query (#19303)
Fix enum scalar output (#19301)
Fix list gather invalid fast path (#19299)
Fix quoting style of decimal csv output (#19298)
Don't vertically parallelize literal select (#19295)
Fix struct reshape fast path (#19294)
Also split on forward slashes during hive path inference on Windows (#19282)
Don't cse as_struct (#19280)
Only apply string parsing to String dtype (#19222)
Make the SQLAlchemy connection check more robust (#19270)
Ensure that read_database takes advantage of Arrow return from a duckdb_engine connection when using a SQLAlchemy Selectable (#19255)
Compilation error missing use JsonLineReader (#19244)
Don't remember Parquet statistics if filtered (#19248)
Do not check dtypes of non-projected columns for parquet (#19254)
Parquet predicate pushdown for lit(_) != (#19246)
Use all chunks in Series from arrow struct (#19218)
Don't trigger row limit in array construction (#19215)
Fix struct literals (#19214)
Plotting was not interacting well with Altair schema wrappers (#19213)
Fixing infer_schema for DataType::Null (#19201)
Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
Add 'drop_empty_rows' parameter for read_excel (#18253)
Don't unwrap() expansion (#19196)
Properly handle non-nullable nested Parquet (#19192)
Fix invalid list collection in expression engine (#19191)
Fix use of "hidden_columns" parameter in write_excel (#19029)
Implement to_arrow functionality properly for Arrays (#19077)
Remove incorrect warning when using an IO[bytes] instance (#19154)
Don't fail test if e.g. jax has been used first, since jax installs a fork handler that warns (#19178)
Fix incorrect (eq|ne)_missing on List/Array types (#19155)
Properly broadcast Struct when then validity (#19148)
Allow partial name overlap in join_where resolution (#19128)
Fix floordiv / modulo with scalar 0 on LHS (#19143)
Ensure aligned chunks in OOC sort (#19118)
Recursively align when converting to ArrowArray (#19097)
Raise on invalid shape of shape 1, empty combination (#19113)
Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)
Allow converting DatetimeOwned to ChunkedArray (#19094)
Throw proper error for empty char params in scan_csv (#19100)
Ensure parquet schema arg is propagated to IR (#19084)
Only rewrite numeric ineq joins (#19083)
Check validity of columns of keys/aggs in dsl->ir (#19082)
Bitwise aggregations should ignore null values (#19067)
Remove failing datetime subclass test (#19068)
Don't ignore multiple columns in LazyFrame.unnest (#19035)

📖 Documentation

Remove ecosystem viz section since there is one in misc already (#18408)
Fix typo in custom expressions docs (#19292)
Add SQL docs for new QUANTILE_CONT and QUANTILE_DISC functions (#19272)
Add marimo to ecosystem.md (#19250)
Improve DataFrame.write_database docstring (#19189)
Link to main website from banner (#19177)
Fix example of as_struct (#19116)
Clarify difference between bitwise/logical ops (#19180)
Add non-equi joins to, and revise, joins docs page (#19127)
Add Series.first,last,approx_n_unique to docs (#19146)
Annotate Config kwarg options (#18988)
Revise and improve 'Concepts' section (#19087)

🛠️ Other improvements

Add/fix unordered row decode, change unordered format (#19284)
Move from parquet-format-safe to polars-parquet-format (#19275)
Skip flaky test (#19242)
Add more tests for list arithmetic (#19225)
Remove unused IPC async (#19223)
Make get_list_builder infallible (#19217)
Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
Make expression output type known (#19195)
Revert "feat(python): Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)" (#19188)
Zero-Field Structs and DataFrame with Height Property (#19123)
Make pl.repeat part of the IR (#19152)
Expose IEJoin IR node to python (#19104)
Clean remove_prefix since python3.9 is now the minimum Python (#19070)
Add new streaming engine to CI (#19051)

Thank you to all our contributors for making this release possible!
@Bidek56, @MarcoGorelli, @Rashik-raj, @adamreeve, @alexander-beedie, @alonme, @balbok0, @coastalwhite, @deanm0000, @dependabot, @dependabot[bot], @eitsupi, @etrotta, @itamarst, @jbutterwick, @joelostblom, @kenkoooo, @khalidmammadov, @laurentS, @mcrumiller, @mscolnick, @nameexhaustion, @orlp, @pomo-mondreganto, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego, @sunadase and @wence-

01 Oct 19:54

github-actions

py-1.9.0

be5a4b4

Python Polars 1.9.0

🚀 Performance improvements

Use List's TotalEqKernel (#18984)

✨ Enhancements

Bitwise operations / aggregations (#18994)
Allow insert_column to take expressions (#19024)
Improved error message DSL -> IR resolving (#19032)
Add strict param to eager/lazy frame "rename" (#19017)
Support schema arg in read/scan_parquet() (#19013)
Add include_file_paths parameter to read_parquet (#19008)
Add allow_missing_columns option to read/scan_parquet (#18922)
Drop Python 3.8 support (#18965)
Use FFI to extract Series from different Polars binaries (#18964)
Allow for zero-width fixed size lists (#18940)
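
As a rough picture of what the bitwise aggregations entry (#18994) refers to, a bitwise aggregation folds all values of a column with one bitwise operator. The following is a pure-Python sketch of that idea only; it does not use or reproduce the Polars API, and `bitwise_aggregate` and the sample values are hypothetical.

```python
from functools import reduce
import operator


def bitwise_aggregate(values, op):
    """Fold a non-empty sequence of integers with a binary bitwise operator."""
    return reduce(op, values)


# Hypothetical sample "column" of bit flags.
flags = [0b1100, 0b1010, 0b1001]

and_all = bitwise_aggregate(flags, operator.and_)  # bits set in every value
or_all = bitwise_aggregate(flags, operator.or_)    # bits set in any value
xor_all = bitwise_aggregate(flags, operator.xor)   # bits set an odd number of times
```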

🐞 Bug fixes

Remove failing temporal lit tests (#19056)
Divide-by-zero in OOC sort (#19048)
Ensure must_flush flag is not reset (#19046)
Error node should be on top (#19045)
Force nested struct missing equality (#19031)
Fix invalid alias udf (#19021)
Raise invalid predicate join_where (#19020)
Fix nested flag of functions with multiple arguments (#19016)
Fix projection pushdown bug in IEJOINS (#19015)
Separate temporal tests (#19012)
Return the truth values of ne_missing and eq_missing operations for struct instead of null (#18930)
Fix list to numpy conversion (#19009)
Fix struct broadcasting comparisons (#19003)
Wrong result on when().then().otherwise() on struct when both results are broadcast (#19000)
Improve literals for temporal subclasses (#18998)
Ensure same fmt in Series/AnyValue to string cast (#18982)
Return correct value for when().then().otherwise() on structs when using first()/last() (#18969)
IPC don't write variadic_buffer_counts in blocks, but only dictionaries (#18980)
Respect allow_threading in TernaryExpr (#18977)
Make join test order-agnostic (#18975)
Fix lit().shrink_dtype() broadcasting (#18958)
Parallel evaluation of cumulative_eval (#18959)
Properly implement AnyValue::Binary into_py (#18960)
Fix Expr.over with order_by not taking effect when group keys were sorted (#18947)
Properly fetch type of full None List Series (#18916)
Incorrect mode for sorted input (#18945)
Properly choose inner physical type for Array (#18942)
Disable very old date in timezone test for CI (#18935)
Infer reshape dims when determining schema (#18923)
Incorrect broadcasting on list-of-string set ops (#18918)
Adding with_row_index() to previously collected lazy scan does not take effect (#18913)

📖 Documentation

Fix example of lazy schema verification (#19059)
Rewrite 'Getting started' page (#19028)
Fix is_not_nan description (#18985)
Recommend targetDir for rust-analyzer (#18973)
Fix LazyFrame fetch method references (#18033)

📦 Build system

Bump Rust toolchain to nightly-2024-09-29 (#19006)
Bump simd-json to 0.14 (#18999)

🛠️ Other improvements

Remove built info (#19057)
Mark schema arg in read/scan_parquet as unstable (#19018)
Fix new-streaming test_lazy_parquet::test_row_index (#19019)
Preserve scalar in more places (#18898)
Mention allow_missing_columns in error message when column not found (parquet) (#18972)
Disable CSE-specific test on new streaming engine (#18971)
Add FixedSizeList equality broadcasting (#18967)
Divide ChunkCompare into Eq and Ineq variants (#18963)
Another set of new-stream test skip/fixes (#18952)
Fix/skip variety of new-streaming tests, cont (#18928)
Fix/skip variety of new-streaming tests (#18924)

Thank you to all our contributors for making this release possible!
@LukasFolwarczny, @Plutone11011, @aleexharris, @alexander-beedie, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @edwinvehmaanpera, @kgv, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao, @stinodego and @xhiroga
