Releases · pola-rs/polars

12 Sep 14:01

github-actions

rs-0.43.1

54218e7

Rust Polars 0.43.1

🐞 Bug fixes

Revert automatically turning on Parquet prefiltered (#18720)
Parquet prefiltered with projection pushdown (#18714)
Correctly display multilevel nested Arrays (#18687)
Fix scalar literals (#18707)
Missing activation for serde for PlSmallStr from some crates (#18702)
Add missing PhantomDatas to BackingStorage (#18699)
Fix use of undeclared crate or module error (#18701)
Refactor decompression checks and add support for decompressing JSON (#18536)
Qcut all nulls panics (#18667)

🛠️ Other improvements

Remove IR info from DSL (#18712)
Refactor code into functions in new parquet source (#18711)
Remove unused feature flags from polars-mem-engine (#18679)
Remove hive_parts from DSL source (#18694)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ankane, @attila-lin, @coastalwhite, @eitsupi, @nameexhaustion, @ohanf, @orlp, @ritchie46 and @yarimiz

12 Sep 15:49

github-actions

py-1.7.1

54218e7

Python Polars 1.7.1

🐞 Bug fixes

Revert automatically turning on Parquet prefiltered (#18720)
Parquet prefiltered with projection pushdown (#18714)
Fix scalar literals (#18707)

🛠️ Other improvements

Remove IR info from DSL (#18712)
Remove unused feature flags from polars-mem-engine (#18679)
Remove hive_parts from DSL source (#18694)

Thank you to all our contributors for making this release possible!
@ankane, @attila-lin, @coastalwhite, @eitsupi, @nameexhaustion, @orlp and @ritchie46

11 Sep 10:26

github-actions

rs-0.43.0

f25ca0c

Rust Polars 0.43.0

🏆 Highlights

Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)

🚀 Performance improvements

Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
Don't traverse file list twice for extension validation (#18620)
Remove cloning of ColumnChunkMetadata (#18615)
Add upfront partitioning in ColumnChunkMetadata (#18584)
Enable Parquet parallel=prefiltered for auto (#18514)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Added optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)
Parquet do not copy uncompressed pages (#18441)
Several large parquet optimizations (#18437)
Batch Plain Parquet UTF-8 verification (#18397)
Partition metadata for parquet statistic loading (#18343)
Fix accidental quadratic parquet metadata (#18327)
Lazy decompress Parquet pages (#18326)
Don't rechunk aligned chunks in owned_binary_chunk_align (#18314)
Batch DELTA_LENGTH_BYTE_ARRAY decoding (#18299)
Slice pushdown for SimpleProjection (#18296)
Use direct path for time/timedelta literals (#18223)
Speedup ndjson reader ~40% (#18197)

✨ Enhancements

Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
Make expressions containing Python UDFs serializable (#18135)
Support Serde for IRPlan (#18433)
Respect input time zone if input is pandas Timestamp (#18346)
Add POLARS_BACKTRACE_IN_ERR for debugging (#18333)
IR serde (#18298)
Improve decimal_comma error message (#18269)
Support pre-signed URLs for cloud scan (#18274)
Support empty structs (#18249)
Allow float in interpolate_by by column (#18015)

🐞 Bug fixes

Scalar checks (#18627)
Scanning hive partitioned files where hive columns are partially included in the file (#18626)
Enable "polars-json/timezones" feature from "polars-io" (#18635)
Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18637)
Properly slice validity mask on pl.Object series (#18631)
Indicative error in list.gather when wrong indices type is supplied (#18611)
Fix group first value after group-by slice (#18603)
Functions for streaming require streaming feature (#18602)
Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
Fix UnitVec inline clone and with_capacity (#18586)
Ensure result name of pow matches schema in grouped context (#18533)
Decimal mean agg dtype was incorrect in IR (#18577)
Fix output type for list.eval in certain cases (#18570)
Fix map_elements for List return dtypes (#18567)
Do not remove double-sort if maintain_order=True (#18561)
Empty any_horizontal should be false, not true (#18545)
Fix type inference error in map_elements for List types (#18542)
Added proper handling of file.write for large remote csv files (#18424)
Handle Parquet projection pushdown with only row index (#18520)
Properly raise on invalid selector expressions (#18511)
Wrong output column name in or and xor operations (#18512)
Various schema corrections (#18474)
Don't drop objects on empty buffers (#18469)
Add missing chunk align in pipe sink (#18457)
Expr.sign should preserve dtype (#18446)
Enable CSE in eager if struct are expanded (#18426)
Treat explode as gather (#18431)
Fencepost error in debug assertion in splitfields (#18423)
Unsoundness in CSV SplitFields (#18413)
Parquet nested values that span several pages (#18407)
Support reading empty parquet files (#18392)
Recurse on map field during type conversion (#15075)
Allow search_sorted on boolean series (#18387)
Mark Expr.(lower|upper)_bound as returning scalar (#18383)
Fix broken feature gate for ParquetReader (#18376)
Fix compressed ndjson row count (#18371)
Use correct column names when there are no value columns in unpivot (#18340)
Parquet several smaller issues (#18325)
Fix group-by slice on all keys (#18324)
Compute joint null mask before calling rolling corr/cov stats (#18246)
Several scan_parquet(parallel='prefiltered') problems (#18278)
Json feature flag missing imports (#18305)
Check groups in group-by filter (#18300)
Make json readers ignore BOM character (#18240)
Parquet delta encoding for 0-bitwidth miniblocks (#18289)
Arguments for upsample only have to be sorted within groups (#18264)
Use appropriate bins in hist when bin_count specified (#16942)
Raise suitable error on unsupported SQL set op syntax (#18205)
Fix invalid state due to cached IR (#18262)
Fix failed AWS credential load from '~/.aws/credentials' due to formatting (#18259)
Fix panic streaming parquet scan from cloud with slice (#18202)
Consistently round half-way points down in dt.round (#18245)
Fix duplicate column output and panic for include_file_paths (#18255)
Fix unit null rank (#18252)
Use physical for row-encoding (#18251)

📖 Documentation

Fix multiprocessing docs regarding fork method check (#18563)
Pre-compute plugin_path before defining plugin (#18503)
Fix BinViewChunkedBuilder arguments (#17277) (#18439)
Add date_range and datetime_ranges examples without eager=True (#18379)
Document POLARS_BACKTRACE_IN_ERR env var (#18354)
Document DataFrame.__getitem__ and Series.__getitem__ (#18309)
Improve decimal_comma error message (#18269)
Clarify coalesce behaviour in join_asof (#18273)
Add note to Expr.shuffle differentiating from df method (#18266)

📦 Build system

Remove extension-module from polars-python (#18554)
Bump Rust toolchain to nightly-2024-08-26 (#18370)

🛠️ Other improvements

Push down max row group height calc to file metadata (#18674)
Re-use already decoded metadata for first path (new-parquet-source) (#18656)
Remove duplicate byte range calc from new parquet source (#18655)
Fix a bunch of tests for new-streaming (#18659)
Rename MemSlice::from_slice -> MemSlice::from_static (#18657)
Don't raise on multiple same names in ie_join (#18658)
Split parquet_source.rs in new-streaming (#18649)
Check predicates in join_where (#18648)
Feature gate iejoin (#18646)
Scan from BytesIO in new-streaming parquet source (#18643)
Rename MetaData -> Metadata (#18644)
Change join_where semantics (#18640)
Fix unimplemented panics to give todo!s for AUTO_NEW_STREAMING (#18628)
Remove extra schema traits (#18616)
One simplify expression module and keep utility local (#18621)
Check number of binary comparisons in join_where predicates (#18608)
Raise on suffixed predicate in join_where (#18607)
Fix Python docs build (#18605)
Fix nan-ignoring max/min in new-streaming (#18593)
Correctly support more types in new-streaming sum (#18580)
Bump NodeTraverser major version (#18576)
Fix mean reduction in new-streaming (#18572)
Rename data_type -> dtype (#18566)
Refactor ArrowSchema to use polars_schema::Schema<D> (#18564)
Remove NotifyReceiver from new-streaming parquet source (#18540)
Refactor Schema to use generic struct from new polars-schema crate (#18539)
Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
Fix and extend AnyValue comparison (#18534)
Remove top-level metadata from ArrowSchema (#18527)
Add FromIterator impls for PlSmallStr (#18509)
Update PlSmallStr comment (#18518)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Make expressions containing Python UDFs serializable (#18135)
Allow polars to pass cargo check on windows (#18498)
Remove From<&&str> for PlSmallStr (#18507)
Change naming to new benchmark setup (#18473)
More refactor for PlSmallStr (#18456)
Split Reduction into it plus ReductionState (#18460)
Remove a string allocation in Parquet (#18466)
Unify internal string type (#18425)
Remove network call in hf docs (#18454)
Remove old streaming flag if we're going into new streaming (#18438)
Address spurious hypothesis test failure (#18434)
Add pl.length() reduction and small new-streaming fixes (#18429)
Fencepost error in debug assertion in splitfields (#18423)
Group arguments in conversion in a Context (#18418)
Turn all Binary/Utf8 into BinaryView/Utf8View in Parquet (#18331)
Recursively evaluate is_elementwise for function expressions (#18385)
Various small fixes for the new streaming engine (#18384)
Temporarily add ability to disable parquet source node (#18378)
Improve dot formatting of new-streaming parquet source (#18367)
Fix the required version of rust in README.md (#18357)
Only instantiate used portion of graph (#18337)
Fix new_streaming parameter (#18342)
Add parquet source node to new streaming engine (#18152)
Disable common sub-expr elim for new streaming engine (#18330)
Remove unused Parquet indexes (#18329)
Lower arbitrary expressions in the new streaming engine (#18315)
Expose many more function expressions to python IR (#18317)
Add Graphviz physical plan visualization for new streaming engine (#18307)
Add DataFrame::new_with_broadcast and simplify column uniqueness checks (#18285)
Add output_schema to all PhysNodes (#18272)
Change fn schema to fn collect_schema (#18261)
Add multiplexer node to new streaming engine (#18241)
Add feature gates for polars-python crate (#18232)
Split py-polars crate (#18204)
Update the required version of rust in README.md (#18203)
Add itertools in utils (#18213)
Use or_else for raising (#18206)
Remove unused Parquet source files (#18193)

Thank you to all our contributors for making this release possible...

Contributors

orlp, philss, and 34 other contributors

11 Sep 13:33

github-actions

py-1.7.0

d8acacf

Python Polars 1.7.0

🏆 Highlights

Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
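
For readers new to the term, a non-equi join pairs rows using inequality predicates instead of key equality. The sketch below is plain Python, not Polars code and not the IEJoin algorithm: it checks every pair in a nested loop, which is the quadratic cost IEJoin is designed to avoid. The data, field names, and the `inequality_join` helper are all hypothetical.

```python
from itertools import product


def inequality_join(left, right, predicate):
    """Brute-force non-equi join: keep every (l, r) pair that satisfies predicate."""
    return [(l, r) for l, r in product(left, right) if predicate(l, r)]


# Hypothetical data: events and the half-open windows they may fall into.
events = [{"id": 1, "t": 5}, {"id": 2, "t": 12}]
windows = [
    {"w": "a", "start": 0, "end": 10},
    {"w": "b", "start": 10, "end": 20},
]

# Two inequalities combined: start <= t < end.
pairs = inequality_join(
    events, windows, lambda l, r: r["start"] <= l["t"] < r["end"]
)
matched = [(l["id"], r["w"]) for l, r in pairs]
print(matched)  # [(1, 'a'), (2, 'b')]
```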

🚀 Performance improvements

Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
Don't traverse file list twice for extension validation (#18620)
Remove cloning of ColumnChunkMetadata (#18615)
Add upfront partitioning in ColumnChunkMetadata (#18584)
Enable Parquet parallel=prefiltered for auto (#18514)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Added optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)
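
The null_count() rewrite in the last entry rests on simple identities between boolean reductions over a null mask and comparisons on the null count. The sketch below checks those identities in plain Python, with `None` standing in for null. Which expressions the optimizer actually rewrites is not listed in these notes, so the four forms shown are illustrative assumptions, and the helper names are hypothetical.

```python
def null_count(values):
    """Count None entries, standing in for a column's null count."""
    return sum(v is None for v in values)


def boolean_forms(values):
    """Evaluate the reductions the slow way, by materialising a mask."""
    mask = [v is None for v in values]
    return {
        "is_null.all": all(mask),
        "is_null.any": any(mask),
        "is_not_null.all": all(not m for m in mask),
        "is_not_null.any": any(not m for m in mask),
    }


def null_count_forms(values):
    """Evaluate the same reductions from the length and null count alone."""
    n, nulls = len(values), null_count(values)
    return {
        "is_null.all": nulls == n,
        "is_null.any": nulls > 0,
        "is_not_null.all": nulls == 0,
        "is_not_null.any": nulls < n,
    }


samples = [[], [None], [1, None, 3], [1, 2, 3]]
agree = all(boolean_forms(s) == null_count_forms(s) for s in samples)
print(agree)  # True
```

The second form never builds the mask, which is why a count that is already known (or cheap to compute) makes the rewrite worthwhile.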

✨ Enhancements

Update BytecodeParser for upcoming Python 3.13 (#18677)
Add tooltip by default to charts (#18625)
Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Support shortcut eval of common boolean filters in SQL interface "WHERE" clause (#18571)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
Make expressions containing Python UDFs serializable (#18135)

🐞 Bug fixes

Use IO[bytes] instead of BytesIO in DataFrame.write_parquet() (#18652)
Scalar checks (#18627)
Scanning hive partitioned files where hive columns are partially included in the file (#18626)
Enable "polars-json/timezones" feature from "polars-io" (#18635)
Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18637)
Properly slice validity mask on pl.Object series (#18631)
Raise if single argument form in replace/replace_strict is not a mapping (#18492)
Fix group first value after group-by slice (#18603)
Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
Fix output type for list.eval in certain cases (#18570)
Fix map_elements for List return dtypes (#18567)
Check for duplicate column names in read_database cursor result, raising DuplicateError if found (#18548)
Do not remove double-sort if maintain_order=True (#18561)
Empty any_horizontal should be false, not true (#18545)
Fix type inference error in map_elements for List types (#18542)
Address incorrect align_frames result when the alignment column contains NULL values (#18521)
Fix advertised version in source builds (#18523)
Handle Parquet projection pushdown with only row index (#18520)
DataFrame write_database not passing down "engine_options" when using ADBC (#18451)
Properly raise on invalid selector expressions (#18511)
Wrong output column name in or and xor operations (#18512)
Normalize by default in Series.entropy like Expr.entropy does (#18493)
Various schema corrections (#18474)
Don't drop objects on empty buffers (#18469)
Expr.sign should preserve dtype (#18446)
Ensure assert_frame_not_equal and assert_series_not_equal raise on mismatched input types (#18402)
Fixed Worksheet definition in write_excel type annotations (#18452)
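
The "Empty any_horizontal should be false, not true" fix above matches the usual convention that a reduction over no inputs returns the operator's identity element. The sketch below shows that convention in plain Python; `fold_or` and `fold_and` are hypothetical helpers, not Polars functions.

```python
from functools import reduce
import operator


def fold_or(*flags):
    """OR the given booleans together, starting from the identity False."""
    return reduce(operator.or_, flags, False)


def fold_and(*flags):
    """AND the given booleans together, starting from the identity True."""
    return reduce(operator.and_, flags, True)


empty_or = fold_or()                    # no inputs: identity of OR
empty_and = fold_and()                  # no inputs: identity of AND
mixed_or = fold_or(False, True, False)  # at least one True input
print(empty_or, empty_and, mixed_or)    # False True True
```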

📖 Documentation

Update join_where docs to clarify behaviour (#18670)
Fix multiprocessing docs regarding fork method check (#18563)
Various docstring improvements to testing.assert_* functions (#18494)
Fix formula in ewm_mean_by (#18506)
Pre-compute plugin_path before defining plugin (#18503)
Add Expr.null_count to aggregations (#18459)

🛠️ Other improvements

Fix a bunch of tests for new-streaming (#18659)
Don't raise on multiple same names in ie_join (#18658)
Check predicates in join_where (#18648)
Change join_where semantics (#18640)
Add benchmark tests for join_where with inequalities (#18614)
Check number of binary comparisons in join_where predicates (#18608)
Raise on suffixed predicate in join_where (#18607)
Fix Python docs build (#18605)
Use streaming argument in test_parquet_slice_pushdown_non_zero_offset (#18529)
Fix delta test merge (#18601)
Alter/skip some tests for new streaming (#18574)
Add lower-bound pin for numba (#18555)
Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Make expressions containing Python UDFs serializable (#18135)
Change naming to new benchmark setup (#18473)
Ensure physical arguments to np ufuncs are rechunked (#18471)
Remove a string allocation in Parquet (#18466)
Remove network call in hf docs (#18454)

Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @WbaN314, @adamreeve, @alexander-beedie, @alonme, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @ion-elgreco, @krasnobaev, @megaserg, @nameexhaustion, @ohanf, @orlp, @philss, @r-brink, @ritchie46, @skellys, @squnit, @stinodego, @wence- and @yarimiz

28 Aug 18:57

github-actions

py-1.6.0

6ff1c70

Python Polars 1.6.0

💥 Unstable Breaking changes

These APIs were marked unstable and are allowed to change.

Use Altair in DataFrame.plot (#17995)

🚀 Performance improvements

Parquet do not copy uncompressed pages (#18441)
Several large parquet optimizations (#18437)
Batch Plain Parquet UTF-8 verification (#18397)
Partition metadata for parquet statistic loading (#18343)
Fix accidental quadratic parquet metadata (#18327)
Lazy decompress Parquet pages (#18326)
Don't rechunk aligned chunks in owned_binary_chunk_align (#18314)
Batch DELTA_LENGTH_BYTE_ARRAY decoding (#18299)
Slice pushdown for SimpleProjection (#18296)
Use direct path for time/timedelta literals (#18223)
Speedup ndjson reader ~40% (#18197)
Skip parquet page when unneeded (#18192)

✨ Enhancements

Use Altair in DataFrame.plot (#17995)
Allow mapping as syntactic sugar in str.replace_many (#18214)
Respect input time zone if input is pandas Timestamp (#18346)
Improve Schema and DataType interop with Python types (#18308)
Add POLARS_BACKTRACE_IN_ERR for debugging (#18333)
IR serde (#18298)
Improve decimal_comma error message (#18269)
Support pre-signed URLs for cloud scan (#18274)
Support the most recent version of "duckdb_engine" connections via read_database (#18277)
Support empty structs (#18249)
Allow float in interpolate_by by column (#18015)
Make show_versions more responsive (#18208)

🐞 Bug fixes

Enable CSE in eager if struct are expanded (#18426)
Treat explode as gather (#18431)
Parquet nested values that span several pages (#18407)
Support reading empty parquet files (#18392)
Recurse on map field during type conversion (#15075)
Allow search_sorted on boolean series (#18387)
Mark Expr.(lower|upper)_bound as returning scalar (#18383)
Fix compressed ndjson row count (#18371)
Use correct column names when there are no value columns in unpivot (#18340)
Parquet several smaller issues (#18325)
Fix group-by slice on all keys (#18324)
Compute joint null mask before calling rolling corr/cov stats (#18246)
Several scan_parquet(parallel='prefiltered') problems (#18278)
Json feature flag missing imports (#18305)
Check groups in group-by filter (#18300)
Parquet delta encoding for 0-bitwidth miniblocks (#18289)
Arguments for upsample only have to be sorted within groups (#18264)
Use appropriate bins in hist when bin_count specified (#16942)
Raise suitable error on unsupported SQL set op syntax (#18205)
Fix invalid state due to cached IR (#18262)
Fix failed AWS credential load from '~/.aws/credentials' due to formatting (#18259)
Fix panic streaming parquet scan from cloud with slice (#18202)
Consistently round half-way points down in dt.round (#18245)
Fix duplicate column output and panic for include_file_paths (#18255)
Fix unit null rank (#18252)
Use physical for row-encoding (#18251)
Convert date and datetime in literal construction (#16018)
Fix gather str as lit (#18207)

📖 Documentation

Add date_range and datetime_ranges examples without eager=True (#18379)
Fix incorrect comments in group_by_dynamic (#18415)
Alphabetise methods in Python API reference (#18380)
Document POLARS_BACKTRACE_IN_ERR env var (#18354)
Add missing aggregation entries (#18334) (#18341)
Add missing Series methods to API reference (#18312)
Document DataFrame.__getitem__ and Series.__getitem__ (#18309)
Fix typos and add see also links to struct name expressions (#18282)
Improve decimal_comma error message (#18269)
Clarify coalesce behaviour in join_asof (#18273)
Add note to Expr.shuffle differentiating from df method (#18266)
Improve formatting and consistency of various docstrings (#18237)
Add missing "Parameters" section to bin.size expr docstring (#18222)
Fix column name output in example of DataFrame.map_rows (#18227)

📦 Build system

Bump Rust toolchain to nightly-2024-08-26 (#18370)

🛠️ Other improvements

Address spurious hypothesis test failure (#18434)
Turn all Binary/Utf8 into BinaryView/Utf8View in Parquet (#18331)
Fix the required version of rust in README.md (#18357)
Remove unused Parquet indexes (#18329)
Deprecate serialize json for LazyFrame (#18283)
Don't add sink node to cloud query (#18280)
Split py-polars crate (#18204)
Fix test for new deltalake release (#18211)
Update the required version of rust in README.md (#18203)
Fix version bifurcation for test_read_database_cx_credentials (#18220)
Use or_else for raising (#18206)
Remove unused Parquet source files (#18193)

Thank you to all our contributors for making this release possible!
@BartSchuurmans, @ChayimFriedman2, @MarcoGorelli, @StepfenShawn, @agossard, @alexander-beedie, @cgbur, @coastalwhite, @corwinjoy, @deanm0000, @henryharbeck, @ion-elgreco, @jqnatividad, @krasnobaev, @liufeimath, @markxwang, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stinodego, @sunadase, @thomascamminady and @wence-

14 Aug 14:59

github-actions

rs-0.42.0

7686025

Rust Polars 0.42.0

💥 Breaking changes

Reject literal input in sort_by_exprs() (#17606)

🚀 Performance improvements

Skip parquet page when unneeded (#18192)
Improve binview extend/ifthenelse (#18164)
Start on better Parquet delta decoding (#18049)
Tune jemalloc to not create muzzy pages (#18148)
Reduce default async thread count (#18142)
Use single threaded algorithms if only 1 core given (#18101)
Use Arc<Vec<_>> instead of Arc<[_]> for paths and hive partitions (#18066)
SIMD View from FixedSizeBinary (#18059)
Use bitmask to filter Parquet predicate-pushdown items (#17993)
Zerocopy buffers for FixedSizeBinary to BinaryView cast (#18043)
Integer fast path Parquet dict encoding (#18030)
Speedup writing of Parquet primitive values (#18020)
Remove temporary allocations in Parquet (#18013)
Delay selection expansion (#18011)
Optimize strings slices (#17996)
Make .dt.weekday 20x faster (#17992)
Shrink MemSliceInner enum (#17991)
Push down slice with non-zero offset to Parquet (#17972)
Reduce copy in MemSlice (#17983)
Ensure metadata flags are maintained on vertical parallelization (#17804)
Ensure only nodes that are not changed are cached in collapse optimizer (#17791)
Use bitflags for OptState (#17788)
Remove async directory auto-detection (#17779)
Fix accidental quadratic horizontal concat (#17783)
Batch parquet integer decoding (#17734)
Use mmap-ed memory if possible in Parquet reader (#17725)
Use bitflags for function options (#17723)
Introduce MemReader to file buffer in Parquet reader (#17712)
Better GC and push_view for binviews (#17627)
Fix pathological perf issue in window-order-by (#17650)
Cache path resolving of scan functions (#17616)
Add ArrayChunks to optimize codegen of BatchDecoder (#17632)
Rechunk before we go into grouped gathers (#17623)
Cache schema resolve back to DSL (#17610)
Add fastpath for when rounding by single constant durations (#17580)
Improve parallelism in writing hive parquet (#17512)
Support datetime in predicate during hive partition pruning (#17545)
Batch nested embed parquet decoding (#17549)
Batch nested Parquet decoding (#17542)
Collect Parquet dictionary binary as view (#17475)
Keep more parallelism when CSE plan cache hits (#17463)
Batch parquet primitive decoding (#17462)
Respect allow_threading in some more operators (#17450)
Parallelize parquet metadata deserialization (#17399)

✨ Enhancements

Create literals for datetime/date expressions (#18184)
Create literals in 'datetime' expression (#18182)
Add missing impl for Series (#18166)
Raise on invalid 'is_between' and improve error message quality (#18147)
Add boolean Parquet HybridRle encoding (#18022)
Add nested SQL join support (#18006)
Push down slice with non-zero offset to Parquet (#17972)
Add support for binary size method to Expr and Series "bin" namespace (#17924)
Add SQL interface support for PostgreSQL dollar-quoted string literals (#17940)
Allow for parsing parquet file where the time zone is stored as lowercase "utc" (#17925)
Expose binary_elementwise_into_string_amortized for plugin authors, recommend apply_into_string_amortized instead of apply_to_buffer (#17903)
Decompress in CSV / NDJSON scan (#17841)
Ensure unique names in HConcat (#17884)
Support authentication with HuggingFace login (#17881)
Support "BY NAME" qualifier for SQL "INTERSECT" and "EXCEPT" set ops (#17835)
Raise informative error instead of panicking when passing invalid directives to to_string for Date dtype (#17670)
Implement forward/backward fill for all types (#17861)
Implement is_in operation on decimal type (#17832)
Support hf:// in read_(csv|ipc|ndjson) functions (#17785)
Allow literals in sort (#17780)
Cloud support for NDJSON (#17717)
Support API token for scanning hf:// (#17682)
Raise error instead of panic in unsupported serde (#17679)
Include file path option for NDJSON (#17681)
Hugging Face path expansion (#17665)
Add DSL validation for cloud eligible check (#17287)
Raise informative error message if non-IntoExpr is passed by name in *Frame.group_by (#17654)
Change API for writing partitioned Parquet to reduce code duplication (#17586)
Cache schema resolve back to DSL (#17610)
Expose returns_scalar to map_elements (#17613)
Add option to include file path for Parquet, IPC, CSV scans (#17563)
Support describe on decimal (#15092)
Support datetime in predicate during hive partition pruning (#17545)
Raise more informative error message for directories containing files with mixed extensions (#17480)
Exclude empty files from directory/glob expansion (#17478)
Add "future" versioning (#17421)
Apply slice pushdown immediately to in-memory frames (#17459)
Support writing hive partitioned parquet (#17324)
Add right join support (#17441)
Support hive partitioning in scan_ipc (#17434)

🐞 Bug fixes

Fix struct shift and list builder (#18189)
Don't load Parquet nested metadata (#18183)
Throw bigidx error for Parquet row-count (#18154)
Fix unpivot on empty df (#18179)
Don't vertically parallelize cse contexts (#18177)
Properly handle empty Parquet row groups with no dictionary (#18161)
Struct outer nullability (#18156)
Fix pyarrow predicate pushdown regression (#18145)
Prevent unwanted supertype cast in 'search_sorted' (#18143)
Parquet with filter=None (#18139)
Don't raise when converting from pandas if index contains duplicate names when include_index=False (the default) (#18133)
Don't remove leading whitespace in read_csv (#18131)
Py-polars compilation with no features (#18129)
String transform to_titlecase was too narrowly defined (#18122)
Reading Parquet with Null dictionary page (#18112)
Incorrect lazy CSV select(len()) for compressed files (#18067)
Fix sink_ipc_cloud panicking with runtime error (#18091)
Properly write Parquet for sliced lists (#18073)
Panic reading multiple CSV files from cloud (#18056)
Fix CloudWriter to use buffer before making requests (#18027)
Fix typos and remove trailing whitespace (#18024)
Handle cfg(feature) for shrink_dtype (#18038)
Subtraction with overflow on negative slice offset in Parquet (#18036)
Add nested SQL join support (#18006)
Allow read_csv schema to take unparsable types (#17765)
Multi-output column expressions in frame sort method (#17947)
Fix Asof join by schema (#17988)
Fix glob resolution for Hugging Face (#17958)
Several parquet reader/writer regressions (#17941)
Incorrect filter on categorical columns from parquet files (#17950)
SQL COUNT(DISTINCT x) should not include NULL values (#17930)
Scanning '%' from cloud (#17890)
Respect glob=False for cloud reads (#17860)
Properly write nest-nulled values in Parquet (#17845)
Allow full-null Object series to be built (#17870)
Fix from_arrow for struct type (#17839)
Infer decimal scales on mixed scale input (#17840)
Raise on unsupported fill strategy dtype (#17837)
Properly write nested NullArray in Parquet (#17807)
Check input type on list.to_struct (#17834)
Fix right join schema (#17833)
Non-compliant Parquet list element name (#17803)
Correctly set should_broadcast flag in HStack CSE rewrite (#17784)
Fix projection pushdown of literals without names (#17778)
Don't expand HTTP paths (#17774)
Check function input len at expansion (#17763)
Don't panic in invalid agg_groups (#17762)
Raise empty struct (#17736)
Fix GC logic in write_ipc (#17752)
Panic in pl.concat_list and list.concat on empty inputs (#17742)
Fix out nullability for structs coming from arrow (#17738)
Percent encode for Hugging Face paths (#17718)
Use bytemuck in slice reinterpret for Parquet ArrayChunks (#17700)
Propagate struct outer nullability eagerly (#17697)
Use ETag for HTTP file cache invalidation (#17684)
Fix type inference failure caused by double transpose (#17663)
Interpret %y consistently with Chrono in to_date/to_datetime/strptime (#17661)
Fix explode invalid check (#17651)
Tighten up error checking on join keys (#17517)
Expand brackets in async glob expansion (#17630)
Fix row index disappearing after projection pushdown in NDJSON (#17631)
Fix struct -> enum is_in (#17622)
Don't needlessly unwrap in pivot_schema (#17611)
Reject literal input in sort_by_exprs() (#17606)
Bitmap collect into safety (#17588)
Method dt.truncate was sometimes returning incorrect results for pre-1970 datetimes (#17582)
Defer path expansion until collect in file scan methods (#17532)
Correct logic for descending sort of BooleanChunked (#17558)
Don't unwrap send attempt to oneshot channel (#17566)
Fix scanning from HTTP cloud paths (#17571)
Properly implement struct (#17522)
Add missing commas in python IR interchange (#17518)
Fix predicate pushdown for .list.(get|gather) (#17511)
Turn panic into error when serializing Object types (#17353)
Fix struct expansion and raise on exclude (#17489)
Fix decimal dyn float supertype (#17464)
Don't rechunk on phys_repr (#17461)
Harden alchemy session for old sqlalchemy versions (#17366)
Fix swapping rename schema (#17458)
Raise on oob decimal precision (#17445)
Don't allow json inference method to be chunked/streaming (#17396)
Avoid panic when projecting solitary count into empty frame (#17393)
Set literal nesting to 0 (#17392)
Fix scanning cloud paths with spaces (#17379)
Fix slice length no longer allowing None (#17372)
Cull row index in scan if projection pushdown removes it (#17363)
Fix typo in SchemaError exception message (#17350)
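
The "SQL COUNT(DISTINCT x) should not include NULL values" fix above brings the SQL interface in line with standard SQL behaviour. The sketch below shows that behaviour using Python's built-in `sqlite3` module as a stand-in engine; it does not exercise the Polars SQL interface, and the table `t` and its contents are hypothetical.

```python
import sqlite3

# In-memory database with duplicates and NULLs in column x.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)", [(1,), (1,), (2,), (None,), (None,)]
)

distinct_count, row_count, non_null_count = conn.execute(
    "SELECT COUNT(DISTINCT x), COUNT(*), COUNT(x) FROM t"
).fetchone()
conn.close()

# NULLs count towards COUNT(*) only; COUNT(x) and COUNT(DISTINCT x) skip them.
print(distinct_count, row_count, non_null_count)  # 2 5 3
```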

📖 Documentation

Mention 'Array' in data types overview (#18060)
Correct concat rech...

Contributors

orlp, knl, and 51 other contributors

14 Aug 19:02

github-actions

py-1.5.0

d0475d7

Python Polars 1.5.0

🚀 Performance improvements

Improve binview extend/ifthenelse (#18164)
Start on better Parquet delta decoding (#18049)
Rechunk group-by __iter__ (#18162)
Tune jemalloc to not create muzzy pages (#18148)
Reduce default async thread count (#18142)
Make expensive selector expansion lazy (#18118)
Use single threaded algorithms if only 1 core given (#18101)
Use Arc<Vec<_>> instead of Arc<[_]> for paths and hive partitions (#18066)
SIMD View from FixedSizeBinary (#18059)
Use bitmask to filter Parquet predicate-pushdown items (#17993)
Zerocopy buffers for FixedSizeBinary to BinaryView cast (#18043)

✨ Enhancements

Create literals for datetime/date expressions (#18184)
Create literals in 'datetime' expression (#18182)
Expose top-level "has_header" param for read_excel and read_ods (#18078)
Raise on invalid 'is_between' and improve error message quality (#18147)

🐞 Bug fixes

Fix struct shift and list builder (#18189)
Don't load Parquet nested metadata (#18183)
Throw bigidx error for Parquet row-count (#18154)
Fix unpivot on empty df (#18179)
Don't vertically parallelize cse contexts (#18177)
Ensure default values are included when saving/restoring the current Config state (#18151)
Properly handle empty Parquet row groups with no dictionary (#18161)
Struct outer nullability (#18156)
Fix pyarrow predicate pushdown regression (#18145)
Prevent unwanted supertype cast in 'search_sorted' (#18143)
Parquet with filter=None (#18139)
Don't raise when converting from pandas if index contains duplicate names when include_index=False (the default) (#18133)
Fix cast Float to String where Float is not turned to Integer before turning to String (#18123)
Don't remove leading whitespace in read_csv (#18131)
Py-polars compilation with no features (#18129)
String transform to_titlecase was too narrowly defined (#18122)
Reading Parquet with Null dictionary page (#18112)
When setting write_excel column totals, don't forget to include any row-total cols (#18042)
Incorrect lazy CSV select(len()) for compressed files (#18067)
Fix sink_ipc_cloud panicking with runtime error (#18091)
Properly write Parquet for sliced lists (#18073)
Panic reading multiple CSV files from cloud (#18056)
Fix CloudWriter to use buffer before making requests (#18027)
Fix typos and remove trailing whitespace (#18024)
Handle cfg(feature) for shrink_dtype (#18038)

📖 Documentation

Fix references to old methods in lazy docstring (#18178)
Include PyCapsule Interface in DataFrame and Series API docs (#18174)
Corrected example result in group_by docs (#18169)
Mention 'Array' in data types overview (#18060)
Correct concat rechunk in user guide (#18080)
Fix typo in title of Hugging Face docs page (#18097)
Update pivot docstring for clarity (#18000)

🛠️ Other improvements

Remove unneeded growable (#18165)
Update Cargo.lock to fix build error on Linux (#18153)
Remove Nth, Wildcard from ExprIR and make conversion fallible (#18115)

Thank you to all our contributors for making this release possible!
@EricTulowetzke, @KDruzhkin, @MarcoGorelli, @Vincenthays, @alexander-beedie, @coastalwhite, @davanstrien, @deanm0000, @ember91, @kylebarron, @mcrumiller, @nameexhaustion, @orlp, @philss, @ritchie46 and @rosstitmarsh

Contributors

orlp, philss, and 14 other contributors

04 Aug 12:51

github-actions

py-1.4.1

0c2b5d8

Python Polars 1.4.1

🚀 Performance improvements

Integer fast path Parquet dict encoding (#18030)
Speedup writing of Parquet primitive values (#18020)
Remove temporary allocations in Parquet (#18013)

✨ Enhancements

Add boolean Parquet HybridRle encoding (#18022)
Support passing Worksheet objects to the write_excel method (#18031)

🐞 Bug fixes

Subtraction with overflow on negative slice offset in Parquet (#18036)
Fix drop selector (#18034)

📖 Documentation

Update map_batches docstring (#18001)

🛠️ Other improvements

Add @coastalwhite to parquet codeowners (#18032)
Minor bump to comfy-table version (#18028)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @deanm0000, @nameexhaustion and @ritchie46

Contributors

alexander-beedie, ritchie46, and 3 other contributors

02 Aug 10:35

github-actions

py-1.4.0

618a710

Python Polars 1.4.0

🚀 Performance improvements

Delay selection expansion (#18011)
Optimize strings slices (#17996)
Make .dt.weekday 20x faster (#17992)
Shrink MemSliceInner enum (#17991)
Push down slice with non-zero offset to Parquet (#17972)
Reduce copy in MemSlice (#17983)

✨ Enhancements

Add nested SQL join support (#18006)
Push down slice with non-zero offset to Parquet (#17972)
Add support for binary size method to Expr and Series "bin" namespace (#17924)
IO plugins (#17939)
Add SQL interface support for PostgreSQL dollar-quoted string literals (#17940)
Allow for parsing parquet file where the time zone is stored as lowercase "utc" (#17925)

🐞 Bug fixes

Add nested SQL join support (#18006)
Respect strict argument (#17990)
Multi-output column expressions in frame sort method (#17947)
Fix Asof join by schema (#17988)
Set default flags for FFI plugin (#17984)
Fix glob resolution for Hugging Face (#17958)
Several parquet reader/writer regressions (#17941)
Incorrect filter on categorical columns from parquet files (#17950)
SQL COUNT(DISTINCT x) should not include NULL values (#17930)
Default to None in pycapsule interface export (#17922)

📖 Documentation

Fix aggregation guide discrepancies (#18003)
Ensure last is never ambiguous with max (#17962)
Documentation for Arrow PyCapsule interface integration (#17935)
Fix Hugging Face link in user guide (#17943)

🛠️ Other improvements

Add unit tests for str.contains_any and str.replace_many (#17961)
Suggest allow_null as replacement (#17969)
Remove apply_generic, use unary_elementwise (#17902)
Add general filters in Parquet (#17910)

Thank you to all our contributors for making this release possible!
@JamesCE2001, @MarcoGorelli, @alexander-beedie, @coastalwhite, @deanm0000, @deepyaman, @dependabot, @dependabot[bot], @henryharbeck, @kylebarron, @nameexhaustion, @ritchie46 and @wangxiaoying

Contributors

alexander-beedie, ritchie46, and 10 other contributors

28 Jul 09:54

github-actions

py-1.3.0

9c29683

Python Polars 1.3.0

🚀 Performance improvements

Ensure metadata flags are maintained on vertical parallelization (#17804)
Ensure only nodes that are not changed are cached in collapse optimizer (#17791)
Use bitflags for OptState (#17788)
Remove async directory auto-detection (#17779)
Fix accidental quadratic horizontal concat (#17783)
Batch parquet integer decoding (#17734)
Use mmap-ed memory if possible in Parquet reader (#17725)
Use bitflags for function options (#17723)
Also set target features and tune cpu for CC (#17716)
Introduce MemReader to file buffer in Parquet reader (#17712)

✨ Enhancements

Expose binary_elementwise_into_string_amortized for plugin authors, recommend apply_into_string_amortized instead of apply_to_buffer (#17903)
Expose allocator to capsule (#17817)
Decompress in CSV / NDJSON scan (#17841)
Ensure unique names in HConcat (#17884)
Support authentication with HuggingFace login (#17881)
Enable collection with gpu engine (#17550)
Support "BY NAME" qualifier for SQL "INTERSECT" and "EXCEPT" set ops (#17835)
Write data at table level in write_excel (#17757)
Support PyCapsule Interface in DataFrame & Series constructors (#17693)
Implement Arrow PyCapsule Interface for Series/DataFrame export (#17676)
Raise informative error instead of panicking when passing invalid directives to to_string for Date dtype (#17670)
Implement forward/backward fill for all types (#17861)
Implement is_in operation on decimal type (#17832)
Optimise read_excel when using "calamine" engine with the latest fastexcel (#17735)
Support hf:// in read_(csv|ipc|ndjson) functions (#17785)
Allow literals in sort (#17780)
Expose 'strict' argument to 'is_in' (#17776)
Release the GIL in collect_schema (#17761)
Cloud support for NDJSON (#17717)
Support API token for scanning hf:// (#17682)

🐞 Bug fixes

Scanning '%' from cloud (#17890)
Raise suitable error when invalid column passed to get_column_index (#17868)
Respect glob=False for cloud reads (#17860)
Properly write nest-nulled values in Parquet (#17845)
Improve default write_excel int/float format when using a dark "table_style" (#17869)
Fix from_arrow for struct type (#17839)
Fix bool/string usage of "column_totals" parameter in write_excel (#17846)
Infer decimal scales on mixed scale input (#17840)
Don't ignore timezones in list of dicts constructor (#14211)
Raise on unsupported fill strategy dtype (#17837)
Properly write nested NullArray in Parquet (#17807)
Check input type on list.to_struct (#17834)
Fix right join schema (#17833)
Simultaneous usage of named_expr and schema in pl.struct (#17768)
Fix projection pushdown of literals without names (#17778)
Don't expand HTTP paths (#17774)
Check function input len at expansion (#17763)
Don't panic in invalid agg_groups (#17762)
Raise empty struct (#17736)
Fix GC logic in write_ipc (#17752)
Panic in pl.concat_list and list.concat on empty inputs (#17742)
Fix out nullability for structs coming from arrow (#17738)
Percent encode for Hugging Face paths (#17718)

📖 Documentation

Update the join example input for Rust for consistency with the Python example (#17898)
Improve filter documentation (#17755)
Reword "how" param docstring entry for 'semi' and 'anti' join types for clarity (#17843)
Mention read_* functions in Hugging Face section in user guide (#17799)
Show return type for Series attributes in API reference (#17759)
Add function with multiple arguments example to Expr.map_batches (#17789)
Add Hugging Face section to user guide (#17721)

📦 Build system

Update Rust toolchain to nightly-2024-07-26 (#17891)
Correctly reference released package in optional dependencies (#17691)

🛠️ Other improvements

On Python release, trigger docs build after API reference build (#17904)
Set uv pip install to verbose (#17901)
Fix broken typos command in make pre-commit for py-polars folder (#17897)
Remove HybridRLE iter / batch nested parquet decoding (#17889)
Add version field for python IR (#17876)
Pass through missing rolling and stringfunction information in pyir (#17702)
Make better use of typos configuration features (#17800)
Better deprecation message for _import_from_c (#17753)
Rename Unit to Plain in Parquet reader (#17751)
Unpin setuptools (#17726)
Update CODEOWNERS (#17707)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @SandroCasagrande, @alexander-beedie, @atigbadr, @coastalwhite, @deanm0000, @delsner, @dependabot, @dependabot[bot], @henryharbeck, @implicit-apparatus, @jparag, @knl, @kylebarron, @lukapeschke, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @ruihe774, @stinodego, @szepeviktor and @wence-

Contributors

orlp, knl, and 21 other contributors
