Releases: pola-rs/polars
Python Polars 1.20.0
⚠️ Deprecations
- Make parameter of `str.to_decimal` keyword-only (#20570)
🚀 Performance improvements
- Extend functionality on BitmapBuilder and use in Growables (#20754)
- Specialize first/last agg for simple types in new-streaming engine (#20728)
- Use PyO3 to convert between Python and Rust datetimes (#20660)
- Improve state caching and parallelism of window functions (#20689)
- Broadcast without materialization in `concat_arr` (#20681)
- Cache rolling groups (#20675)
- Use downcast_ref instead of dtype equality in `<dyn SeriesTrait as AsRef<ChunkedArray<T>>` (#20664)
- Fix performance regression for DataFrame serialization/pickling (#20641)
- Make Parquet `verify_dict_indices` SIMD (#20623)
- Move to `zlib-rs` by default and use `zstd::with_buffer` (#20614)
- Skip filter expansion in eager (#20586)
- Improve unique pred-pd (#20569)
✨ Enhancements
- Allow different python versions for pickle (#20740)
- Add SQL support for the `NORMALIZE` string function (#20705)
- Add `allow_exact_matches` to `join_asof` (#20723)
- Add new-streaming first/last aggregations (#20716)
- Add Parquet Sink to new streaming engine (#20690)
- Make automatic use of Azure storage account keys opt-in (#20652)
- Reduce scan_csv() (and friends') memory usage when using BytesIO (#20649)
- Improve `GroupsProxy/GroupsPosition` to be sliceable and cheaply cloneable (#20673)
- Add `str.normalize()` (#20483)
- Allow more group_by agg expressions in the new streaming engine (#20663)
- Support loading Excel Table objects by name (#20654)
- Support writing to file objects from `write_excel` (#20638)
- Raise `DuplicateError` if given a pyarrow Table object with duplicate column names (#20624)
- Support writing partitioned parquet to cloud (#20590)
- Add hint to error message for extra struct field in JSON (#20612)
- Add `index_of()` function to `Series` and `Expr` (#19894)
- Update `sqlparser-rs`, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries (#20576)
- Add `cat.starts_with`/`cat.ends_with` (#20257)
🐞 Bug fixes
- Avoid blocking on async runtime when resolving cloud scans (#20750)
- Fix `allow_invalid_certificates` being ignored in `storage_options` (#20744)
- Incorrect output type for `map_groups` returning all-NULL column (#20743)
- Fix `unique(maintain_order=True)` raising `InvalidOperationError` for null array (#20737)
- Don't collapse into a Nested Loop Join if the cross join maintains order (#20729)
- Don't serialize credentials provider (#20741)
- Fix `Series.n_unique` raising for list of struct (#20724)
- Fix incorrect top-k by sorted column, fix `head()` returning extra rows (#20722)
- Add outer validity to AnyValueBufferTrusted for structs (#20713)
- Don't partition group-by with non-scalar literals in agg (#20704)
- Fix xor operation of selector with Expr (#20702)
- Incorrect view buffer dedup (#20691)
- Only verify Parquet ConvertedType if no LogicalType is given (#20682)
- Validate length of `schema_overrides` in `read_csv` (#20672)
- Fix `map_elements` ignoring `skip_nulls=True` for struct dtype (#20668)
- Check for MAP-GROUPS in cloud-eligible (#20662)
- Fix empty output of `to_arrow()` on filtered unit height DataFrame (#20656)
- Add `.default` to azure credential provider scope URL (#20651)
- Fix `join_asof` panicking for invalid `tolerance` input (#20643)
- Incorrect flag check on is_elementwise (#20646)
- Don't panic but set null type if type is unknown (#20647)
- Fix performance regression for DataFrame serialization/pickling (#20641)
- Fix `Int128` dtype serialization (#20629)
- Ensure `read_excel` and `read_ods` support reading from raw `bytes` for all engines (#20636)
- Ensure that SQL `LIKE` and `ILIKE` operators support multi-line matches (#20613)
- Properly broadcast in sort_by (#20434)
- Properly load nested Parquet Statistics (#20610)
- AWS environment config was not loaded when credential provider was used (#20611)
- Fix order observability of group-by-dyn (#20615)
- Soundness when loading Parquet string statistics (#20585)
- Fix error filtering after `with_columns()` on unit height LazyFrame (#20584)
- Propagate `tenant_id` to `CredentialProviderAzure` if given (#20583)
- Restore symbols on Apple by bumping nightly version (#20563)
- Fix type annotation of `str.strip_chars_*` methods (#20565)
- Fix variable name in error message for "unsupported data type" in rolling and upsampling operations (#20553)
📖 Documentation
- Add more information for cross joins (#20753)
- Fix typo in sql functions (cosinus -> cosine) (#20676)
- Add links to `read_excel` "engine_options" and "read_options" docstring (#20661)
- Fix small typo in plugins (polars-dt -> polars-st) (#20657)
- Add polars-h3 and polars-st to plugin list (#20653)
- Add docs reference for `Field` (#20625)
- Update `DataFrame` join examples (#20587)
- Miscellaneous minor updates/fixes (#20573)
- Update "group_by_rolling" (deprecated) to "rolling" in user guide (#20548)
📦 Build system
🛠️ Other improvements
- Fix remote benchmark script (#20755)
- Fix tests (#20745)
- Simplify hive predicate handling in `NEW_MULTIFILE` (#20730)
- Add tests for various open issues (#20720)
- Fixes an Excel test following new `fastexcel` release (#20703)
- Add tests for various open issues that have been fixed (#20680)
- Don't include debug symbols in benchmark run (#20571)
- Implement CSV, IPC and NDJson in the `MultiScanExec` node (#20648)
- Don't rely on argument order of optimization_toggle (#20622)
- Fix Python deps installation in remote-benchmark workflow (#20619)
- Fix flaky categorical test (#20591)
- Bump multiversion from 0.7 to 0.8 (#20543)
- Remove unused nested function in `LazyFrame.fill_null` (#20558)
- Improve bin size info (#20551)
Thank you to all our contributors for making this release possible!
@Jesse-Bakker, @MarcoGorelli, @MoizesCBF, @SamuelAllain, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @eitsupi, @etiennebacher, @itamarst, @jqnatividad, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 1.19.0
🚀 Performance improvements
- Collapse expanded filters in eager (#20493)
- Remove predicate from `IR::DataFrame` (#20492)
- Use different binview dedup strategy depending on chunks ratio (#20451)
- Generalize the `arg_sort` fast path onto `Column` (#20437)
- Dedup binviews up front (#20449)
- Re-enable common subplan elim for new-streaming engine (#20443)
- Don't collect all LHS arrays in gather (#20441)
- Remove prepare_series for gather kernels (#20439)
- Don't always take all data buffers when gathering views (#20435)
✨ Enhancements
- Add `Int128` IO support for csv & ipc (#20535)
- Support arbitrary expressions in 'join_where' (#20525)
- Allow use of Python types in `cs.by_dtype` and `col` (#20491)
- Add an "include_file_paths" parameter to `read_excel` and `read_ods` (#20476)
- Allow more join lossless casting (#20474)
- Accept more generic `Iterable[bool]` in Series.filter (#20431)
- Allow loading data from multiple Excel/ODS workbooks and worksheets (#20465)
🐞 Bug fixes
- Output index type instead of u32 for `sum_horizontal` with boolean inputs (#20531)
- Fix more global categorical issues (#20547)
- Update eager join doctest on multiple columns (#20542)
- Revert categorical unique code (#20540)
- Add `unique` fast path for empty categoricals (#20536)
- Fix various `Int128` operations (#20515)
- Fix global cat unique (#20524)
- Fix union (#20523)
- Fix rolling aggregations for various integer types (#20512)
- Ensure `ignore_nulls` is respected in horizontal sum/mean (#20469)
- Fix incorrectly added sorted flag after append for lexically ordered categorical series (#20414)
- More `Int128` testing and related fixes (#20494)
- Validate column names in `unique()` for empty DataFrames (#20411)
- Implement `list.min` and `list.max` for `list[i128]` (#20488)
- Decimal from physical in horizontal min/max and shift (#20487)
- Don't remove sort if first/last strategy is set in unique (#20481)
- Fix join literal behavior (#20477)
- Validate asof join by args in IR resolving phase (#20473)
- Fix `align_frames` with single row panicking (#20466)
- Allow multiple column sort for Decimal (#20452)
- Fix mode panicking for String dtype (#20458)
- Return correct schema for `sum_horizontal` with boolean dtype (#20459)
- Fix return type for `add_business_days`, `millennium`, `century` and `combine` methods in `Series.dt` namespace (#20436)
📖 Documentation
- Fix typo in `DataFrame.cast` (#20532)
- Fix flaky doctests (#20516)
- Add examples for bitwise expressions (#20503)
- Clarify the join pre-condition of `join_asof` (#20509)
- Fix `Expr.all` description of Kleene logic (#20409)
🛠️ Other improvements
- Increase categorical test coverage (#20514)
- Report wheel sizes (#20541)
- Add tests for `floor/ceil` on integers (#20479)
- Expose and rewrite 'can_pre_agg' (#20450)
- Skip test on windows; kuzu import segfaults (#20463)
- Add a `TypeCheckRule` to the optimizer (#20425)
Thank you to all our contributors for making this release possible!
@Biswas-N, @IndexSeek, @Prathamesh-Ghatole, @Terrigible, @alexander-beedie, @brifitz, @coastalwhite, @dependabot, @dependabot[bot], @jqnatividad, @lukemanley, @mcrumiller, @orlp, @ritchie46 and @siddharth-vi
Python Polars 1.18.0
🏆 Highlights
- Add new `Int128Type` (#20232)
🚀 Performance improvements
- Order observability optimizations (#20396)
- Purge ChunkedArray Metadata (#20371)
- Explicit transpose in new-streaming equi-join finalize (#20363)
- Cache dtype on ExprIR (#20331)
- Lower overhead for `BytecodeParser` on introspection of incompatible UDFs (#20280)
✨ Enhancements
- Always resolve dynamic types in schema (#20406)
- Support loading data from multiple Excel/ODS workbooks (#20404)
- Add "drop_empty_cols" parameter for `read_excel` and `read_ods` (#20430)
- Order observability optimizations (#20396)
- Add FirstArgLossless supertype (#20394)
- Add `dt.replace` (#19708)
- Polars build for Pyodide (#20383)
- Add Azure credential provider using `DefaultAzureCredential()` (#20384)
- Add env var to ignore file cache allocate error (#20356)
- Enable joins between compatible differing numeric key columns (#20332)
- Cache dtype on ExprIR (#20331)
- Serialize DataFrame/Series using IPC in serde (#20266)
- Improve error message on SchemaError (#20326)
- Use better error messages when opening files (#20307)
- Add 'skip_lines' for CSV (#20301)
- Allow subtraction of time dtype columns (#20300)
- Add `bin.reinterpret` (#20263)
- Allow decoding of non-Polars arrow dictionaries in Arrow and Parquet (#20248)
- Streamline creation of empty frame from `Schema` (#20267)
- Add `cat.len_chars` and `cat.len_bytes` (#20211)
- Expose AexprArena (#20230)
🐞 Bug fixes
- Fix nullable object in map_elements (#20422)
- Properly handle `to_physical_repr` of nested types (#20413)
- Properly raise UDF errors (#20417)
- Workaround for `mmap` crash under Emscripten (#20418)
- Fix using `new_columns` in `scan_csv` with compressed file (#20412)
- Fix return type of `Series.dt.add_business_days` (#20402)
- Fix decimal series dispatch (#20400)
- Fix decimal arithmetic schema (#20398)
- Raise on categorical search_sorted (#20395)
- Fix plotting f-strings and docstrings (#20399)
- Don't try to load non-existent List/FSL statistics (#20388)
- Propagate nulls for float methods on all numeric types (#20386)
- Add env var to ignore file cache allocate error (#20356)
- Flip order on right join (#20358)
- Correctly parse special float values in `from_repr` (#20351)
- Fix incorrect object store caching for ADLS URI (#20357)
- Use the same encoding for nullable as non-nullable arrays (#20323)
- Improve error message on SchemaError (#20326)
- Boolean optional slice pushdown (#20315)
- Properly handle `from_physical` for List/Array (#20311)
- Ignore quotes in csv comments (#20306)
- Ensure pl.datetime returns empty column when input columns are empty (#20278)
- Ensure output height does not change on lazy projection pushdown with aggregations (#20223)
- Fix error writing on Windows to locations outside of C drive (#20245)
- Incorrect comparison in some cases with filtered list/array columns (#20243)
- Ensure height is maintained in SQL `SELECT 1 FROM` (#20241)
- Properly account for updated Categorical in .unique() kernel (#20235)
📖 Documentation
- Improve docstring clarity (#20416)
- Update GPU engine installation instructions to remove `--extra-index-url` from CUDA 12 packages (#20381)
- Remove Plugins overview page without information (#20348)
- Small fixes/clarifications in user guide (#20335)
- Improve docs about NaN (#20310)
- Fix substr function param definition (#19054)
- Include parquet options in BigQuery I/O write sample (#20292)
- Fix typo in `fork` warning (#20258)
📦 Build system
- Add `project.dynamic = ["version"]` to pyproject.toml (#20345)
- Update `pyo3` and `numpy` crates to version `0.23` (#20111)
- Build wheels for ARM Windows in Python release workflow (#20247)
🛠️ Other improvements
- Enable masked out list, struct and array elements in parametric tests (#20365)
- Move hive partitioning/multi-file handling outside of readers (#20203)
- Purge ChunkedArray Metadata (#20371)
- Correcting misspelled return value and unifying regional spelling (#20375)
- Add test for `select(len())` (#20343)
- Make parametric tests include `pl.List` and `pl.Array` by default (#20319)
- Use Column in Row Encoding (#20312)
- Don't warn on fork hook (#20309)
- Don't deconstruct `CsvParseOptions` (#20302)
- Allow decoding of non-Polars arrow dictionaries in Arrow and Parquet (#20248)
- Prepare test suite for Python 3.13 support (#20297)
- Add `FunctionCastOptions` and conservative IR-level cast type-checking (#20286)
- Add more descriptive error message for failure of vstack/extend (#20299)
- Clean up some remnants of Python 3.8 support (#20293)
- Add new `Int128Type` (#20232)
- Add test for BytesIO overwritten after scan (#20240)
- Expose AexprArena (#20230)
Thank you to all our contributors for making this release possible!
@Jesse-Bakker, @Terrigible, @ZemanOndrej, @alexander-beedie, @balbok0, @beckernick, @bschoenmaeckers, @coastalwhite, @georgestagg, @hamdanal, @haocheng6, @kszlim, @lukemanley, @mcrumiller, @nameexhaustion, @noexecstack, @orlp, @ptiza, @r-brink, @ritchie46, @rodrigogiraoserrao, @stijnherfst, @stinodego, @tswast and @zero-stroke
Python Polars 1.17.1
🐞 Bug fixes
- Fix incorrect lazy `select(len())` with some select orderings (#20222)
- Fix assertion panic on LazyFrame `scratch.is_empty()` (#20219)
Thank you to all our contributors for making this release possible!
@nameexhaustion and @ritchie46
Rust Polars 0.45.0
💥 Breaking changes
- Remove dedicated `sink_(parquet/ipc)_cloud` functions (#20164)
- Experimental cloud write support (#20129)
🚀 Performance improvements
- Add fast paths for series.arg_sort and dataframe.sort (#19872)
- Utilize the RangedUniqueKernel for Enum/Categorical (#20150)
- Reduce memory copy when scanning from Python objects (#20142)
- Don't instantiate validity mask when unneeded in Parquet (#20149)
- Expand more filters (#20022)
- Cache the DataFrame schema in get_column_index (#20021)
- Reduce the size of row encoding UTF-8 (#19911)
- Memoize duplicates in rolling-gb-dyn (#19939)
- More efficient row encoding for `pl.List` (#19907)
- Half the size of Booleans in row encoding (#19927)
- Rolling 'iter_lookbehind' breeze through duplicates (#19922)
- Initially trim leading and trailing filtered rows (#19850)
- Increase default async thread count for low core count systems (#19829)
- Move row group decode off async thread for local streaming parquet scan (#19828)
- Support use of Duration in `to_string`, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
- Improve `DataFrame.sort().limit/top_k` performance (#19731)
- Improve cloud scan performance (#19728)
- Fix quadratic 'with_columns' behavior (#19701)
- Improve hive partition pruning with datetime predicates from SQL (#19680)
- Allow for arbitrary skips in Parquet Dictionary Decoding (#19649)
- Reorder conditions in is_leap_year (#19602)
- Rechunk in DataFrame.rows if needed (#19628)
- Dispatch Parquet Primitive PLAIN decoding to faster kernels when possible (#19611)
- Use faster iteration in 'starts_with'/'ends_with' (#19583)
- Branchless Parquet Prefiltering (#19190)
✨ Enhancements
- Retry with reloaded credentials on cloud error (#20185)
- Support reading Enum dtype from csv (#20188)
- Allow sorting of lists and arrays (#20169)
- Add `maintain_order` parameter to joins (#20026)
- Allow for `to_datetime`/`strftime` to automatically parse dates with single-digit hour/minute/second (#20144)
- Experimental cloud write support (#20129)
- Allow setting and reading custom schema-level IPC metadata (#20066)
- Add optimized row encoding for Decimals (#20050)
- Add `drop_nans` method to DataFrame and LazyFrame (#20029)
- Catch use of 'polars' in `to_string` for non-Duration dtypes and raise an informative error (#19977)
- Add AhoCorasick backed 'find_many' (#19952)
- Speed up starts_with for small prefixes (#19904)
- Auto-enable hive partitioning if hive_schema was given (#19902)
- Add `pl.concat_arr` to concatenate columns into an Array column (#19881)
- Support both "iso" and "iso:strict" format options for `dt.to_string` (#19840)
- Add rounding for Decimal type (#19760)
- Improved array arithmetic support (#19837)
- Raise informative error on Unknown unnest (#19830)
- Support use of Duration in `to_string`, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
- Allow specification of `chunk_size` on `LazyCsvReader.read_options` (#19819)
- Add an `is_literal` method to expression `meta` namespace (#19773)
- A different approach to warning users of fork() issues with Polars (#19197)
- Add dylib (#19759)
- Add IPC source node for new streaming engine (#19454)
- Implement max/min methods for dtypes (#19494)
- Improve hive partition pruning with datetime predicates from SQL (#19680)
- Parallel IPC sink for the new streaming engine (#19622)
- Add SQL support for `RIGHT JOIN`, fix an issue with wildcard aliasing (#19626)
- Add show_graph to display a GraphViz plot for expressions (#19365)
🐞 Bug fixes
- Don't trigger length check in array construction (#20205)
- Allow row encoding for 32-bit architectures (e.g. WASM) (#20186)
- Properly project unordered column in parquet prefiltered (#20189)
- Csv stop simd cache if eol char is hit (#20199)
- Estimated size for object (#20191)
- Respect parallel argument in parquet (#20187)
- Only validate UTF-8 for selected items when all below len 128 (#20183)
- Serialize categories of Enum in arrow metadata (#20181)
- Don't use RLE encoding for Parquet Boolean (#20172)
- Invalid `bitwise_xor` for ScalarColumn (#20140)
- Add temporal feature gate in `is_elementwise_top_level` (#20177)
- Column name mismatch or not found in Parquet scan with filter (#20178)
- Raise if apply returns different types (#20168)
- Deal with masked out list elements (#20161)
- Fix index out of bounds in uniform_hist_count (#20133)
- Implement `arg_sort` for Null series (#20135)
- Handle slice pushdown in PythonUDF GroupBy (#20132)
- Check shape for `*_horizontal` functions (#20130)
- Properly coerce types in lists (#20126)
- Incorrect aggregation of empty groups after slice (#20127)
- DataFrame `.get_column` after `drop_in_place` (#20120)
- Subtraction with underflow on empty FixedSizeBinaryArray (#20109)
- Materialize smallest dyn ints to use feature gate for i8/i16 (#20108)
- Return null instead of 0. for rolling_std when window contains a single element and ddof=1 and there are nulls elsewhere in the Series (#20077)
- Only slice after sort when slice is smaller than frame length (#20084)
- Preserve Series name in __rpow__ operation (#20072)
- Allow nested `is_in()` in `when()/then()` for full-streaming (#20052)
- Fix datetime cast behavior for pre-epoch times (#19949)
- Improve `hist` binning around breakpoints (#20054)
- Fix invalid len due to projection pushdown selection of scalar (#20049)
- Fix empty scalar agg type (#20051)
- Improve binning in `Series.hist` with `bin_count` when all values are the same (#20034)
- Less intrusive forking warnings (#20032)
- Reading nullable sliced / masked Categoricals from Parquet (#20024)
- Regression in `hist` panicking on out of bounds index (#20016)
- Fix starts_with out of bounds (#20006)
- Fix incorrect column order for parquet scan with hive columns in file (#19996)
- Incorrectly gave `list.len()` for masked-out rows (#19999)
- Bug fix in existing fast path for sorted series (#20004)
- Incorrect `collect_schema()` for `fill_null()` after an aggregation expression in group-by context (#19993)
- Fix Decimal type fill_null (#19981)
- Fix panic on schema merge for prefiltering (#19972)
- Fix lazy frame join expression (#19974)
- Fix `gather_every` for `Scalar` (#19964)
- Toggle 'fast_unique' on new_from_index (#19956)
- Raise proper error message when too small interval is passed to datetime_range (#19955)
- Fix scalar object (#19940)
- Raise InvalidOperationError for invalid float to decimal casts (e.g. Inf, NaN) (#19938)
- Fix panic with combination of hive and parquet prefiltering (#19905)
- Fix panic when joining with empty frame (debug only) (#19896)
- Fix incorrect result from inequality filter after join on LazyFrame (#19898)
- Misleading `ShapeError` error message on dataframe creation (#19901)
- Fix panic with empty delta scan, or empty parquet scan with a provided schema (#19884)
- Ensure type object of inputs for cached any-value conversion functions are kept alive (#19866)
- Fix panic using `scan_parquet().with_row_index()` with hive partitioning enabled (#19865)
- Improve histogram bin logic (#18761)
- Raise informative error instead of panicking for list arithmetic on some invalid dtypes (#19841)
- Properly handle Zero-Field Structs in row encoding (#19846)
- Incorrect explode schema for `LazyFrame.explode()` (#19860)
- Ensure `List` element truncation ellipses respect `ASCII*` table formats (#19835)
- Validate subnodes in validate IR (#19831)
- Raise if merge non-global categoricals in unpivot (#19826)
- Type hints for window_size incorrectly included timedelta in some rolling functions (#19827)
- Don't panic if column not found (#19824)
- Fix gather of Scalar null + idx w/ validity (#19823)
- Fix object chunked gather (#19811)
- Fix inconsistency between code and comment (#19810)
- Fix filter scalar nulls (#19786)
- Altair tooltip was being incorrectly applied to plots which did not accept it (#19789)
- Fix scanning google cloud with service account credentials file (#19782)
- Fix incorrect filter after right-join on LazyFrame (#19775)
- Fix incorrect lazy schema for explode on array columns (#19776)
- Fix incorrect lazy schema for aggregations (#19753)
- Fix validation for inner and left join when join_nulls unflagged (#19698)
- SQL `ELSE` clause should be implicitly `NULL` when omitted (#19714)
- In group_by_dynamic, period and every were getting applied in reverse order for the window upper boundary (#19706)
- Only allow `list.to_struct` to be elementwise when width is fixed (#19688)
- Make Array arithmetic ops fully elementwise (#19682)
- Update line-splitting logic in batched CSV reader (#19508)
- Fix incorrect lazy schema for `explode()` in `agg()` (#19629)
- Fix filter incorrectly pushed past struct unnest when unnested column name matches upper column name (#19638)
- Ensure `mean_horizontal` raises on non-numeric input (#19648)
- Reorder conditions in is_leap_year (#19602)
- Copy height in .vstack() for empty dataframes (#19641) (#19642)
- Run join type coercion with correct schemas active (#19625)
- Correct wildcard and input expansion for some more functions (#19588)
- Allow `.struct.with_fields` inside `list.eval` (#19617)
- Sortedness was incorrectly being preserved in dt.offset_by when offsetting by non-constant durations in the timezone-naive case (#19616)
- Fix incorrect `scan_parquet().with_row_index()` with non-zero slice or with streaming collect (#19609)
- Fix mask and validity confusion in Parquet String decoding (#19614)
- Parquet decoding of nested dictionary values (#19605)
- Do not attempt to load default credentials when `credential_provider` is given (#19589)
- Fix gather len in group-by state (#19586)
- Added input validation for `explode` operation in the array namespace (#19163)
- Improve error message (#19546)
- Fix predica...
Python Polars 1.17.0
🚀 Performance improvements
- Add fast paths for series.arg_sort and dataframe.sort (#19872)
- Much faster `Series` construction from subclasses of standard Python types (#20166)
- Utilize the RangedUniqueKernel for Enum/Categorical (#20150)
- Reduce memory copy when scanning from Python objects (#20142)
- Construct `Series` for bytes/binary data 10x faster when dtype not explicitly set (#20157)
- Don't instantiate validity mask when unneeded in Parquet (#20149)
✨ Enhancements
- Retry with reloaded credentials on cloud error (#20185)
- Support reading Enum dtype from csv (#20188)
- Improve dtype inference and load for `DataFrame` cols constructed from Python Enum values (#20180)
- Allow sorting of lists and arrays (#20169)
- Add `maintain_order` parameter to joins (#20026)
- Allow for `to_datetime`/`strftime` to automatically parse dates with single-digit hour/minute/second (#20144)
- Issue warning when using `to_struct()` without a list of field names (#20158)
- Experimental cloud write support (#20129)
- Add lazy support for `pl.select` (#20091)
- Enable view arrow export in `write_delta` (#20092)
🐞 Bug fixes
- Don't trigger length check in array construction (#20205)
- Allow row encoding for 32-bit architectures (e.g. WASM) (#20186)
- Properly project unordered column in parquet prefiltered (#20189)
- Csv stop simd cache if eol char is hit (#20199)
- Estimated size for object (#20191)
- Respect parallel argument in parquet (#20187)
- Only validate UTF-8 for selected items when all below len 128 (#20183)
- Serialize categories of Enum in arrow metadata (#20181)
- Don't use RLE encoding for Parquet Boolean (#20172)
- Invalid `bitwise_xor` for ScalarColumn (#20140)
- Series construct with large nested `u64` (#20167)
- Add temporal feature gate in `is_elementwise_top_level` (#20177)
- Column name mismatch or not found in Parquet scan with filter (#20178)
- Raise if apply returns different types (#20168)
- Deal with masked out list elements (#20161)
- Fix index out of bounds in uniform_hist_count (#20133)
- Implement `arg_sort` for Null series (#20135)
- Handle slice pushdown in PythonUDF GroupBy (#20132)
- Check shape for `*_horizontal` functions (#20130)
- Properly coerce types in lists (#20126)
- Incorrect aggregation of empty groups after slice (#20127)
- DataFrame `.get_column` after `drop_in_place` (#20120)
- Subtraction with underflow on empty FixedSizeBinaryArray (#20109)
- Materialize smallest dyn ints to use feature gate for i8/i16 (#20108)
- Return null instead of 0. for rolling_std when window contains a single element and ddof=1 and there are nulls elsewhere in the Series (#20077)
- Only slice after sort when slice is smaller than frame length (#20084)
- Preserve Series name in __rpow__ operation (#20072)
- Allow nested `is_in()` in `when()/then()` for full-streaming (#20052)
📖 Documentation
- Add more Rust examples to User Guide (#20194)
- Expand plotting docs (#19719)
- Fix Rust examples in user guide (#20075)
- Update `by` param description for rolling_*_by functions (#19715)
- Correct supported compression formats (#20085)
- Specify strictness in cast (#20067)
📦 Build system
- Upgrade `sqlparser-rs` from version `0.49` to `0.52` (#20110)
- Bump `memmap2` to version `0.9` (#20105)
- Bump `object_store` to version `0.11` (#20102)
- Bump `fs4` to version `0.12` (#20101)
- Bump `thiserror` to version `2` (#20097)
- Bump `atoi_simd` to version `0.16` (#20098)
- Bump `chrono-tz` to `0.10` (#20094)
- Update Rust dependency `ndarray` to `0.16` (#20093)
- Bump Rust toolchain to `nightly-2024-11-28` (#20064)
🛠️ Other improvements
- Deprecate ddof parameter for correlation coefficient (#20197)
- Move Bitwise aggregations to FunctionExpr (#20193)
- Add ragged lines test (#20182)
- Set delta version check higher (#20153)
- Fix typo in assertion in datatype copy test (#20121)
- Move horizontal methods to polars-ops (#20134)
- Remove useless SeriesTrait::get implementations (#20136)
- Add a bunch more automated row encoding sortedness tests (#20056)
Thank you to all our contributors for making this release possible!
@DzenanJupic, @MarcoGorelli, @YichiZhang0613, @alexander-beedie, @coastalwhite, @dependabot, @dependabot[bot], @flowlight0, @henryharbeck, @iharthi, @ion-elgreco, @jqnatividad, @lukapeschke, @lukemanley, @mcrumiller, @nameexhaustion, @ptiza, @ritchie46, @siddharth-vi, @stijnherfst, @stinodego and @wsyxbcl
Python Polars 1.16.0
🚀 Performance improvements
✨ Enhancements
- Enable creation of independently reusable `Config` instances (#20053)
- Improved error message on invalid Python `Enum` init (#20060)
- Improve Polars `Enum` dtype init from standard Python enums (#19997)
- Add optimized row encoding for Decimals (#20050)
- Add `drop_nans` method to DataFrame and LazyFrame (#20029)
🐞 Bug fixes
- Improve `hist` binning around breakpoints (#20054)
- Fix invalid len due to projection pushdown selection of scalar (#20049)
- Fix empty scalar agg type (#20051)
- Improve binning in `Series.hist` with `bin_count` when all values are the same (#20034)
- Less intrusive forking warnings (#20032)
- Reading nullable sliced / masked Categoricals from Parquet (#20024)
- Regression in `hist` panicking on out of bounds index (#20016)
- Fix starts_with out of bounds (#20006)
- Fix incorrect column order for parquet scan with hive columns in file (#19996)
- Incorrectly gave `list.len()` for masked-out rows (#19999)
- Bug fix in existing fast path for sorted series (#20004)
- Incorrect `collect_schema()` for `fill_null()` after an aggregation expression in group-by context (#19993)
- Fix `row_by_key` typing (#19888)
📖 Documentation
📦 Build system
- Pin maturin (#20063)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @gab23r, @lukemanley, @mcrumiller, @nameexhaustion, @ritchie46, @siddharth-vi, @stijnherfst and @stinodego
Python Polars 1.15.0
🚀 Performance improvements
- Reduce the size of row encoding UTF-8 (#19911)
- Memoize duplicates in rolling-gb-dyn (#19939)
- More efficient row encoding for `pl.List` (#19907)
- Half the size of Booleans in row encoding (#19927)
- Rolling 'iter_lookbehind' breeze through duplicates (#19922)
- Initially trim leading and trailing filtered rows (#19850)
✨ Enhancements
- Catch use of 'polars' in `to_string` for non-Duration dtypes and raise an informative error (#19977)
- Add AhoCorasick backed 'find_many' (#19952)
- Allow Python Enums as dtype inputs (#19926)
- Speed up starts_with for small prefixes (#19904)
- Auto-enable hive partitioning if hive_schema was given (#19902)
- Add `pl.concat_arr` to concatenate columns into an Array column (#19881)
- Support both "iso" and "iso:strict" format options for `dt.to_string` (#19840)
- Add rounding for Decimal type (#19760)
- Improved array arithmetic support (#19837)
🐞 Bug fixes
- Fix Decimal type fill_null (#19981)
- Fix panic on schema merge for prefiltering (#19972)
- Fix lazy frame join expression (#19974)
- Fix `gather_every` for `Scalar` (#19964)
- Toggle 'fast_unique' on new_from_index (#19956)
- Parse uppercase config keys (#19852)
- Raise proper error message when too small interval is passed to datetime_range (#19955)
- Fix scalar object (#19940)
- Raise InvalidOperationError for invalid float to decimal casts (e.g. Inf, NaN) (#19938)
- Address indexing edge-case with `numpy` arrays (#19895)
- Fix panic with combination of hive and parquet prefiltering (#19905)
- Fix panic when joining with empty frame (debug only) (#19896)
- Fix incorrect result from inequality filter after join on LazyFrame (#19898)
- Misleading `ShapeError` error message on dataframe creation (#19901)
- Fix panic with empty delta scan, or empty parquet scan with a provided schema (#19884)
- Ensure type object of inputs for cached any-value conversion functions are kept alive (#19866)
- Improve export from 2D Array dtype columns to PyTorch Tensors (`to_torch`) and Jax Arrays (`to_jax`) (#19862)
- Fix panic using `scan_parquet().with_row_index()` with hive partitioning enabled (#19865)
- Improve histogram bin logic (#18761)
- Raise informative error instead of panicking for list arithmetic on some invalid dtypes (#19841)
- Properly handle Zero-Field Structs in row encoding (#19846)
- Incorrect explode schema for LazyFrame.explode() (#19860)
- DataFrame rows_by_key returning key tuples with elements in wrong order (#19486)
- Ensure List element truncation ellipses respect ASCII* table formats (#19835)
📖 Documentation
- Remove duplicate sentence in Series.bottom_k docstring (#19947)
- Complete parameters description and add an example for clip() (#19875)
- Fix some warnings during docs build (#19848)
📦 Build system
- Use public windows runners in python release (#19982)
- Add windows-aarch64 to python binaries (#19966)
🛠️ Other improvements
- Minor non-breaking space (&nbsp;) tweak for HTML rendering (#19864)
- Implement nested row encoding / decoding (#19874)
- Switch back to PyO3 0.22 (#19851)
- Adjust flaky with_columns test (#19844)
- Add proper tests for row encoding (#19843)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @barak1412, @coastalwhite, @etiennebacher, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @mhogervo, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @stinodego
Python Polars 1.14.0
🚀 Performance improvements
- Increase default async thread count for low core count systems (#19829)
- Move row group decode off async thread for local streaming parquet scan (#19828)
- Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
✨ Enhancements
- Raise informative error on Unknown unnest (#19830)
- Support DataFrame init from raw SQLAlchemy rows (#19820)
- Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
- Add an is_literal method to expression meta namespace (#19773)
- A different approach to warning users of fork() issues with Polars (#19197)
🐞 Bug fixes
- Fix read_database(…,iter_batches=True) type annotations (#19832)
- Validate subnodes in validate IR (#19831)
- Raise if merge non-global categoricals in unpivot (#19826)
- Type hints for window_size incorrectly included timedelta in some rolling functions (#19827)
- Don't panic if column not found (#19824)
- Fix gather of Scalar null + idx w/ validity (#19823)
- Replace _kwargs in collect method (#19618)
- Fix object chunked gather (#19811)
- Fix filter scalar nulls (#19786)
- Replace spaces with &nbsp; to support showing multiple spaces in HTML repr (#19783)
- Altair tooltip was being incorrectly applied to plots which did not accept it (#19789)
- Respect schema_overrides in batched csv reader (#19755)
- Fix scanning google cloud with service account credentials file (#19782)
- Release the GIL in Python APIs, part 2 of 2 (#19762)
- Fix incorrect filter after right-join on LazyFrame (#19775)
- Fix incorrect lazy schema for explode on array columns (#19776)
- Fixed typo in file lazy.py (#19769)
📖 Documentation
- Update bokeh to use cdn to avoid Bokeh Error (#19788)
- Change dprint config (#19747)
- Mention rows_by_key in the to_dict documentation (#19767)
- Fix link to Graphviz download (#19791)
🛠️ Other improvements
- Add ToField context for common args (#19833)
- Use polars parquet reader for delta scan (#19103)
- Migrate polars-expr AggregationContext to use Column (#19736)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @YichiZhang0613, @alexander-beedie, @braaannigan, @coastalwhite, @engylemure, @gab23r, @iliya-malecki, @ion-elgreco, @itamarst, @jackxxu, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao and @sn0rkmaiden
Python Polars 1.13.1
✨ Enhancements
- Add IPC source node for new streaming engine (#19454)
🐞 Bug fixes
- Release GIL in Python APIs, part 1 (#19705)
- Fix incorrect lazy schema for aggregations (#19753)
- Address incorrect selector & col expansion (#19742)
📖 Documentation
- Fix formatting of nested list (#19746)
- Add meta.is_column to API docs (#19744)
- Fix join API reference links (#19745)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @etiennebacher, @itamarst, @nameexhaustion, @orlp, @ritchie46 and @rodrigogiraoserrao