Releases
3.0.0 (2022-04-25)
⚠ BREAKING CHANGES
ir: The following are breaking changes due to simplifying expression internals:
ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory methods have been removed; the DataType.scalar and DataType.column class fields can be used to directly construct a corresponding expression instance (though prefer to use operation.to_expr())
ibis.expr.types.ValueExpr._name and ValueExpr._dtype fields are no longer accessible. While these were never supposed to be used directly, the ValueExpr.has_name(), ValueExpr.get_name() and ValueExpr.type() methods are now the only way to retrieve the expression's name and datatype.
ibis.expr.operations.Node.output_type is now a property, not a method; decorate those methods with @property
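The method-to-property migration can be sketched in plain Python; OldNode and NewNode below are hypothetical stand-ins, not ibis classes, and real output_type implementations return expression classes rather than strings:

```python
# Hypothetical stand-ins illustrating the method-to-property migration.
class OldNode:
    def output_type(self):  # previously defined as a plain method
        return "ColumnExpr"

class NewNode:
    @property
    def output_type(self):  # now decorated with @property
        return "ColumnExpr"

# Old call style used parentheses; the new style is attribute access.
assert OldNode().output_type() == NewNode().output_type
```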
ibis.expr.operations.ValueOp subclasses must define output_shape and output_dtype properties from now on (note the datatype abbreviation dtype in the property name)
ibis.expr.rules.cast(), scalar_like() and array_like() rules have been removed
api: Replace t["a"].distinct() with t[["a"]].distinct().
deps: The sqlalchemy lower bound is now 1.4
ir: Schema.names and Schema.types attributes now have tuple type rather than list
expr: Columns that were added or used in an aggregation or mutation would be alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn't preserve insertion order. Now columns will appear in the order in which they were passed to aggregate or mutate
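The ordering fix follows from dict semantics: since Python 3.7, plain dicts preserve insertion order, so sorting names is no longer needed for determinism. A minimal illustration (the keys here are made-up column names, not ibis API):

```python
# Since Python 3.7, dicts preserve insertion order, so named expressions
# can keep the order in which they were passed rather than being sorted.
cols = {}
cols["z_total"] = "sum(z)"
cols["a_mean"] = "mean(a)"

assert list(cols) == ["z_total", "a_mean"]    # insertion order, kept now
assert sorted(cols) == ["a_mean", "z_total"]  # the old, alphabetical output
```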
api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
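The practical difference between the two widths can be seen with the standard struct module; this illustrates 32- vs 64-bit float precision generally, not ibis internals:

```python
import struct

# Round-trip 0.1 through a 32-bit float: precision is lost.
f32 = struct.unpack("f", struct.pack("f", 0.1))[0]

# Round-trip through a 64-bit float: the value survives exactly,
# since Python floats are already IEEE 754 doubles.
f64 = struct.unpack("d", struct.pack("d", 0.1))[0]

assert f32 != 0.1
assert f64 == 0.1
```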
ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
ir: removed ibis.expr.lineage.{roots,find_nodes} functions
config: Use ibis.options.graphviz_repr = True to enable graphviz expression rendering
hdfs: Use fsspec instead of HDFS from ibis
udf: Vectorized UDF coercion functions are no longer a public API.
The minimum supported Python version is now Python 3.8
config: register_option is no longer supported; please submit option requests upstream
backends: Read tables with pandas.read_hdf and use the pandas backend
The CSV backend is removed. Use Datafusion for CSV execution.
backends: Use the datafusion backend to read parquet files
Expr() -> Expr.pipe()
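The Expr() -> Expr.pipe() change replaces calling an expression directly with an explicit pipe method. A hedged pure-Python sketch of the pattern (this toy Expr is an illustration, not the ibis class):

```python
# Toy Expr illustrating the pipe pattern: expr.pipe(f, *args) is just
# f(expr, *args), written fluently so chains read left to right.
class Expr:
    def __init__(self, value):
        self.value = value

    def pipe(self, f, *args, **kwargs):
        return f(self, *args, **kwargs)

def add(expr, n):
    return Expr(expr.value + n)

# Previously expressions were callable directly, e.g. expr(add, 2);
# now the intent is explicit: expr.pipe(add, 2).
result = Expr(40).pipe(add, 2)
assert result.value == 42
```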
coercion functions previously in expr/schema.py are now in udf/vectorized.py
api: materialize is removed. Joins with overlapping columns now have suffixes.
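How overlapping names get disambiguated can be sketched without ibis; the suffixes ("_x", "_y") and the helper below are illustrative, not ibis's actual defaults or API:

```python
def suffix_overlapping(left_cols, right_cols, suffixes=("_x", "_y")):
    """Rename columns that appear on both sides of a join so the
    joined table ends up with unique column names."""
    overlap = set(left_cols) & set(right_cols)
    left = [c + suffixes[0] if c in overlap else c for c in left_cols]
    right = [c + suffixes[1] if c in overlap else c for c in right_cols]
    return left + right

# Only the shared "id" column is suffixed; unique columns keep their names.
assert suffix_overlapping(["id", "value"], ["id", "score"]) == [
    "id_x", "value", "id_y", "score",
]
```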
kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.
Features
add repr_html for expressions to print as tables in ipython (cd6fa4e )
add duckdb backend (667f2d5 )
allow construction of decimal literals (3d9e865 )
api: add ibis.asc expression (efe177e ), closes #1454
api: add has_operation API to the backend (4fab014 )
api: implement type for SortExpr (ab19bd6 )
clickhouse: implement string concat for clickhouse (1767205 )
clickhouse: implement StrRight operation (67749a0 )
clickhouse: implement table union (e0008d7 )
clickhouse: implement trim, pad and string predicates (a5b7293 )
datafusion: implement Count operation (4797a86 )
datatypes: unbounded decimal type (f7e6f65 )
date: add ibis.date(y,m,d) functionality (26892b6 ), closes #386
duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087 )
duckdb: add functionality needed to pass integer to interval test (e2119e8 )
duckdb: implement _get_schema_using_query (93cd730 )
duckdb: implement now() function (6924f50 )
duckdb: implement regexp replace and extract (18d16a7 )
implement force argument in sqlalchemy backend base class (9df7f1b )
implement coalesce for the pyspark backend (8183efe )
implement semi/anti join for the pandas backend (cb36fc5 )
implement semi/anti join for the pyspark backend (3e1ba9c )
implement the remaining clickhouse joins (b3aa1f0 )
ir: rewrite and speed up expression repr (45ce9b2 )
mysql: implement _get_schema_from_query (456cd44 )
mysql: move string join impl up to alchemy for mysql (77a8eb9 )
postgres: implement _get_schema_using_query (f2459eb )
pyspark: implement Distinct for pyspark (4306ad9 )
pyspark: implement log base b for pyspark (527af3c )
pyspark: implement percent_rank and enable testing (c051617 )
repr: add interval info to interval repr (df26231 )
sqlalchemy: implement ilike (43996c0 )
sqlite: implement date_truncate (3ce4f2a )
sqlite: implement ISO week of year (714ff7b )
sqlite: implement string join and concat (6f5f353 )
support of arrays and tuples for clickhouse (db512a8 )
ver: dynamic version identifiers (408f862 )
Bug Fixes
added wheel to pyproject toml for venv users (b0b8e5c )
allow major version changes in CalVer dependencies (9c3fbe5 )
annotable: allow optional arguments at any position (778995f ), closes #3730
api: add ibis.map and .struct (327b342 ), closes #3118
api: map string multiplication with integer to repeat method (b205922 )
api: thread suffixes parameter to individual join methods (31a9aff )
change TimestampType to Timestamp (e0750be )
clickhouse: disconnect from clickhouse when computing version (11cbf08 )
clickhouse: use a context manager for execution (a471225 )
combine windows during windowization (7fdd851 )
conform epoch_seconds impls to expression return type (18a70f1 )
context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b ), closes #3108
dask: fix asof joins for newer version of dask (50711cc )
dask: workaround dask bug (a0f3bd9 )
deps: update dependency atpublic to v3 (3fe8f0d )
deps: update dependency datafusion to >=0.4,<0.6 (3fb2194 )
deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361 )
deps: update dependency graphviz to >=0.16,<0.21 (3014445 )
duckdb: add casts to literals to fix binding errors (1977a55 ), closes #3629
duckdb: fix array column type discovery on leaf tables and add tests (15e5412 )
duckdb: fix log with base b impl (4920097 )
duckdb: support both 0.3.2 and 0.3.3 (a73ccce )
enforce the schema's column names in apply_to (b0f334d )
expose ops.IfNull for mysql backend (156c2bd )
expr: add more binary operators to char list and implement fallback (b88184c )
expr: fix formatting of table info using tabulate (b110636 )
fix float vs real data type detection in sqlalchemy (24e6774 )
fix list_schemas argument (69c1abf )
fix postgres udfs and reenable ci tests (7d480d2 )
fix tablecolumn execution for filter following join (064595b )
format: remove some newlines from formatted expr repr (ed4fa78 )
histogram: cross_join needs onclause=True (5d36a58 ), closes #622
ibis.expr.signature.Parameter is not pickleable (828fd54 )
implement coalesce properly in the pandas backend (aca5312 )
implement count on tables for pyspark (7fe5573 ), closes #2879
infer coalesce types when a non-null expression occurs after the first argument (c5f2906 )
mutate: do not lift table column that results from mutate (ba4e5e5 )
pandas: disable range windows with order by (e016664 )
pandas: don't reassign the same column to silence SettingWithCopyWarning warning (75dc616 )
pandas: implement percent_rank correctly (d8b83e7 )
prevent unintentional cross joins in mutate + filter (83eef99 )
pyspark: fix range windows (a6f2aa8 )
regression in Selection.sort_by with resolved_keys (c7a69cd )
regression in sort_by with resolved_keys (63f1382 ), closes #3619
remove broken csv pre_execute (93b662a )
remove importorskip call for backend tests (2f0bcd8 )
remove incorrect fix for pandas regression (339f544 )
remove passing schema into register_parquet (bdcbb08 )
repr: add ops.TimeAdd to repr binop lookup table (fd94275 )
repr: allow ops.TableNode in fmt_value (6f57003 )
reverse the predicate pushdown substitution (f3cd358 )
sort_index to satisfy pandas 1.4.x (6bac0fc )
sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321 )
sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21 )
sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71 )
sql: retain column names for named ColumnExprs (f1b4b6e ), closes #3754
sql: walk right join trees and substitute joins with right-side joins with views (0231592 )
store schema on the pandas backend to allow correct inference (35070be )
Performance Improvements
datatypes: speed up str and hash (262d3d7 )
fast path for simple column selection (d178498 )
ir: global equality cache (13c2bb2 )
ir: introduce CachedEqMixin to speed up equality checks (b633925 )
repr: remove full tree repr from rule validator error message (65885ab )
speed up attribute access (89d1c05 )
use assign instead of concat in projections when possible (985c242 )
Miscellaneous Chores
deps: increase sqlalchemy lower bound to 1.4 (560854a )
drop support for Python 3.7 (0afd138 )
Code Refactoring
api: make primitive types more cohesive (71da8f7 )
api: remove distinct ColumnExpr API (3f48cb8 )
api: remove materialize (24285c1 )
backends: remove the hdf5 backend (ff34f3e )
backends: remove the parquet backend (b510473 )
config: disable graphviz-repr-in-notebook by default (214ad4e )
config: remove old config code and port to pydantic (4bb96d1 )
dt.UUID inherits from DataType, not String (2ba540d )
expr: preserve column ordering in aggregations/mutations (668be0f )
hdfs: replace HDFS with fsspec (cc6eddb )
ir: make Annotable immutable (1f2b3fa )
ir: make schema annotable (b980903 )
ir: remove unused lineage roots and find_nodes functions (d630a77 )
ir: simplify expressions by not storing dtype and name (e929f85 )
kudu: remove support for use of kudu through kudu-python (36bd97f )
move coercion functions from schema.py to udf (58eea56 ), closes #3033
remove blanket call for Expr (3a71116 ), closes #2258
remove the csv backend (0e3e02e )
udf: make coerce functions in ibis.udf.vectorized private (9ba4392 )