Releases
3.0.0 (2022-04-25)
⚠ BREAKING CHANGES
ir: The following are breaking changes due to simplifying expression internals:
ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory methods have been removed; the DataType.scalar and DataType.column class fields can be used to directly construct a corresponding expression instance (though prefer to use operation.to_expr())
ibis.expr.types.ValueExpr._name and ValueExpr._dtype fields are no longer accessible. While these were never supposed to be used directly, the ValueExpr.has_name(), ValueExpr.get_name() and ValueExpr.type() methods are now the only way to retrieve the expression's name and datatype.
ibis.expr.operations.Node.output_type is now a property, not a method; decorate those methods with @property
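The method-to-property migration can be sketched in plain Python; OldNode and NewNode below are hypothetical stand-ins, not ibis classes, and real output_type implementations return expression classes rather than strings:

```python
# Hypothetical stand-ins illustrating the method-to-property migration.
class OldNode:
    def output_type(self):  # previously defined as a plain method
        return "ColumnExpr"

class NewNode:
    @property
    def output_type(self):  # now decorated with @property
        return "ColumnExpr"

# Old call style used parentheses; the new style is attribute access.
assert OldNode().output_type() == NewNode().output_type
```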
ibis.expr.operations.ValueOp subclasses must define output_shape and output_dtype properties from now on (note the datatype abbreviation dtype in the property name)
ibis.expr.rules.cast(), scalar_like() and array_like() rules have been removed
api: Replace t["a"].distinct() with t[["a"]].distinct().
deps: The sqlalchemy lower bound is now 1.4
ir: Schema.names and Schema.types attributes now have tuple type rather than list
expr: Columns that were added or used in an aggregation or mutation would be alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn't preserve insertion order. Now columns will appear in the order in which they were passed to aggregate or mutate
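The ordering fix follows from dict semantics: since Python 3.7, plain dicts preserve insertion order, so sorting names is no longer needed for determinism. A minimal illustration (the keys here are made-up column names, not ibis API):

```python
# Since Python 3.7, dicts preserve insertion order, so named expressions
# can keep the order in which they were passed rather than being sorted.
cols = {}
cols["z_total"] = "sum(z)"
cols["a_mean"] = "mean(a)"

assert list(cols) == ["z_total", "a_mean"]    # insertion order, kept now
assert sorted(cols) == ["a_mean", "z_total"]  # the old, alphabetical output
```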
api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
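The practical difference between the two widths can be seen with the standard struct module; this illustrates 32- vs 64-bit float precision generally, not ibis internals:

```python
import struct

# Round-trip 0.1 through a 32-bit float: precision is lost.
f32 = struct.unpack("f", struct.pack("f", 0.1))[0]

# Round-trip through a 64-bit float: the value survives exactly,
# since Python floats are already IEEE 754 doubles.
f64 = struct.unpack("d", struct.pack("d", 0.1))[0]

assert f32 != 0.1
assert f64 == 0.1
```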
ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
ir: removed ibis.expr.lineage.{roots,find_nodes} functions
config: Use ibis.options.graphviz_repr = True to enable graphviz expression rendering
hdfs: Use fsspec instead of HDFS from ibis
udf: Vectorized UDF coercion functions are no longer a public API.
The minimum supported Python version is now Python 3.8
config: register_option is no longer supported; please submit option requests upstream
backends: Read tables with pandas.read_hdf and use the pandas backend
The CSV backend is removed. Use Datafusion for CSV execution.
backends: Use the datafusion backend to read parquet files
Expr() -> Expr.pipe()
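The Expr() -> Expr.pipe() change replaces calling an expression directly with an explicit pipe method. A hedged pure-Python sketch of the pattern (this toy Expr is an illustration, not the ibis class):

```python
# Toy Expr illustrating the pipe pattern: expr.pipe(f, *args) is just
# f(expr, *args), written fluently so chains read left to right.
class Expr:
    def __init__(self, value):
        self.value = value

    def pipe(self, f, *args, **kwargs):
        return f(self, *args, **kwargs)

def add(expr, n):
    return Expr(expr.value + n)

# Previously expressions were callable directly, e.g. expr(add, 2);
# now the intent is explicit: expr.pipe(add, 2).
result = Expr(40).pipe(add, 2)
assert result.value == 42
```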
coercion functions previously in expr/schema.py are now in udf/vectorized.py
api: materialize is removed. Joins with overlapping columns now have suffixes.
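How overlapping names get disambiguated can be sketched without ibis; the suffixes ("_x", "_y") and the helper below are illustrative, not ibis's actual defaults or API:

```python
def suffix_overlapping(left_cols, right_cols, suffixes=("_x", "_y")):
    """Rename columns that appear on both sides of a join so the
    joined table ends up with unique column names."""
    overlap = set(left_cols) & set(right_cols)
    left = [c + suffixes[0] if c in overlap else c for c in left_cols]
    right = [c + suffixes[1] if c in overlap else c for c in right_cols]
    return left + right

# Only the shared "id" column is suffixed; unique columns keep their names.
assert suffix_overlapping(["id", "value"], ["id", "score"]) == [
    "id_x", "value", "id_y", "score",
]
```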
kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.
Features
add repr_html for expressions to print as tables in ipython (cd6fa4e )
add duckdb backend (667f2d5 )
allow construction of decimal literals (3d9e865 )
api: add ibis.asc expression (efe177e ), closes #1454
api: add has_operation API to the backend (4fab014 )
api: implement type for SortExpr (ab19bd6 )
clickhouse: implement string concat for clickhouse (1767205 )
clickhouse: implement StrRight operation (67749a0 )
clickhouse: implement table union (e0008d7 )
clickhouse: implement trim, pad and string predicates (a5b7293 )
datafusion: implement Count operation (4797a86 )
datatypes: unbounded decimal type (f7e6f65 )
date: add ibis.date(y,m,d) functionality (26892b6 ), closes #386
duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087 )
duckdb: add functionality needed to pass integer to interval test (e2119e8 )
duckdb: implement _get_schema_using_query (93cd730 )
duckdb: implement now() function (6924f50 )
duckdb: implement regexp replace and extract (18d16a7 )
implement force argument in sqlalchemy backend base class (9df7f1b )
implement coalesce for the pyspark backend (8183efe )
implement semi/anti join for the pandas backend (cb36fc5 )
implement semi/anti join for the pyspark backend (3e1ba9c )
implement the remaining clickhouse joins (b3aa1f0 )
ir: rewrite and speed up expression repr (45ce9b2 )
mysql: implement _get_schema_from_query (456cd44 )
mysql: move string join impl up to alchemy for mysql (77a8eb9 )
postgres: implement _get_schema_using_query (f2459eb )
pyspark: implement Distinct for pyspark (4306ad9 )
pyspark: implement log base b for pyspark (527af3c )
pyspark: implement percent_rank and enable testing (c051617 )
repr: add interval info to interval repr (df26231 )
sqlalchemy: implement ilike (43996c0 )
sqlite: implement date_truncate (3ce4f2a )
sqlite: implement ISO week of year (714ff7b )
sqlite: implement string join and concat (6f5f353 )
support of arrays and tuples for clickhouse (db512a8 )
ver: dynamic version identifiers (408f862 )
Bug Fixes
added wheel to pyproject toml for venv users (b0b8e5c )
allow major version changes in CalVer dependencies (9c3fbe5 )
annotable: allow optional arguments at any position (778995f ), closes #3730
api: add ibis.map and .struct (327b342 ), closes #3118
api: map string multiplication with integer to repeat method (b205922 )
api: thread suffixes parameter to individual join methods (31a9aff )
change TimestampType to Timestamp (e0750be )
clickhouse: disconnect from clickhouse when computing version (11cbf08 )
clickhouse: use a context manager for execution (a471225 )
combine windows during windowization (7fdd851 )
conform epoch_seconds impls to expression return type (18a70f1 )
context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b ), closes #3108
dask: fix asof joins for newer version of dask (50711cc )
dask: workaround dask bug (a0f3bd9 )
deps: update dependency atpublic to v3 (3fe8f0d )
deps: update dependency datafusion to >=0.4,<0.6 (3fb2194 )
deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361 )
deps: update dependency graphviz to >=0.16,<0.21 (3014445 )
duckdb: add casts to literals to fix binding errors (1977a55 ), closes #3629
duckdb: fix array column type discovery on leaf tables and add tests (15e5412 )
duckdb: fix log with base b impl (4920097 )
duckdb: support both 0.3.2 and 0.3.3 (a73ccce )
enforce the schema's column names in apply_to (b0f334d )
expose ops.IfNull for mysql backend (156c2bd )
expr: add more binary operators to char list and implement fallback (b88184c )
expr: fix formatting of table info using tabulate (b110636 )
fix float vs real data type detection in sqlalchemy (24e6774 )
fix list_schemas argument (69c1abf )
fix postgres udfs and reenable ci tests (7d480d2 )
fix tablecolumn execution for filter following join (064595b )
format: remove some newlines from formatted expr repr (ed4fa78 )
histogram: cross_join needs onclause=True (5d36a58 ), closes #622
ibis.expr.signature.Parameter is not pickleable (828fd54 )
implement coalesce properly in the pandas backend (aca5312 )
implement count on tables for pyspark (7fe5573 ), closes #2879
infer coalesce types when a non-null expression occurs after the first argument (c5f2906 )
mutate: do not lift table column that results from mutate (ba4e5e5 )
pandas: disable range windows with order by (e016664 )
pandas: don't reassign the same column to silence SettingWithCopyWarning warning (75dc616 )
pandas: implement percent_rank correctly (d8b83e7 )
prevent unintentional cross joins in mutate + filter (83eef99 )
pyspark: fix range windows (a6f2aa8 )
regression in Selection.sort_by with resolved_keys (c7a69cd )
regression in sort_by with resolved_keys (63f1382 ), closes #3619
remove broken csv pre_execute (93b662a )
remove importorskip call for backend tests (2f0bcd8 )
remove incorrect fix for pandas regression (339f544 )
remove passing schema into register_parquet (bdcbb08 )
repr: add ops.TimeAdd to repr binop lookup table (fd94275 )
repr: allow ops.TableNode in fmt_value (6f57003 )
reverse the predicate pushdown substitution (f3cd358 )
sort_index to satisfy pandas 1.4.x (6bac0fc )
sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321 )
sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21 )
sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71 )
sql: retain column names for named ColumnExprs (f1b4b6e ), closes #3754
sql: walk right join trees and substitute joins with right-side joins with views (0231592 )
store schema on the pandas backend to allow correct inference (35070be )
Performance Improvements
datatypes: speed up str and hash (262d3d7 )
fast path for simple column selection (d178498 )
ir: global equality cache (13c2bb2 )
ir: introduce CachedEqMixin to speed up equality checks (b633925 )
repr: remove full tree repr from rule validator error message (65885ab )
speed up attribute access (89d1c05 )
use assign instead of concat in projections when possible (985c242 )
Miscellaneous Chores
deps: increase sqlalchemy lower bound to 1.4 (560854a )
drop support for Python 3.7 (0afd138 )
Code Refactoring
api: make primitive types more cohesive (71da8f7 )
api: remove distinct ColumnExpr API (3f48cb8 )
api: remove materialize (24285c1 )
backends: remove the hdf5 backend (ff34f3e )
backends: remove the parquet backend (b510473 )
config: disable graphviz-repr-in-notebook by default (214ad4e )
config: remove old config code and port to pydantic (4bb96d1 )
dt.UUID inherits from DataType, not String (2ba540d )
expr: preserve column ordering in aggregations/mutations (668be0f )
hdfs: replace HDFS with fsspec (cc6eddb )
ir: make Annotable immutable (1f2b3fa )
ir: make schema annotable (b980903 )
ir: remove unused lineage roots and find_nodes functions (d630a77 )
ir: simplify expressions by not storing dtype and name (e929f85 )
kudu: remove support for use of kudu through kudu-python (36bd97f )
move coercion functions from schema.py to udf (58eea56 ), closes #3033
remove blanket call for Expr (3a71116 ), closes #2258
remove the csv backend (0e3e02e )
udf: make coerce functions in ibis.udf.vectorized private (9ba4392 )