3.0.0

@ibis-project-bot ibis-project-bot released this 25 Apr 17:44

3.0.0 (2022-04-25)

⚠ BREAKING CHANGES

  • ir: The following are breaking changes due to simplifying expression internals
    • ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory
      methods have been removed; the DataType.scalar and DataType.column class
      fields can be used to directly construct a corresponding expression instance
      (though prefer to use operation.to_expr())
    • ibis.expr.types.ValueExpr._name and ValueExpr._dtype fields are no longer
      accessible. They were never meant to be used directly; the
      ValueExpr.has_name(), ValueExpr.get_name() and ValueExpr.type() methods
      are now the only way to retrieve the expression's name and datatype.
    • ibis.expr.operations.Node.output_type is now a property, not a method;
      decorate any overriding methods with @property
    • ibis.expr.operations.ValueOp subclasses must define output_shape and
      output_dtype properties from now on (note the datatype abbreviation dtype
      in the property name)
    • ibis.expr.rules.cast(), scalar_like() and array_like() rules have been
      removed
  • api: Replace t["a"].distinct() with t[["a"]].distinct().
  • deps: The sqlalchemy lower bound is now 1.4
  • ir: Schema.names and Schema.types attributes now have tuple type rather than list
  • expr: Columns that were added or used in an aggregation or
    mutation would be alphabetically sorted in compiled SQL outputs. This
    was a vestige from when Python dicts didn't preserve insertion order.
    Now columns will appear in the order in which they were passed to
    aggregate or mutate
  • api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
  • ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
  • ir: removed ibis.expr.lineage.{roots,find_nodes} functions
  • config: The graphviz repr in notebooks is now disabled by default. Use ibis.options.graphviz_repr = True to enable it
  • hdfs: Use fsspec instead of HDFS from ibis
  • udf: Vectorized UDF coercion functions are no longer a public API.
  • The minimum supported Python version is now Python 3.8
  • config: register_option is no longer supported, please submit option requests upstream
  • backends: The hdf5 backend is removed. Read tables with pandas.read_hdf and use the pandas backend
  • The CSV backend is removed. Use Datafusion for CSV execution.
  • backends: The parquet backend is removed. Use the datafusion backend to read parquet files
  • Calling an expression directly (Expr()) is no longer supported. Use Expr.pipe() instead
  • coercion functions previously in expr/schema.py are now in udf/vectorized.py
  • api: materialize is removed. Joins with overlapping columns now have suffixes.
  • kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
  • Any code that was relying implicitly on string-y
    behavior from UUID datatypes will need to add an explicit cast first.
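
The column-ordering change above can be illustrated without ibis. The following is a minimal sketch in plain Python, not ibis's implementation: old_column_order and new_column_order are hypothetical stand-ins for how aggregate or mutate might order keyword-named columns before and after this release. It relies only on the fact that Python 3.7+ dicts (and keyword arguments) preserve insertion order.

```python
def old_column_order(**named_exprs):
    """Hypothetical stand-in for pre-3.0 behavior: names sorted alphabetically."""
    return sorted(named_exprs)


def new_column_order(**named_exprs):
    """Hypothetical stand-in for 3.0 behavior: names kept in the order passed."""
    return list(named_exprs)


# The string values are placeholders for expressions; only the names matter here.
old = old_column_order(total="sum(x)", avg="mean(x)", count="count()")
new = new_column_order(total="sum(x)", avg="mean(x)", count="count()")

print(old)  # ['avg', 'count', 'total']
print(new)  # ['total', 'avg', 'count']
```

Code that selected result columns by position, or compared compiled SQL text against a stored string, is the most likely to notice the difference; selecting columns by name is unaffected.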

Features

  • add _repr_html_ for expressions to print as tables in ipython (cd6fa4e)
  • add duckdb backend (667f2d5)
  • allow construction of decimal literals (3d9e865)
  • api: add ibis.asc expression (efe177e), closes #1454
  • api: add has_operation API to the backend (4fab014)
  • api: implement type for SortExpr (ab19bd6)
  • clickhouse: implement string concat for clickhouse (1767205)
  • clickhouse: implement StrRight operation (67749a0)
  • clickhouse: implement table union (e0008d7)
  • clickhouse: implement trim, pad and string predicates (a5b7293)
  • datafusion: implement Count operation (4797a86)
  • datatypes: unbounded decimal type (f7e6f65)
  • date: add ibis.date(y,m,d) functionality (26892b6), closes #386
  • duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087)
  • duckdb: add functionality needed to pass integer to interval test (e2119e8)
  • duckdb: implement _get_schema_using_query (93cd730)
  • duckdb: implement now() function (6924f50)
  • duckdb: implement regexp replace and extract (18d16a7)
  • implement force argument in sqlalchemy backend base class (9df7f1b)
  • implement coalesce for the pyspark backend (8183efe)
  • implement semi/anti join for the pandas backend (cb36fc5)
  • implement semi/anti join for the pyspark backend (3e1ba9c)
  • implement the remaining clickhouse joins (b3aa1f0)
  • ir: rewrite and speed up expression repr (45ce9b2)
  • mysql: implement _get_schema_from_query (456cd44)
  • mysql: move string join impl up to alchemy for mysql (77a8eb9)
  • postgres: implement _get_schema_using_query (f2459eb)
  • pyspark: implement Distinct for pyspark (4306ad9)
  • pyspark: implement log base b for pyspark (527af3c)
  • pyspark: implement percent_rank and enable testing (c051617)
  • repr: add interval info to interval repr (df26231)
  • sqlalchemy: implement ilike (43996c0)
  • sqlite: implement date_truncate (3ce4f2a)
  • sqlite: implement ISO week of year (714ff7b)
  • sqlite: implement string join and concat (6f5f353)
  • support of arrays and tuples for clickhouse (db512a8)
  • ver: dynamic version identifiers (408f862)

Bug Fixes

  • added wheel to pyproject toml for venv users (b0b8e5c)
  • allow major version changes in CalVer dependencies (9c3fbe5)
  • annotable: allow optional arguments at any position (778995f), closes #3730
  • api: add ibis.map and .struct (327b342), closes #3118
  • api: map string multiplication with integer to repeat method (b205922)
  • api: thread suffixes parameter to individual join methods (31a9aff)
  • change TimestampType to Timestamp (e0750be)
  • clickhouse: disconnect from clickhouse when computing version (11cbf08)
  • clickhouse: use a context manager for execution (a471225)
  • combine windows during windowization (7fdd851)
  • conform epoch_seconds impls to expression return type (18a70f1)
  • context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b), closes #3108
  • dask: fix asof joins for newer version of dask (50711cc)
  • dask: workaround dask bug (a0f3bd9)
  • deps: update dependency atpublic to v3 (3fe8f0d)
  • deps: update dependency datafusion to >=0.4,<0.6 (3fb2194)
  • deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361)
  • deps: update dependency graphviz to >=0.16,<0.21 (3014445)
  • duckdb: add casts to literals to fix binding errors (1977a55), closes #3629
  • duckdb: fix array column type discovery on leaf tables and add tests (15e5412)
  • duckdb: fix log with base b impl (4920097)
  • duckdb: support both 0.3.2 and 0.3.3 (a73ccce)
  • enforce the schema's column names in apply_to (b0f334d)
  • expose ops.IfNull for mysql backend (156c2bd)
  • expr: add more binary operators to char list and implement fallback (b88184c)
  • expr: fix formatting of table info using tabulate (b110636)
  • fix float vs real data type detection in sqlalchemy (24e6774)
  • fix list_schemas argument (69c1abf)
  • fix postgres udfs and reenable ci tests (7d480d2)
  • fix tablecolumn execution for filter following join (064595b)
  • format: remove some newlines from formatted expr repr (ed4fa78)
  • histogram: cross_join needs onclause=True (5d36a58), closes #622
  • ibis.expr.signature.Parameter is not pickleable (828fd54)
  • implement coalesce properly in the pandas backend (aca5312)
  • implement count on tables for pyspark (7fe5573), closes #2879
  • infer coalesce types when a non-null expression occurs after the first argument (c5f2906)
  • mutate: do not lift table column that results from mutate (ba4e5e5)
  • pandas: disable range windows with order by (e016664)
  • pandas: don't reassign the same column to silence SettingWithCopyWarning warning (75dc616)
  • pandas: implement percent_rank correctly (d8b83e7)
  • prevent unintentional cross joins in mutate + filter (83eef99)
  • pyspark: fix range windows (a6f2aa8)
  • regression in Selection.sort_by with resolved_keys (c7a69cd)
  • regression in sort_by with resolved_keys (63f1382), closes #3619
  • remove broken csv pre_execute (93b662a)
  • remove importorskip call for backend tests (2f0bcd8)
  • remove incorrect fix for pandas regression (339f544)
  • remove passing schema into register_parquet (bdcbb08)
  • repr: add ops.TimeAdd to repr binop lookup table (fd94275)
  • repr: allow ops.TableNode in fmt_value (6f57003)
  • reverse the predicate pushdown substitution (f3cd358)
  • sort_index to satisfy pandas 1.4.x (6bac0fc)
  • sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321)
  • sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21)
  • sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71)
  • sql: retain column names for named ColumnExprs (f1b4b6e), closes #3754
  • sql: walk right join trees and substitute joins with right-side joins with views (0231592)
  • store schema on the pandas backend to allow correct inference (35070be)

Performance Improvements

  • datatypes: speed up __str__ and __hash__ (262d3d7)
  • fast path for simple column selection (d178498)
  • ir: global equality cache (13c2bb2)
  • ir: introduce CachedEqMixin to speed up equality checks (b633925)
  • repr: remove full tree repr from rule validator error message (65885ab)
  • speed up attribute access (89d1c05)
  • use assign instead of concat in projections when possible (985c242)

Miscellaneous Chores

  • deps: increase sqlalchemy lower bound to 1.4 (560854a)
  • drop support for Python 3.7 (0afd138)

Code Refactoring

  • api: make primitive types more cohesive (71da8f7)
  • api: remove distinct ColumnExpr API (3f48cb8)
  • api: remove materialize (24285c1)
  • backends: remove the hdf5 backend (ff34f3e)
  • backends: remove the parquet backend (b510473)
  • config: disable graphviz-repr-in-notebook by default (214ad4e)
  • config: remove old config code and port to pydantic (4bb96d1)
  • dt.UUID inherits from DataType, not String (2ba540d)
  • expr: preserve column ordering in aggregations/mutations (668be0f)
  • hdfs: replace HDFS with fsspec (cc6eddb)
  • ir: make Annotable immutable (1f2b3fa)
  • ir: make schema annotable (b980903)
  • ir: remove unused lineage roots and find_nodes functions (d630a77)
  • ir: simplify expressions by not storing dtype and name (e929f85)
  • kudu: remove support for use of kudu through kudu-python (36bd97f)
  • move coercion functions from schema.py to udf (58eea56), closes #3033
  • remove blanket call for Expr (3a71116), closes #2258
  • remove the csv backend (0e3e02e)
  • udf: make coerce functions in ibis.udf.vectorized private (9ba4392)