Releases: ibis-project/ibis

6.2.0

31 Aug 16:02

6.2.0 (2023-08-31)

Features

  • trino: add source application to trino backend (cf5fdb9)

Bug Fixes

  • bigquery,impala: escape all ASCII escape sequences in string literals (402f5ca)
  • bigquery: correctly escape ASCII escape sequences in regex patterns (a455203)
  • release: pin conventional-changelog-conventionalcommits to 6.1.0 (d6526b8)
  • trino: ensure that list_databases looks at all catalogs, not just the current one (cfbdbf1)
  • trino: override incorrect base sqlalchemy list_schemas implementation (84d38a1)
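The two escaping fixes above concern control characters inside string literals. As a rough illustration only, here is how a Python string could be rendered for a SQL dialect that accepts C-style backslash escapes. This is not the ibis implementation: `quote_string_literal` and its escape table are hypothetical names chosen for this sketch.

```python
# Illustrative sketch, not the ibis implementation.
# Each character that needs escaping maps to a C-style backslash sequence.
_ESCAPES = str.maketrans(
    {
        "\\": "\\\\",
        "\a": "\\a",
        "\b": "\\b",
        "\f": "\\f",
        "\n": "\\n",
        "\r": "\\r",
        "\t": "\\t",
        "\v": "\\v",
        "'": "\\'",
    }
)


def quote_string_literal(value: str) -> str:
    """Render a Python string as a single-quoted, escaped SQL literal."""
    # str.translate works one character at a time, so a backslash that is
    # already in the input is escaped exactly once and never double-escaped.
    return "'" + value.translate(_ESCAPES) + "'"
```

Translating character by character is a deliberate choice here: chained `str.replace` calls can re-escape the backslashes produced by an earlier replacement.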

Documentation

  • trino: add connection docstring (507a00e)

6.1.0

03 Aug 20:23

6.1.0 (2023-08-03)

Features

  • api: add ibis.dtype top-level API (867e5f1)
  • api: add table.nunique() for counting unique table rows (adcd762)
  • api: allow mixing literals and columns in ibis.array (3355dd8)
  • api: improve efficiency of __dataframe__ protocol (15e27da)
  • api: support boolean literals in join API (c56376f)
  • arrays: add concat method equivalent to __add__/__radd__ (0ed0ab1)
  • arrays: add repeat method equivalent to __mul__/__rmul__ (b457c7b)
  • backends: add current_schema API (955a9d0)
  • bigquery: fill out CREATE TABLE DDL options including support for overwrite (5dac7ec)
  • datafusion: add count_distinct, median, approx_median, stddev and var aggregations (45089c4)
  • datafusion: add extract url fields functions (4f5ea98)
  • datafusion: add functions sign, power, nullifzero, log (ef72e40)
  • datafusion: add RegexSearch, StringContains and StringJoin (4edaab5)
  • datafusion: implement in-memory table (d4ec5c2)
  • flink: add tests and translation rules for additional operators (fc2aa5d)
  • flink: implement translation rules and tests for over aggregation in Flink backend (e173cd7)
  • flink: implement translation rules for literal expressions in flink compiler (a8f4880)
  • improved error messages when missing backend dependencies (2fe851b)
  • make output of to_sql a proper str subclass (084bdb9)
  • pandas: add ExtractURLField functions (e369333)
  • polars: implement ops.SelfReference (983e393)
  • pyspark: read/write delta tables (d403187)
  • refactor ddl for create_database and add create_schema where relevant (d7a857c)
  • sqlite: add scalar python udf support to sqlite (92f29e6)
  • sqlite: implement extract url field functions (cb1956f)
  • trino: implement support for .sql table expression method (479bc60)
  • trino: support table properties when creating a table (b9d65ef)

Bug Fixes

  • api: allow scalar window order keys (3d3f4f3)
  • backends: make current_database implementation and API consistent across all backends (eeeeee0)
  • bigquery: respect the fully qualified table name at the init (a25f460)
  • clickhouse: check dispatching instead of membership in the registry for has_operation (acb7f3f)
  • datafusion: always quote column names to prevent datafusion from normalizing case (310db2b)
  • deps: update dependency datafusion to v27 (3a311cd)
  • druid: handle conversion issues from string, binary, and timestamp (b632063)
  • duckdb: avoid double escaping backslashes for bind parameters (8436f57)
  • duckdb: cast read_only to string for connection (27e17d6)
  • duckdb: deduplicate results from list_schemas() (172520e)
  • duckdb: ensure that current_database returns the correct value (2039b1e)
  • duckdb: handle conversion from duckdb_engine unsigned int aliases (e6fd0cc)
  • duckdb: map hugeint to decimal to avoid information loss (4fe91d4)
  • duckdb: run pre-execute-hooks in duckdb before file export (5bdaa1d)
  • duckdb: use regexp_matches to ensure that matching checks containment instead of a full match (0a0cda6)
  • examples: remove example datasets that are incompatible with case-insensitive file systems (4048826)
  • exprs: ensure that left_semi and semi are equivalent (bbc1eb7)
  • forward arguments through __dataframe__ protocol (50f3be9)
  • ir: change "it not a" to "is not a" in errors (d0d463f)
  • memtable: implement support for translation of empty memtable (05b02da)
  • mysql: fix UUID type reflection for sqlalchemy 2.0.18 (12d4039)
  • mysql: pass-through kwargs to connect_args (e3f3e2d)
  • ops: ensure that name attribute is always valid for ops.SelfReference (9068aca)
  • polars: ensure that pivot_longer works with more than one column (822c912)
  • polars: fix collect implementation (c1182be)
  • postgres: by default use domain socket (e44fdfb)
  • pyspark: make has_operation method a @classmethod (c1b7dbc)
  • release: use @google/semantic-release-replace-plugin@1.2.0 to avoid module loading bug (673aab3)
  • snowflake: fix broken unnest functionality (207587c)
  • snowflake: reset the schema and database to the original schema after creating them (54ce26a)
  • snowflake: reset to original schema when resetting the database (32ff832)
  • snowflake: use regexp_instr != 0 instead of REGEXP keyword (06e2be4)
  • sqlalchemy: add support for sqlalchemy string subclassed types (8b33b35)
  • sql: handle parsing aliases (3645cf4) ...
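Several of the regex fixes above (duckdb's regexp_matches, snowflake's regexp_instr != 0) turn on the difference between a pattern matching somewhere inside a string and a pattern matching the whole string. The sketch below shows that distinction with Python's standard `re` module. It is an analogy, not backend code, and the helper names are hypothetical.

```python
import re


def contains_match(pattern: str, text: str) -> bool:
    """Containment: the pattern may match anywhere inside the text."""
    return re.search(pattern, text) is not None


def full_match(pattern: str, text: str) -> bool:
    """Full match: the pattern must account for the entire text."""
    return re.fullmatch(pattern, text) is not None
```

For the pattern `\d+` and the text `order 42 shipped`, the containment check succeeds while the full-match check does not.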

6.0.0

05 Jul 15:05

6.0.0 (2023-07-05)

⚠ BREAKING CHANGES

  • imports: Use of ibis.udf as a module is removed. Use ibis.legacy.udf instead.

  • The minimum supported Python version is now Python 3.9

  • api: group_by().count() no longer automatically names the count aggregation count. Use relabel to rename columns.

  • backends: Backend.ast_schema is removed. Use expr.as_table().schema() instead.

  • snowflake/postgres: Postgres UDFs now use the new @udf.scalar.python API. This should be a low-effort replacement for the existing API.

  • ir: ops.NullLiteral is removed

  • datatypes: dt.Interval no longer has a default unit, dt.interval is removed

  • deps: snowflake-connector-python's lower bound was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade snowflake-connector-python to at least version 3.0.2.

  • api: Table.difference(), Table.intersection(), and Table.union() now require at least one argument.

  • postgres: Ibis no longer automatically defines first/last reductions on connection to the postgres backend. Use DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the pgxn implementations instead.

  • api: ibis.examples.<example-name>.fetch no longer forwards arbitrary keyword arguments to read_csv/read_parquet.

  • datatypes: dt.Interval.value_type attribute is removed

  • api: Table.count() is no longer automatically named "count". Use Table.count().name("count") to achieve the previous behavior.

  • trino: The trino backend now requires at least version 0.321 of the trino Python package.

  • backends: removed AlchemyTable, AlchemyDatabase, DaskTable, DaskDatabase, PandasTable, PandasDatabase, PySparkDatabaseTable, use ops.DatabaseTable instead

  • dtypes: temporal unit enums are now available under ibis.common.temporal instead of ibis.common.enums.

  • clickhouse: external_tables can no longer be passed in ibis.clickhouse.connect. Pass external_tables directly in raw_sql/execute/to_pyarrow/to_pyarrow_batches().

  • datatypes: dt.Set is now an alias for dt.Array

  • bigquery: Previously, an ibis timestamp always mapped to the BigQuery TIMESTAMP type, with no timezone support. That was incorrect: the BigQuery TIMESTAMP type is in UTC, while DATETIME is the timezone-naive type. The mapping has therefore changed: an ibis timestamp with the UTC timezone maps to BigQuery TIMESTAMP, and an ibis timestamp with no timezone maps to BigQuery DATETIME.

  • impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use raw_sql if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they're operating on.

  • api: Column.first()/Column.last() are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using this function in select-based APIs should function unchanged.

  • bigquery: when using the bigquery backend, casting float to int will no longer round floats to the nearest integer

  • ops.Hash: The hash method on table columns no longer accepts the how argument. The hashing functions available are highly backend-dependent and the intention of the hash operation is to provide a fast, consistent (on the same backend, only) integer value. If you have been passing in a value for how, you can remove it and you will get the same results as before, as there were no backends with multiple hash functions working.

  • duckdb: Some CSV files may now have headers that did not have them previously. Set header=False to get the previous behavior.

  • deps: New environments will have a different default setting for compression in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn't include them by default. Install clickhouse-cityhash and lz4 to preserve the previous behavior.

  • api: Table.set_column() is removed; use Table.mutate(name=expr) instead

  • api: the suffixes argument in all join methods has been removed in favor of lname/rname args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in lname="{name}_x", rname="{name}_y".

  • ir: IntervalType.unit is now an enum instead of a string

  • type-system: Inferred types of Python objects may be slightly different. Ibis now uses pyarrow to infer the column types of pandas DataFrame and other types.

  • backends: path argument of Backend.connect() is removed, use the database argument instead

  • api: removed Table.sort_by() and Table.groupby(), use .order_by() and .group_by() respectively

  • datatypes: DataType.scalar and column class attributes are now strings.

  • backends: Backend.load_data(), Backend.exists_database() and Backend.exists_table() are removed

  • ir: Value.summary() and NumericValue.summary() are removed

  • schema: Schema.merge() is removed, use the union operator schema1 | schema2 instead

  • api: ibis.sequence() is removed

  • drop support for Python 3.8 (747f4ca)
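The join note above replaces suffixes with lname/rname format strings such as "{name}_x" and "{name}_y". The following pure-Python sketch shows how such templates expand. It assumes that the templates are applied only to column names present on both sides, and it ignores any special handling of join keys. `resolve_join_names` is a hypothetical helper, not an ibis function.

```python
def resolve_join_names(left, right, lname="{name}_x", rname="{name}_y"):
    """Expand lname/rname templates for column names that collide.

    Assumption for this sketch: only names that appear in both inputs are
    renamed; all other names pass through unchanged.
    """
    overlap = set(left) & set(right)
    left_out = [lname.format(name=col) if col in overlap else col for col in left]
    right_out = [rname.format(name=col) if col in overlap else col for col in right]
    return left_out + right_out
```

With left columns `["id", "value"]` and right columns `["id", "label"]`, the templates from the note give `["id_x", "value", "id_y", "label"]`.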

Features

  • add dask windowing (9cb920a)
  • add easy type hints to GroupBy (da330b1)
  • add microsecond method to TimestampValue and TimeValue (e9df2da)
  • api: add __dataframe__ implementation (b3d9619)
  • api: add ALL_CAPS option to Table.relabel (c0b30e2)
  • api: add first/last reduction APIs (8c01980)
  • api: add zip operation and api (fecf695)
  • api: allow passing multiple keyword arguments to ibis.interval (22ee854)
  • api: better repr and pickle support for deferred expressions (2b1ec9c)
  • api: exact median (c53031c)
  • api: raise better error on column name collision in joins (e04c38c)
  • api: replace suffixes in join with lname/rname (3caf3a1)
  • api: support abstract type names in selectors.of_type (f6d2d56)
  • api: support list of strings and single strings in the across selector (a6b60e7)
  • api: use create_table to load example data (42e09a4)
  • bigquery: add client and storage_client params to connect (4cf1354)
  • bigquery: enable group_concat over windows (d6a1117)
  • cast: add table-level try_cast (5e4d16b)
  • clickhouse: add array zip impl (efba835)
  • clickhouse: move to clickhouse supported Python client (012557a)
  • clickhouse: set default engine to native file (29815fa)
  • clickhouse: support pyarrow decimal types (7472dd5)
  • common: add a pure python egraph implementation (aed2ed0)
  • common: add pattern matchers (b515d5c)
  • common: add support for start parameter in StringFind (31ce741)
  • common: add Topmost and Innermost pattern matchers (90b48fc)
  • common: implement copy protocol for Immutable base class (e61c66b)
  • create_table: support pyarrow Table in table creation (9dbb25c)
  • datafusion: add string functions (66c0afb)
  • datafusion: add support for scalar pyarrow UDFs (45935b7) ...

5.1.0

11 Apr 17:44

5.1.0 (2023-04-11)

Features

  • api: expand distinct API for dropping duplicates based on column subsets (3720ea5)
  • api: implement pyarrow memtables (9d4fbbd)
  • api: support passing a format string to Table.relabel (0583959)
  • api: thread kwargs around properly to support more complex connection arguments (7e0e15b)
  • backends: add more array functions (5208801)
  • bigquery: make to_pyarrow_batches() smarter (42f5987)
  • bigquery: support bignumeric type (d7c0f49)
  • default repr to showing all columns in Jupyter notebooks (91a0811)
  • druid: add re_search support (946202b)
  • duckdb: add map operations (a4c4e77)
  • duckdb: support sqlalchemy 2 (679bb52)
  • mssql: implement ops.StandardDev, ops.Variance (e322f1d)
  • pandas: support memtable in pandas backend (6e4d621), closes #5467
  • polars: implement count distinct (aea4ccd)
  • postgres: implement ops.Arbitrary (ee8dbab)
  • pyspark: pivot_longer (f600c90)
  • pyspark: add ArrayFilter operation (2b1301e)
  • pyspark: add ArrayMap operation (e2c159c)
  • pyspark: add DateDiff operation (bfd6109)
  • pyspark: add partial support for interval types (067120d)
  • pyspark: add read_csv, read_parquet, and register (7bd22af)
  • pyspark: implement count distinct (db29e10)
  • pyspark: support basic caching (ab0df7a)
  • snowflake: add optional 'connect_args' param (8bf2043)
  • snowflake: native pyarrow support (ce3d6a4)
  • sqlalchemy: support unknown types (fde79fa)
  • sqlite: implement ops.Arbitrary (9bcdf77)
  • sql: use temp views where possible (5b9d8c0)
  • table: implement pivot_wider API (60e7731)
  • ux: move ibis.expr.selectors to ibis.selectors and deprecate for removal in 6.0 (0ae639d)

Bug Fixes

  • api: disambiguate attribute errors from a missing resolve method (e12c4df)
  • api: support filter on literal followed by aggregate (68d65c8)
  • clickhouse: do not render aliases when compiling aggregate expression components (46caf3b)
  • clickhouse: ensure that clickhouse depends on sqlalchemy for make_url usage (ea10a27)
  • clickhouse: ensure that truncate works (1639914)
  • clickhouse: fix create_table implementation (5a54489)
  • clickhouse: workaround sqlglot issue with calling match (762f4d6)
  • deps: support pandas 2.0 (4f1d9fe)
  • duckdb: branch to avoid unnecessary dataframe construction (9d5d943)
  • duckdb: disable the progress bar by default (1a1892c)
  • duckdb: drop use of experimental parallel csv reader (47d8b92)
  • duckdb: generate SIMILAR TO instead of tilde to workaround sqlglot issue (434da27)
  • improve typing signature of .dropna() (e11de3f)
  • mssql: improve aggregation on expressions (58aa78d)
  • mssql: remove invalid aggregations (1ce3ef9)
  • polars: backwards compatibility for the time_zone and time_unit properties (3a2c4df)
  • postgres: allow inference of unknown types (343fb37)
  • pyspark: fail when aggregation contains a having filter (bd81a9f)
  • pyspark: raise proper error when trying to generate sql (51afc13)
  • snowflake: fix new array operations; remove ArrayRemove operation (772668b)
  • snowflake: make sure ephemeral tables follow backend quoting rules (9a845df)
  • snowflake: make sure pyarrow is used when possible (01f5154)
  • sql: ensure that set operations resolve to a single relation (3a02965)
  • sql: generate consistent pivot_longer semantics in the presence of multiple unnests (6bc301a)
  • sqlglot: work with newer versions (6f7302d)
  • trino,duckdb,postgres: make cumulative notany/notall aggregations work (c2e985f)
  • trino: only support how='first' with arbitrary reduction (315b5e7)
  • ux: use guaranteed length-1 characters for NULL values (8618789)

Refactors

  • api: remove explicit use of .projection in favor of the shorter .select (73df8df)
  • cache: factor out ref counted cache (c816f00)
  • duckdb: simplify to_pyarrow_batches implementation (d6235ee)
  • duckdb: source loaded and installed extensions from duckdb (fb06262)
  • duckdb: use native duckdb parquet reader unless auth required (e9f57eb)
  • generate uuid-based names for temp tables (a1164df) ...

5.0.0

15 Mar 22:36

5.0.0 (2023-03-15)

⚠ BREAKING CHANGES

  • api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
  • backend: Backends now raise ibis.common.exceptions.UnsupportedOperationError in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
  • ux: Table.info now returns an expression
  • ux: Passing a sequence of column names to Table.drop is removed. Replace drop(cols) with drop(*cols).
  • The spark plugin alias is removed. Use pyspark instead
  • ir: removed ibis.expr.scope and ibis.expr.timecontext modules, access them under ibis.backends.base.df.<module>
  • some methods have been removed from the top-level ibis.<backend> namespaces, access them on a connected backend instance instead.
  • common: removed ibis.common.geospatial, import the functions from ibis.backends.base.sql.registry.geospatial
  • datatypes: JSON is no longer a subtype of String
  • datatype: Category, CategoryValue/Column/Scalar are removed. Use string types instead.
  • ux: The metric_name argument to value_counts is removed. Use Table.relabel to change the metric column's name.
  • deps: the minimum version of parsy is now 2.0
  • ir/backends: removed the following symbols:
      • ibis.backends.duckdb.parse_type() function
      • ibis.backends.impala.Backend.set_database() method
      • ibis.backends.pyspark.Backend.set_database() method
      • ibis.backends.impala.ImpalaConnection.ping() method
      • ibis.expr.operations.DatabaseTable.change_name() method
      • ibis.expr.operations.ParseURL class
      • ibis.expr.operations.Value.to_projection() method
      • ibis.expr.types.Table.get_column() method
      • ibis.expr.types.Table.get_columns() method
      • ibis.expr.types.StringValue.parse_url() method
  • schema: Schema.from_dict(), .delete() and .append() methods are removed
  • datatype: struct_type.pairs is removed, use struct_type.fields instead
  • datatype: Struct(names, types) is not supported anymore, pass a dictionary to Struct constructor instead
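The last note above asks for a dictionary in place of separate names and types sequences. Code that still holds two parallel lists could build that dictionary along these lines. `struct_fields` is a hypothetical helper, and the type names are plain-string placeholders used only for illustration.

```python
def struct_fields(names, types):
    """Pair parallel name/type sequences into one field mapping."""
    # zip() silently truncates to the shorter input, so check lengths first.
    if len(names) != len(types):
        raise ValueError("names and types must have the same length")
    return dict(zip(names, types))


# The resulting mapping is what would be handed to the Struct constructor,
# per the note above; that call is omitted to keep the sketch dependency-free.
fields = struct_fields(["a", "b"], ["int64", "string"])
```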

Features

  • add max_columns option for table repr (a3aa236)
  • add examples API (b62356e)
  • api: add map/array accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
  • api: add array to string join operation (74de349)
  • api: add builtin support for relabeling columns to snake case (1157273)
  • api: add support for passing a mapping to ibis.map (d365fd4)
  • api: allow single argument set operations (bb0a6f0)
  • api: implement to_pandas() API for ecosystem compatibility (cad316c)
  • api: implement isin (ac31db2)
  • api: make cache evaluate only once per session per expression (5a8ffe9)
  • api: make create_table uniform (833c698)
  • api: more selectors (5844304)
  • api: upcast pandas DataFrames to memtables in rlz.table rule (8dcfb8d)
  • backends: implement ops.Time for sqlalchemy backends (713cd33)
  • bigquery: add BIGNUMERIC type support (5c98ea4)
  • bigquery: add UUID literal support (ac47c62)
  • bigquery: enable subqueries in select statements (ef4dc86)
  • bigquery: implement create and drop table method (5f3c22c)
  • bigquery: implement create_view and drop_view method (a586473)
  • bigquery: support creating tables from in-memory tables (c3a25f1)
  • bigquery: support in-memory tables (37e3279)
  • change Rich repr of dtypes from blue to dim (008311f)
  • clickhouse: implement ArrayFilter translation (f2144b6)
  • clickhouse: implement ops.ArrayMap (45000e7)
  • clickhouse: implement ops.MapLength (fc82eaa)
  • clickhouse: implement ops.Capitalize (914c64c)
  • clickhouse: implement ops.ExtractMillisecond (ee74e3a)
  • clickhouse: implement ops.RandomScalar (104aeed)
  • clickhouse: implement ops.StringAscii (a507d17)
  • clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
  • clickhouse: improve error message for invalid types in literal (e4d7799)
  • clickhouse: support asof_join (7ed5143)
  • common: add abstract mapping collection with support for set operations (7d4aa0f)
  • common: add support for variadic positional and variadic keyword annotations (baea1fa)
  • common: hold typehint in the annotation objects (b3601c6)
  • common: support Callable arguments and return types in Validator.from_annotable() (ae57c36)
  • common: support positional only and keyword only arguments in annotations (340dca1)
  • dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
  • datafusion: implement ops.Degrees, ops.Radians (7e61391)
  • datafusion: implement ops.Exp (7cb3ade)
  • datafusion: implement ops.Pi, ops.E (5a74cb4)
  • datafusion: implement ops.RandomScalar (5d1cd0f)
  • datafusion: implement ops.StartsWith (8099014)
  • datafusion: implement ops.StringAscii (b1d7672)
  • datafusion: implement ops.StrRight (016a082)
  • datafusion: implement ops.Translate (2fe3fc4)
  • datafusion: support substr without end (a19fd87)
  • datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
  • datatype: enable inference of Decimal type (8761732)
  • datatype: implement Mapping abstract base class for StructType (5df2022)
  • deps: add Python 3.11 support and tests (6f3f759) ...

4.1.0

25 Jan 12:29

4.1.0 (2023-01-25)

Features

  • add ibis.get_backend function (2d27df8)
  • add py.typed to allow mypy to type check packages that use ibis (765d42e)
  • api: add ibis.set_backend function (e7fabaf)
  • api: add selectors for easier selection of columns (306bc88)
  • bigquery: add JS UDF support (e74328b)
  • bigquery: add SQL UDF support (db24173)
  • bigquery: add to_pyarrow method (30157c5)
  • bigquery: implement bitwise operations (55b69b1)
  • bigquery: implement ops.Typeof (b219919)
  • bigquery: implement ops.ZeroIfNull (f4c5607)
  • bigquery: implement struct literal (c5f2a1d)
  • clickhouse: properly support native boolean types (31cc7ba)
  • common: add support for annotating with coercible types (ae4a415)
  • common: make frozendict truly immutable (1c25213)
  • common: support annotations with typing.Literal (6f89f0b)
  • common: support generic mapping and sequence type annotations (ddc6603)
  • dask: support connect() with no arguments (67eed42)
  • datatype: add optional timestamp scale parameter (a38115a)
  • datatypes: add as_struct method to convert schemas to structs (64be7b1)
  • duckdb: add read_json function for consuming newline-delimited JSON files (65e65c1)
  • mssql: add a bunch of missing types (c698d35)
  • mssql: implement inference for DATETIME2 and DATETIMEOFFSET (aa9f151)
  • nicer repr for Backend.tables (0d319ca)
  • pandas: support connect() with no arguments (78cbbdd)
  • polars: allow ibis.polars.connect() to function without any arguments (d653a07)
  • polars: handle casting to scaled timestamps (099d1ec)
  • postgres: add Map(string, string) support via the built-in HSTORE extension (f968f8f)
  • pyarrow: support conversion to pyarrow map and struct types (54a4557)
  • snowflake: add more array operations (8d8bb70)
  • snowflake: add more map operations (7ae6e25)
  • snowflake: any/all/notany/notall reductions (ba1af5e)
  • snowflake: bitwise reductions (5aba997)
  • snowflake: date from ymd (035f856)
  • snowflake: fix array slicing (bd7af2a)
  • snowflake: implement ArrayCollect (c425f68)
  • snowflake: implement NthValue (0dca57c)
  • snowflake: implement ops.Arbitrary (45f4f05)
  • snowflake: implement ops.StructColumn (41698ed)
  • snowflake: implement StringSplit (e6acc09)
  • snowflake: implement StructField and struct literals (286a5c3)
  • snowflake: implement TimestampFromUNIX (314637d)
  • snowflake: implement TimestampFromYMDHMS (1eba8be)
  • snowflake: implement typeof operation (029499c)
  • snowflake: implement exists/not exists (7c8363b)
  • snowflake: implement extract millisecond (3292e91)
  • snowflake: make literal maps and params work (dd759d3)
  • snowflake: regex extract, search and replace (9c82179)
  • snowflake: string to timestamp (095ded6)
  • sqlite: implement _get_schema_using_query in SQLite backend (7ff84c8)
  • trino: compile timestamp types with scale (67683d3)
  • trino: enable ops.ExistsSubquery and ops.NotExistsSubquery (9b9b315)
  • trino: map parameters (53bd910)
  • ux: improve error message when column is not found (b527506)

Bug Fixes

  • backend: read the default backend setting in _default_backend (11252af)
  • bigquery: move connection logic to do_connect (42f2106)
  • bigquery: remove invalid operations from registry (911a080)
  • bigquery: resolve deprecation warnings for StructType and Schema (c9e7078)
  • clickhouse: fix position call (702de5d)
  • correctly visualize array type (26b0b3f)
  • deps: make sure pyarrow is not an implicit dependency (10373f4)
  • duckdb: make read_csv on URLs work (9e61816)
  • duckdb: only try to load extensions when necessary for csv (c77bde7)
  • duckdb: remove invalid operations from registry (ba2ec59)
  • fallback to default backend with to_pyarrow/to_pyarrow_batches (a1a6902)
  • impala: remove broken alias elision (32b120f)
  • ir: error for order_by on nonexistent column (57b1dd8)
  • ir: ops.Where output shape should consider all arguments (...

4.0.0

09 Jan 21:11

4.0.0 (2023-01-09)

⚠ BREAKING CHANGES

  • functions, methods and classes marked as deprecated are removed now
  • ir: replace HLLCardinality with ApproxCountDistinct and CMSMedian with ApproxMedian operations.
  • backends: the datatype of returned execution results now more closely matches that of the ibis expression's type. Downstream code may need to be adjusted.
  • ir: the JSONB type is replaced by the JSON type.
  • dev-deps: expression types have been removed from ibis.expr.api. Use import ibis.expr.types as ir to access these types.
  • common: removed @immutable_property decorator, use @attribute.default instead
  • timestamps: the timezone argument to to_timestamp is gone. This was only supported in the BigQuery backend. Append %Z to the format string and the desired time zone to the input column if necessary.
  • deps: ibis now supports at minimum duckdb 0.3.3. Please upgrade your duckdb install as needed.
  • api: previously ibis.connect would return a Table object when calling connect on a parquet/csv file. This now returns a backend containing a single table created from that file. When possible users may use ibis.read instead to read files into ibis tables.
  • api: histogram()'s closed argument no longer exists because it never had any effect. Remove it from your histogram method calls.
  • pandas/dask: the Pandas and Dask backends now interpret casting ints to/from timestamps as seconds since the unix epoch, matching other backends.
  • datafusion: register_csv and register_parquet are removed. Pass filename to register method instead.
  • ir: ops.NodeList and ir.List are removed. Use tuples to represent sequence of expressions instead.
  • api: re_extract now follows re.match behavior. In particular, the 0th group is now the entire string if there's a match, otherwise the groups are 1-based.
  • datatypes: enums are now strings. Likely no action needed since no functionality existed.
  • ir: Replace t[t.x.topk(...)] with t.semi_join(t.x.topk(...), "x").
  • ir: ir.Analytic.type() and ir.TopK.type() methods are removed.
  • api: the default limit for table/column expressions is now None (meaning no limit).
  • ir: join changes: previously all column names that collided between left and right tables were renamed with an appended suffix. Now for the case of inner joins with only equality predicates, colliding columns that are known to be equal due to the join predicates aren't renamed.
  • impala: kerberos support is no longer installed by default for the impala backend. To add support you'll need to install the kerberos package separately.
  • ir: ops.DeferredSortKey is removed. Use ops.SortKey directly instead.
  • ir: ibis.common.grounds.Annotable is mutable by default now
  • ir: node.has_resolved_name() is removed, use isinstance(node, ops.Named) instead; node.resolve_name() is removed use node.name instead
  • ir: removed ops.Node.flat_args(), directly use node.args property instead
  • ir: removed ops.Node.inputs property, use the multipledispatched get_node_arguments() function in the pandas backend
  • ir: Node.blocks() method has been removed.
  • ir: HasSchema mixin class is no longer available, directly subclass ops.TableNode and implement schema property instead
  • ir: Removed Node.output_type property in favor of abstractmethod Node.to_expr() which now must be explicitly implemented
  • ir: Expr(Op(Expr(Op(Expr(Op))))) is now represented as Expr(Op(Op(Op))), so code using ibis internals must be migrated
  • pandas: Use timezone conversion functions to compute the original machine localized value
  • common: use ibis.common.validators.{Parameter, Signature} instead
  • ir: ibis.expr.lineage.lineage() is now removed
  • ir: removed ir.DestructValue, ir.DestructScalar and ir.DestructColumn, use table.unpack() instead
  • ir: removed Node.root_tables() method, use ibis.expr.analysis.find_immediate_parent_tables() instead
  • impala: use other methods for pinging the database
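The re_extract note above points to re.match for its group numbering. For readers unfamiliar with that convention, the sketch below shows it with Python's standard `re` module: index 0 is the whole match and capture groups start at 1. `extract_group` is a hypothetical helper that only mirrors the indexing. It is not ibis code, and returning None on a failed match is a choice made for this sketch.

```python
import re


def extract_group(pattern: str, text: str, index: int):
    """Return group `index` of a re.match result, or None if no match."""
    match = re.match(pattern, text)
    if match is None:
        return None
    # Index 0 is the whole match; capture groups are numbered from 1.
    return match.group(index)


pattern = r"(\w+)@(\w+)\.com"
whole = extract_group(pattern, "user@example.com", 0)
first = extract_group(pattern, "user@example.com", 1)
second = extract_group(pattern, "user@example.com", 2)
```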

Features

  • add experimental decorator (791335f)
  • add to_pyarrow and to_pyarrow_batches (a059cf9)
  • add unbind method to expressions (4b91b0b), closes #4536
  • add way to specify sqlglot dialect on backend (f1c0608)
  • alchemy: implement json getitem for sqlalchemy backends (7384087)
  • api: add agg alias for aggregate (907583f)
  • api: add agg alias to group_by (6b6367c)
  • api: add ibis.read top level API function (e67132c)
  • api: add JSON __getitem__ operation (3e2efb4)
  • api: implement __array__ (1402347)
  • api: make drop variadic (1d69702)
  • api: return object from to_sql to support notebook syntax highlighting (87c9833)
  • api: use rich for interactive __repr__ (04758b8)
  • backend: make ArrayCollect filterable (1e1a5cf)
  • backends/mssql: add backend support for Microsoft Sql Server (fc39323)
  • bigquery: add ops.DateFromYMD, ops.TimeFromHMS, ops.TimestampFromYMDHMS (a4a7936)
  • bigquery: add ops.ExtractDayOfYear (30c547a)
  • bigquery: add support for correlation (4df9f8b)
  • bigquery: implement argmin and argmax (40c5f0d)
  • bigquery: implement pi and e (b91370a)
  • bigquery: implement array repeat (09d1e2f)
  • bigquery: implement JSON getitem functionality (9c0e775)
  • bigquery: implement ops.ArraySlice (49414ef)
  • bigquery: implement ops.Capitalize (5757bb0)
  • bigquery: implement ops.Clip (5495d6d)
  • bigquery: implement ops.Degrees, ops.Radians (5119b93)
  • bigquery: implement ops.ExtractWeekOfYear (477d287)
  • bigquery: implement ops.RandomScalar (5dc8482)
  • bigquery: implement ops.StructColumn, ops.ArrayColumn (2bbf73c)
  • bigquery: implement ops.Translate (77a4b3e)
  • bigquery: implement ops.NthValue (b43ba28)
  • bigquery: move bigquery backend back into the main repo (cd5e881)
  • clickhouse: handle more options in parse_url implementation (874c5c0)
  • clickhouse: implement INTERSECT ALL/EXCEPT ALL (f65fbc3)
  • clickhouse: implement quantile/multiquantile (96d7d1b)
  • common: support function annotations with both typehints and rules (7e23f3e)
  • dask: implement mode aggregation (017f07a)
  • dask: implement json getitem (381d805)
  • datafusion: convert column expressions to...

3.2.0

15 Sep 11:01

3.2.0 (2022-09-15)

Features

  • add api to get backend entry points (0152f5e)
  • api: add and_ and or_ helpers (94bd4df)
  • api: add argmax and argmin column methods (b52216a)
  • api: add distinct to Intersection and Difference operations (cd9a34c)
  • api: add ibis.memtable API for constructing in-memory table expressions (0cc6948)
  • api: add ibis.sql to easily get a formatted SQL string (d971cc3)
  • api: add Table.unpack() and StructValue.lift() APIs for projecting struct fields (ced5f53)
  • api: allow transmute-style select method (d5fc364)
  • api: implement all bitwise operators (7fc5073)
  • api: promote psql to a show_sql public API (877a05d)
  • clickhouse: add dataframe external table support for memtables (bc86aa7)
  • clickhouse: add enum, ipaddr, json, lowcardinality to type parser (8f0287f)
  • clickhouse: enable support for working window functions (310a5a8)
  • clickhouse: implement argmin and argmax (ee7c878)
  • clickhouse: implement bitwise operations (348cd08)
  • clickhouse: implement struct scalars (1f3efe9)
  • dask: implement StringReplace execution (1389f4b)
  • dask: implement ungrouped argmin and argmax (854aea7)
  • deps: support duckdb 0.5.0 (47165b2)
  • duckdb: handle query parameters in ibis.connect (fbde95d)
  • duckdb: implement argmin and argmax (abf03f1)
  • duckdb: implement bitwise xor (ca3abed)
  • duckdb: register tables from pandas/pyarrow objects (36e48cc)
  • duckdb: support unsigned integer types (2e67918)
  • impala: implement bitwise operations (c5302ab)
  • implement dropna for SQL backends (8a747fb)
  • log: make BaseSQLBackend._log print by default (12de5bb)
  • mysql: register BLOB types (1e4fb92)
  • pandas: implement argmin and argmax (bf9b948)
  • pandas: implement NotContains on grouped data (976dce7)
  • pandas: implement StringReplace execution (578795f)
  • pandas: implement Contains with a group by (c534848)
  • postgres: implement bitwise xor (9b1ebf5)
  • pyspark: add option to treat nan as null in aggregations (bf47250)
  • pyspark: implement ibis.connect for pyspark (a191744)
  • pyspark: implement Intersection and Difference (9845a3c)
  • pyspark: implement bitwise operators (33cadb1)
  • sqlalchemy: implement bitwise operator translation (bd9f64c)
  • sqlalchemy: make ibis.connect with sqlalchemy backends (b6cefb9)
  • sqlalchemy: properly implement Intersection and Difference (2bc0b69)
  • sql: implement StringReplace translation (29daa32)
  • sqlite: implement bitwise xor and bitwise not (58c42f9)
  • support table.sort_by(ibis.random()) (693005d)
  • type-system: infer pandas' string dtype (5f0eb5d)
  • ux: add duckdb as the default backend (8ccb81d)
  • ux: use rich to format Table.info() output (67234c3)
  • ux: use sqlglot for pretty printing SQL (a3c81c5)
  • variadic union, intersect, & difference functions (05aca5a)

Bug Fixes

  • api: make sure column names that are already inferred are not overwritten (6f1cb16)
  • api: support deferred objects in existing API functions (241ce6a)
  • backend: ensure that chained limits respect prior limits (02a04f5)
  • backends: ensure select after filter works (e58ca73)
  • backends: only recommend installing ibis-foo when foo is a known backend (ac6974a)
  • base-sql: fix String-generating backend string concat implementation (3cf78c1)
  • clickhouse: add IPv4/IPv6 literal inference (0a2f315)
  • clickhouse: cast repeat times argument to UInt64 (b643544)
  • clickhouse: fix listing tables from databases with no tables (08900c3)
  • compilers: make sure memtable rows have names in the SQL string compilers (18e7f95)
  • compiler: use repr for SQL string VALUES data (75af658)
  • dask: ensure predicates are computed before projections (5cd70e1)
  • dask: implement timestamp-date binary comparisons (48d5058)
  • dask: set dask upper bound due to large scale test breakage (796c645), closes #9221
  • decimal: add decimal type inference (3fe3fd8)
  • deps: update dependency duckdb-engine to >=0.1.8,<0.4.0 (113dc8f)
  • deps: update dependency duckdb-engine t...

3.1.0

26 Jul 09:54

3.1.0 (2022-07-26)

Features

  • add __getattr__ support to StructValue (75bded1)
  • allow selection subclasses to define new node args (2a7dc41)
  • api: accept Schema objects in public ibis.schema (0daac6c)
  • api: add .tables accessor to BaseBackend (7ad27f0)
  • api: add e function to public API (3a07e70)
  • api: add ops.StructColumn operation (020bfdb)
  • api: add cume_dist operation (6b6b185)
  • api: add toplevel ibis.connect() (e13946b)
  • api: handle literal timestamps with timezone embedded in string (1ae976b)
  • api: ibis.connect() default to duckdb for parquet/csv extensions (ff2f088)
  • api: make struct metadata more convenient to access (3fd9bd8)
  • api: support tab completion for backends (eb75fc5)
  • api: underscore convenience api (81716da)
  • api: unnest (98ecb09)
  • backends: allow column expressions from non-foreign tables on the right side of isin/notin (e1374a4)
  • base-sql: implement trig and math functions (addb2c1)
  • clickhouse: add ability to pass arbitrary kwargs to Clickhouse do_connect (583f599)
  • clickhouse: implement ops.StructColumn operation (0063007)
  • clickhouse: implement array collect (8b2577d)
  • clickhouse: implement ArrayColumn (1301f18)
  • clickhouse: implement bit aggs (f94a5d2)
  • clickhouse: implement clip (12dfe50)
  • clickhouse: implement covariance and correlation (a37c155)
  • clickhouse: implement degrees (7946c0f)
  • clickhouse: implement proper type serialization (80f4ab9)
  • clickhouse: implement radians (c7b7f08)
  • clickhouse: implement strftime (222f2b5)
  • clickhouse: implement struct field access (fff69f3)
  • clickhouse: implement trig and math functions (c56440a)
  • clickhouse: support subsecond timestamp literals (e8698a6)
  • compiler: restore intersect_class and difference_class overrides in base SQL backend (2c46a15)
  • dask: implement trig functions (e4086bb)
  • dask: implement zeroifnull (38487db)
  • datafusion: implement negate (69dd64d)
  • datafusion: implement trig functions (16803e1)
  • duckdb: add register method to duckdb backend to load parquet and csv files (4ccc6fc)
  • duckdb: enable find_in_set test (377023d)
  • duckdb: enable group_concat test (4b9ad6c)
  • duckdb: implement ops.StructColumn operation (211bfab)
  • duckdb: implement approx_count_distinct (03c89ad)
  • duckdb: implement approx_median (894ce90)
  • duckdb: implement arbitrary first and last aggregation (8a500bc)
  • duckdb: implement NthValue (1bf2842)
  • duckdb: implement strftime (aebc252)
  • duckdb: return the ir.Table instance from DuckDB's register API (0d05d41)
  • mysql: implement FindInSet (e55bbbf)
  • mysql: implement StringToTimestamp (169250f)
  • pandas: implement bitwise aggregations (37ff328)
  • pandas: implement degrees (25b4f69)
  • pandas: implement radians (6816b75)
  • pandas: implement trig functions (1fd52d2)
  • pandas: implement zeroifnull (48e8ed1)
  • postgres/duckdb: implement covariance and correlation (464d3ef)
  • postgres: implement ArrayColumn (7b0a506)
  • pyspark: implement approx_count_distinct (1fe1d75)
  • pyspark: implement approx_median (07571a9)
  • pyspark: implement covariance and correlation (ae818fb)
  • pyspark: implement degrees (f478c7c)
  • pyspark: implement nth_value (abb559d)
  • pyspark: implement nullifzero (640234b)
  • pyspark: implement radians (18843c0)
  • pyspark: implement trig functions (fd7621a)
  • pyspark: implement Where (32b9abb)
  • pyspark: implement xor (550b35b)
  • pyspark: implement zeroifnull (db13241)
  • pyspark: topk support (9344591)
  • sqlalchemy: add degrees and radians (8b7415f)
  • sqlalchemy: add xor translation rule (2921664)
  • sqlalchemy: allow non-primitive arrays (4e02918)...

3.0.2

28 Apr 16:30

3.0.2 (2022-04-28)

Bug Fixes

  • docs: fix tempdir location for docs build (dcd1b22)