Releases: ibis-project/ibis
6.2.0
6.2.0 (2023-08-31)
Features
- trino: add source application to trino backend (cf5fdb9)
Bug Fixes
- bigquery,impala: escape all ASCII escape sequences in string literals (402f5ca)
- bigquery: correctly escape ASCII escape sequences in regex patterns (a455203)
- release: pin conventional-changelog-conventionalcommits to 6.1.0 (d6526b8)
- trino: ensure that list_databases look at all catalogs not just the current one (cfbdbf1)
- trino: override incorrect base sqlalchemy `list_schemas` implementation (84d38a1)
Documentation
- trino: add connection docstring (507a00e)
6.1.0
6.1.0 (2023-08-03)
Features
- api: add `ibis.dtype` top-level API (867e5f1)
- api: add `table.nunique()` for counting unique table rows (adcd762)
- api: allow mixing literals and columns in `ibis.array` (3355dd8)
- api: improve efficiency of `__dataframe__` protocol (15e27da)
- api: support boolean literals in join API (c56376f)
- arrays: add `concat` method equivalent to `__add__`/`__radd__` (0ed0ab1)
- arrays: add `repeat` method equivalent to `__mul__`/`__rmul__` (b457c7b)
- backends: add `current_schema` API (955a9d0)
- bigquery: fill out `CREATE TABLE` DDL options including support for `overwrite` (5dac7ec)
- datafusion: add count_distinct, median, approx_median, stddev and var aggregations (45089c4)
- datafusion: add extract url fields functions (4f5ea98)
- datafusion: add functions sign, power, nullifzero, log (ef72e40)
- datafusion: add RegexSearch, StringContains and StringJoin (4edaab5)
- datafusion: implement in-memory table (d4ec5c2)
- flink: add tests and translation rules for additional operators (fc2aa5d)
- flink: implement translation rules and tests for over aggregation in Flink backend (e173cd7)
- flink: implement translation rules for literal expressions in flink compiler (a8f4880)
- improved error messages when missing backend dependencies (2fe851b)
- make output of `to_sql` a proper `str` subclass (084bdb9)
- pandas: add ExtractURLField functions (e369333)
- polars: implement `ops.SelfReference` (983e393)
- pyspark: read/write delta tables (d403187)
- refactor ddl for create_database and add create_schema where relevant (d7a857c)
- sqlite: add scalar python udf support to sqlite (92f29e6)
- sqlite: implement extract url field functions (cb1956f)
- trino: implement support for `.sql` table expression method (479bc60)
- trino: support table properties when creating a table (b9d65ef)
Bug Fixes
- api: allow scalar window order keys (3d3f4f3)
- backends: make `current_database` implementation and API consistent across all backends (eeeeee0)
- bigquery: respect the fully qualified table name at the init (a25f460)
- clickhouse: check dispatching instead of membership in the registry for `has_operation` (acb7f3f)
- datafusion: always quote column names to prevent datafusion from normalizing case (310db2b)
- deps: update dependency datafusion to v27 (3a311cd)
- druid: handle conversion issues from string, binary, and timestamp (b632063)
- duckdb: avoid double escaping backslashes for bind parameters (8436f57)
- duckdb: cast read_only to string for connection (27e17d6)
- duckdb: deduplicate results from `list_schemas()` (172520e)
- duckdb: ensure that current_database returns the correct value (2039b1e)
- duckdb: handle conversion from duckdb_engine unsigned int aliases (e6fd0cc)
- duckdb: map hugeint to decimal to avoid information loss (4fe91d4)
- duckdb: run pre-execute-hooks in duckdb before file export (5bdaa1d)
- duckdb: use regexp_matches to ensure that matching checks containment instead of a full match (0a0cda6)
- examples: remove example datasets that are incompatible with case-insensitive file systems (4048826)
- exprs: ensure that left_semi and semi are equivalent (bbc1eb7)
- forward arguments through `__dataframe__` protocol (50f3be9)
- ir: change "it not a" to "is not a" in errors (d0d463f)
- memtable: implement support for translation of empty memtable (05b02da)
- mysql: fix UUID type reflection for sqlalchemy 2.0.18 (12d4039)
- mysql: pass-through kwargs to connect_args (e3f3e2d)
- ops: ensure that name attribute is always valid for `ops.SelfReference` (9068aca)
- polars: ensure that `pivot_longer` works with more than one column (822c912)
- polars: fix collect implementation (c1182be)
- postgres: by default use domain socket (e44fdfb)
- pyspark: make `has_operation` method a `@classmethod` (c1b7dbc)
- release: use @google/semantic-release-replace-plugin@1.2.0 to avoid module loading bug (673aab3)
- snowflake: fix broken unnest functionality (207587c)
- snowflake: reset the schema and database to the original schema after creating them (54ce26a)
- snowflake: reset to original schema when resetting the database (32ff832)
- snowflake: use `regexp_instr != 0` instead of `REGEXP` keyword (06e2be4)
- sqlalchemy: add support for sqlalchemy string subclassed types (8b33b35)
- sql: handle parsing aliases ([3645cf4](3645cf4119620e8b01...
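The duckdb `regexp_matches` fix in this list is about the difference between a pattern matching anywhere in a string (containment) and a pattern having to cover the whole string. A minimal sketch of that distinction with Python's standard `re` module follows. It is an analogy only: it does not call ibis or duckdb, and the sample string and pattern are made up for illustration.

```python
import re

# Hypothetical sample values, chosen only for illustration.
text = "ibis-project/ibis"
pattern = r"project"

# Containment: the pattern may match anywhere inside the string.
contains = re.search(pattern, text) is not None

# Full match: the pattern must cover the entire string.
full = re.fullmatch(pattern, text) is not None

print(contains, full)
```

With these inputs the containment check succeeds while the full-match check does not, which is the kind of behavioral gap the fix addresses.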
6.0.0
6.0.0 (2023-07-05)
⚠ BREAKING CHANGES
- imports: Use of `ibis.udf` as a module is removed. Use `ibis.legacy.udf` instead.
- The minimum supported Python version is now Python 3.9
- api: `group_by().count()` no longer automatically names the count aggregation `count`. Use `relabel` to rename columns.
- backends: `Backend.ast_schema` is removed. Use `expr.as_table().schema()` instead.
- snowflake/postgres: Postgres UDFs now use the new `@udf.scalar.python` API. This should be a low-effort replacement for the existing API.
- ir: `ops.NullLiteral` is removed
- datatypes: `dt.Interval` no longer has a default unit, `dt.interval` is removed
- deps: `snowflake-connector-python`'s lower bound was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade `snowflake-connector-python` to at least version 3.0.2.
- api: `Table.difference()`, `Table.intersection()`, and `Table.union()` now require at least one argument.
- postgres: Ibis no longer automatically defines `first`/`last` reductions on connection to the postgres backend. Use DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the `pgxn` implementations instead.
- api: `ibis.examples.<example-name>.fetch` no longer forwards arbitrary keyword arguments to `read_csv`/`read_parquet`.
- datatypes: `dt.Interval.value_type` attribute is removed
- api: `Table.count()` is no longer automatically named `"count"`. Use `Table.count().name("count")` to achieve the previous behavior.
- trino: The trino backend now requires at least version 0.321 of the `trino` Python package.
- backends: removed `AlchemyTable`, `AlchemyDatabase`, `DaskTable`, `DaskDatabase`, `PandasTable`, `PandasDatabase`, `PySparkDatabaseTable`, use `ops.DatabaseTable` instead
- dtypes: temporal unit enums are now available under `ibis.common.temporal` instead of `ibis.common.enums`.
- clickhouse: `external_tables` can no longer be passed in `ibis.clickhouse.connect`. Pass `external_tables` directly in `raw_sql`/`execute`/`to_pyarrow`/`to_pyarrow_batches()`.
- datatypes: `dt.Set` is now an alias for `dt.Array`
- bigquery: Before this change, an ibis timestamp always mapped to the BigQuery TIMESTAMP type, with no timezone support. That was incorrect: the BigQuery TIMESTAMP type should have the UTC timezone, while DATETIME is the timezone-free type. This change therefore breaks the previous mapping: an ibis timestamp with the UTC timezone maps to BigQuery TIMESTAMP, and an ibis timestamp with no timezone maps to BigQuery DATETIME.
- impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use `raw_sql` if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they're operating on.
- api: `Column.first()`/`Column.last()` are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using this function in `select`-based APIs should function unchanged.
- bigquery: when using the bigquery backend, casting float to int will no longer round floats to the nearest integer
- ops.Hash: The `hash` method on table columns no longer accepts the `how` argument. The hashing functions available are highly backend-dependent and the intention of the hash operation is to provide a fast, consistent (on the same backend, only) integer value. If you have been passing in a value for `how`, you can remove it and you will get the same results as before, as there were no backends with multiple hash functions working.
- duckdb: Some CSV files may now have headers that did not have them previously. Set `header=False` to get the previous behavior.
- deps: New environments will have a different default setting for `compression` in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn't include them by default. Install `clickhouse-cityhash` and `lz4` to preserve the previous behavior.
- api: `Table.set_column()` is removed; use `Table.mutate(name=expr)` instead
- api: the `suffixes` argument in all join methods has been removed in favor of `lname`/`rname` args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in `lname="{name}_x", rname="{name}_y"`.
- ir: `IntervalType.unit` is now an enum instead of a string
- type-system: Inferred types of Python objects may be slightly different. Ibis now uses `pyarrow` to infer the column types of pandas DataFrame and other types.
- backends: `path` argument of `Backend.connect()` is removed, use the `database` argument instead
- api: removed `Table.sort_by()` and `Table.groupby()`, use `.order_by()` and `.group_by()` respectively
- datatypes: `DataType.scalar` and `column` class attributes are now strings.
- backends: `Backend.load_data()`, `Backend.exists_database()` and `Backend.exists_table()` are removed
- ir: `Value.summary()` and `NumericValue.summary()` are removed
- schema: `Schema.merge()` is removed, use the union operator `schema1 | schema2` instead
- api: `ibis.sequence()` is removed
- drop support for Python 3.8 (747f4ca)
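The `lname`/`rname` entry in the list above relies on format strings in which `{name}` stands for the original column name. The sketch below shows, in plain Python, how such templates could rename columns that exist on both sides of a join. `rename_collisions` is a hypothetical helper written for this illustration; it is not part of ibis and does not claim to reproduce ibis's internal logic.

```python
def rename_collisions(left_cols, right_cols, lname="{name}_x", rname="{name}_y"):
    """Apply name templates to columns that appear on both sides.

    Hypothetical helper for illustration only; not part of ibis.
    """
    collisions = set(left_cols) & set(right_cols)
    # Only colliding names are rewritten; unique names pass through unchanged.
    left = [lname.format(name=c) if c in collisions else c for c in left_cols]
    right = [rname.format(name=c) if c in collisions else c for c in right_cols]
    return left, right


left, right = rename_collisions(["id", "value"], ["id", "score"])
print(left)   # ['id_x', 'value']
print(right)  # ['id_y', 'score']
```

The default templates in the sketch are the ones the release note gives for restoring the old suffix behavior.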
Features
- add dask windowing (9cb920a)
- add easy type hints to GroupBy (da330b1)
- add microsecond method to TimestampValue and TimeValue (e9df2da)
- api: add `__dataframe__` implementation (b3d9619)
- api: add ALL_CAPS option to Table.relabel (c0b30e2)
- api: add first/last reduction APIs (8c01980)
- api: add zip operation and api (fecf695)
- api: allow passing multiple keyword arguments to `ibis.interval` (22ee854)
- api: better repr and pickle support for deferred expressions (2b1ec9c)
- api: exact median (c53031c)
- api: raise better error on column name collision in joins (e04c38c)
- api: replace `suffixes` in `join` with `lname`/`rname` (3caf3a1)
- api: support abstract type names in `selectors.of_type` (f6d2d56)
- api: support list of strings and single strings in the `across` selector (a6b60e7)
- api: use `create_table` to load example data (42e09a4)
- bigquery: add client and storage_client params to connect (4cf1354)
- bigquery: enable group_concat over windows (d6a1117)
- cast: add table-level try_cast (5e4d16b)
- clickhouse: add array zip impl (efba835)
- clickhouse: move to clickhouse supported Python client (012557a)
- clickhouse: set default engine to native file (29815fa)
- clickhouse: support pyarrow decimal types (7472dd5)
- common: add a pure python egraph implementation (aed2ed0)
- common: add pattern matchers (b515d5c)
- common: add support for start parameter in StringFind (31ce741)
- common: add Topmost and Innermost pattern matchers (90b48fc)
- common: implement copy protocol for Immutable base class (e61c66b)
- create_table: support pyarrow Table in table creation (9dbb25c)
- datafusion: add string functions (66c0afb)
- datafusion: add support for scalar pyarrow UDFs ([45935b7](45935b78922f09ab...
5.1.0
5.1.0 (2023-04-11)
Features
- api: expand `distinct` API for dropping duplicates based on column subsets (3720ea5)
- api: implement pyarrow memtables (9d4fbbd)
- api: support passing a format string to `Table.relabel` (0583959)
- api: thread kwargs around properly to support more complex connection arguments (7e0e15b)
- backends: add more array functions (5208801)
- bigquery: make `to_pyarrow_batches()` smarter (42f5987)
- bigquery: support bignumeric type (d7c0f49)
- default repr to showing all columns in Jupyter notebooks (91a0811)
- druid: add re_search support (946202b)
- duckdb: add map operations (a4c4e77)
- duckdb: support sqlalchemy 2 (679bb52)
- mssql: implement ops.StandardDev, ops.Variance (e322f1d)
- pandas: support memtable in pandas backend (6e4d621), closes #5467
- polars: implement count distinct (aea4ccd)
- postgres: implement `ops.Arbitrary` (ee8dbab)
- pyspark: `pivot_longer` (f600c90)
- pyspark: add ArrayFilter operation (2b1301e)
- pyspark: add ArrayMap operation (e2c159c)
- pyspark: add DateDiff operation (bfd6109)
- pyspark: add partial support for interval types (067120d)
- pyspark: add read_csv, read_parquet, and register (7bd22af)
- pyspark: implement count distinct (db29e10)
- pyspark: support basic caching (ab0df7a)
- snowflake: add optional 'connect_args' param (8bf2043)
- snowflake: native pyarrow support (ce3d6a4)
- sqlalchemy: support unknown types (fde79fa)
- sqlite: implement `ops.Arbitrary` (9bcdf77)
- sql: use temp views where possible (5b9d8c0)
- table: implement `pivot_wider` API (60e7731)
- ux: move `ibis.expr.selectors` to `ibis.selectors` and deprecate for removal in 6.0 (0ae639d)
Bug Fixes
- api: disambiguate attribute errors from a missing `resolve` method (e12c4df)
- api: support filter on literal followed by aggregate (68d65c8)
- clickhouse: do not render aliases when compiling aggregate expression components (46caf3b)
- clickhouse: ensure that clickhouse depends on sqlalchemy for `make_url` usage (ea10a27)
- clickhouse: ensure that truncate works (1639914)
- clickhouse: fix `create_table` implementation (5a54489)
- clickhouse: workaround sqlglot issue with calling `match` (762f4d6)
- deps: support pandas 2.0 (4f1d9fe)
- duckdb: branch to avoid unnecessary dataframe construction (9d5d943)
- duckdb: disable the progress bar by default (1a1892c)
- duckdb: drop use of experimental parallel csv reader (47d8b92)
- duckdb: generate `SIMILAR TO` instead of tilde to workaround sqlglot issue (434da27)
- improve typing signature of .dropna() (e11de3f)
- mssql: improve aggregation on expressions (58aa78d)
- mssql: remove invalid aggregations (1ce3ef9)
- polars: backwards compatibility for the `time_zone` and `time_unit` properties (3a2c4df)
- postgres: allow inference of unknown types (343fb37)
- pyspark: fail when aggregation contains a `having` filter (bd81a9f)
- pyspark: raise proper error when trying to generate sql (51afc13)
- snowflake: fix new array operations; remove `ArrayRemove` operation (772668b)
- snowflake: make sure ephemeral tables follow backend quoting rules (9a845df)
- snowflake: make sure pyarrow is used when possible (01f5154)
- sql: ensure that set operations resolve to a single relation (3a02965)
- sql: generate consistent `pivot_longer` semantics in the presence of multiple `unnest`s (6bc301a)
- sqlglot: work with newer versions (6f7302d)
- trino,duckdb,postgres: make cumulative `notany`/`notall` aggregations work (c2e985f)
- trino: only support `how='first'` with `arbitrary` reduction (315b5e7)
- ux: use guaranteed length-1 characters for `NULL` values (8618789)
Refactors
- api: remove explicit use of `.projection` in favor of the shorter `.select` (73df8df)
- cache: factor out ref counted cache (c816f00)
- duckdb: simplify `to_pyarrow_batches` implementation (d6235ee)
- duckdb: source loaded and installed extensions from duckdb (fb06262)
- duckdb: use native duckdb parquet reader unless auth required (e9f57eb)
- generate uuid-based names for temp tables ([a1164df](a1164df5d1bc4fa454371626a05...
5.0.0
5.0.0 (2023-03-15)
⚠ BREAKING CHANGES
- api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
- backend: Backends now raise `ibis.common.exceptions.UnsupportedOperationError` in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
- ux: `Table.info` now returns an expression
- ux: Passing a sequence of column names to `Table.drop` is removed. Replace `drop(cols)` with `drop(*cols)`.
- The `spark` plugin alias is removed. Use `pyspark` instead
- ir: removed `ibis.expr.scope` and `ibis.expr.timecontext` modules, access them under `ibis.backends.base.df.<module>`
- some methods have been removed from the top-level `ibis.<backend>` namespaces, access them on a connected backend instance instead.
- common: removed `ibis.common.geospatial`, import the functions from `ibis.backends.base.sql.registry.geospatial`
- datatypes: `JSON` is no longer a subtype of `String`
- datatype: `Category`, `CategoryValue`/`Column`/`Scalar` are removed. Use string types instead.
- ux: The `metric_name` argument to `value_counts` is removed. Use `Table.relabel` to change the metric column's name.
- deps: the minimum version of `parsy` is now 2.0
- ir/backends: removed the following symbols:
  - `ibis.backends.duckdb.parse_type()` function
  - `ibis.backends.impala.Backend.set_database()` method
  - `ibis.backends.pyspark.Backend.set_database()` method
  - `ibis.backends.impala.ImpalaConnection.ping()` method
  - `ibis.expr.operations.DatabaseTable.change_name()` method
  - `ibis.expr.operations.ParseURL` class
  - `ibis.expr.operations.Value.to_projection()` method
  - `ibis.expr.types.Table.get_column()` method
  - `ibis.expr.types.Table.get_columns()` method
  - `ibis.expr.types.StringValue.parse_url()` method
- schema: `Schema.from_dict()`, `.delete()` and `.append()` methods are removed
- datatype: `struct_type.pairs` is removed, use `struct_type.fields` instead
- datatype: `Struct(names, types)` is not supported anymore, pass a dictionary to `Struct` constructor instead
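Two of the breaking changes above are mechanical call-site rewrites: `drop(cols)` becomes `drop(*cols)`, and `Struct(names, types)` becomes a single dictionary argument. The sketch below shows the plain-Python side of both rewrites. The `drop` function is a hypothetical stand-in for a variadic API, and the field names and type strings are made up; no ibis objects are constructed.

```python
def drop(*columns):
    """Hypothetical stand-in for a variadic API such as Table.drop."""
    return [c for c in ("a", "b", "c") if c not in columns]


cols = ["a", "b"]
# Before: drop(cols) passed one sequence. Now: unpack it into positional args.
remaining = drop(*cols)

# Before: Struct(names, types) took two parallel sequences.
names = ["x", "y"]
types = ["int64", "string"]
# Now: build one mapping of field name to type and pass that instead.
fields = dict(zip(names, types))

print(remaining)
print(fields)
```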
Features
- add `max_columns` option for table repr (a3aa236)
- add examples API (b62356e)
- api: add `map`/`array` accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
- api: add array to string join operation (74de349)
- api: add builtin support for relabeling columns to snake case (1157273)
- api: add support for passing a mapping to `ibis.map` (d365fd4)
- api: allow single argument set operations (bb0a6f0)
- api: implement `to_pandas()` API for ecosystem compatibility (cad316c)
- api: implement isin (ac31db2)
- api: make `cache` evaluate only once per session per expression (5a8ffe9)
- api: make create_table uniform (833c698)
- api: more selectors (5844304)
- api: upcast pandas DataFrames to memtables in `rlz.table` rule (8dcfb8d)
- backends: implement `ops.Time` for sqlalchemy backends (713cd33)
- bigquery: add `BIGNUMERIC` type support (5c98ea4)
- bigquery: add UUID literal support (ac47c62)
- bigquery: enable subqueries in select statements (ef4dc86)
- bigquery: implement create and drop table method (5f3c22c)
- bigquery: implement create_view and drop_view method (a586473)
- bigquery: support creating tables from in-memory tables (c3a25f1)
- bigquery: support in-memory tables (37e3279)
- change Rich repr of dtypes from blue to dim (008311f)
- clickhouse: implement `ArrayFilter` translation (f2144b6)
- clickhouse: implement `ops.ArrayMap` (45000e7)
- clickhouse: implement `ops.MapLength` (fc82eaa)
- clickhouse: implement ops.Capitalize (914c64c)
- clickhouse: implement ops.ExtractMillisecond (ee74e3a)
- clickhouse: implement ops.RandomScalar (104aeed)
- clickhouse: implement ops.StringAscii (a507d17)
- clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
- clickhouse: improve error message for invalid types in literal (e4d7799)
- clickhouse: support asof_join (7ed5143)
- common: add abstract mapping collection with support for set operations (7d4aa0f)
- common: add support for variadic positional and variadic keyword annotations (baea1fa)
- common: hold typehint in the annotation objects (b3601c6)
- common: support `Callable` arguments and return types in `Validator.from_annotable()` (ae57c36)
- common: support positional only and keyword only arguments in annotations (340dca1)
- dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
- datafusion: implement ops.Degrees, ops.Radians (7e61391)
- datafusion: implement ops.Exp (7cb3ade)
- datafusion: implement ops.Pi, ops.E (5a74cb4)
- datafusion: implement ops.RandomScalar (5d1cd0f)
- datafusion: implement ops.StartsWith (8099014)
- datafusion: implement ops.StringAscii (b1d7672)
- datafusion: implement ops.StrRight (016a082)
- datafusion: implement ops.Translate (2fe3fc4)
- datafusion: support substr without end (a19fd87)
- datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
- datatype: enable inference of `Decimal` type (8761732)
- datatype: implement `Mapping` abstract base class for `StructType` (5df2022)
- deps: add Python 3.11 support and tests ([6f3f759](https://github.com/ibis-project/ibis/commit...
4.1.0
4.1.0 (2023-01-25)
Features
- add `ibis.get_backend` function (2d27df8)
- add py.typed to allow mypy to type check packages that use ibis (765d42e)
- api: add `ibis.set_backend` function (e7fabaf)
- api: add selectors for easier selection of columns (306bc88)
- bigquery: add JS UDF support (e74328b)
- bigquery: add SQL UDF support (db24173)
- bigquery: add to_pyarrow method (30157c5)
- bigquery: implement bitwise operations (55b69b1)
- bigquery: implement ops.Typeof (b219919)
- bigquery: implement ops.ZeroIfNull (f4c5607)
- bigquery: implement struct literal (c5f2a1d)
- clickhouse: properly support native boolean types (31cc7ba)
- common: add support for annotating with coercible types (ae4a415)
- common: make frozendict truly immutable (1c25213)
- common: support annotations with typing.Literal (6f89f0b)
- common: support generic mapping and sequence type annotations (ddc6603)
- dask: support `connect()` with no arguments (67eed42)
- datatype: add optional timestamp scale parameter (a38115a)
- datatypes: add `as_struct` method to convert schemas to structs (64be7b1)
- duckdb: add `read_json` function for consuming newline-delimited JSON files (65e65c1)
- mssql: add a bunch of missing types (c698d35)
- mssql: implement inference for `DATETIME2` and `DATETIMEOFFSET` (aa9f151)
- nicer repr for Backend.tables (0d319ca)
- pandas: support `connect()` with no arguments (78cbbdd)
- polars: allow ibis.polars.connect() to function without any arguments (d653a07)
- polars: handle casting to scaled timestamps (099d1ec)
- postgres: add `Map(string, string)` support via the built-in `HSTORE` extension (f968f8f)
- pyarrow: support conversion to pyarrow map and struct types (54a4557)
- snowflake: add more array operations (8d8bb70)
- snowflake: add more map operations (7ae6e25)
- snowflake: any/all/notany/notall reductions (ba1af5e)
- snowflake: bitwise reductions (5aba997)
- snowflake: date from ymd (035f856)
- snowflake: fix array slicing (bd7af2a)
- snowflake: implement `ArrayCollect` (c425f68)
- snowflake: implement `NthValue` (0dca57c)
- snowflake: implement `ops.Arbitrary` (45f4f05)
- snowflake: implement `ops.StructColumn` (41698ed)
- snowflake: implement `StringSplit` (e6acc09)
- snowflake: implement `StructField` and struct literals (286a5c3)
- snowflake: implement `TimestampFromUNIX` (314637d)
- snowflake: implement `TimestampFromYMDHMS` (1eba8be)
- snowflake: implement `typeof` operation (029499c)
- snowflake: implement exists/not exists (7c8363b)
- snowflake: implement extract millisecond (3292e91)
- snowflake: make literal maps and params work (dd759d3)
- snowflake: regex extract, search and replace (9c82179)
- snowflake: string to timestamp (095ded6)
- sqlite: implement `_get_schema_using_query` in SQLite backend (7ff84c8)
- trino: compile timestamp types with scale (67683d3)
- trino: enable `ops.ExistsSubquery` and `ops.NotExistsSubquery` (9b9b315)
- trino: map parameters (53bd910)
- ux: improve error message when column is not found (b527506)
Bug Fixes
- backend: read the default backend setting in `_default_backend` (11252af)
- bigquery: move connection logic to do_connect (42f2106)
- bigquery: remove invalid operations from registry (911a080)
- bigquery: resolve deprecation warnings for `StructType` and `Schema` (c9e7078)
- clickhouse: fix position call (702de5d)
- correctly visualize array type (26b0b3f)
- deps: make sure pyarrow is not an implicit dependency (10373f4)
- duckdb: make `read_csv` on URLs work (9e61816)
- duckdb: only try to load extensions when necessary for csv (c77bde7)
- duckdb: remove invalid operations from registry (ba2ec59)
- fallback to default backend with `to_pyarrow`/`to_pyarrow_batches` (a1a6902)
- impala: remove broken alias elision (32b120f)
- ir: error for `order_by` on nonexistent column (57b1dd8)
- ir: ops.Where output shape should consider all arguments (...
4.0.0
4.0.0 (2023-01-09)
⚠ BREAKING CHANGES
- functions, methods and classes marked as deprecated are removed now
- ir: replace `HLLCardinality` with `ApproxCountDistinct` and `CMSMedian` with `ApproxMedian` operations.
- backends: the datatype of returned execution results now more closely matches that of the ibis expression's type. Downstream code may need to be adjusted.
- ir: the `JSONB` type is replaced by the `JSON` type.
- dev-deps: expression types have been removed from `ibis.expr.api`. Use `import ibis.expr.types as ir` to access these types.
- common: removed `@immutable_property` decorator, use `@attribute.default` instead
- timestamps: the `timezone` argument to `to_timestamp` is gone. This was only supported in the BigQuery backend. Append `%Z` to the format string and the desired time zone to the input column if necessary.
- deps: ibis now supports at minimum duckdb 0.3.3. Please upgrade your duckdb install as needed.
- api: previously `ibis.connect` would return a `Table` object when calling `connect` on a parquet/csv file. This now returns a backend containing a single table created from that file. When possible users may use `ibis.read` instead to read files into ibis tables.
- api: `histogram()`'s `closed` argument no longer exists because it never had any effect. Remove it from your `histogram` method calls.
- pandas/dask: the Pandas and Dask backends now interpret casting ints to/from timestamps as seconds since the unix epoch, matching other backends.
- datafusion: `register_csv` and `register_parquet` are removed. Pass filename to `register` method instead.
- ir: `ops.NodeList` and `ir.List` are removed. Use tuples to represent sequence of expressions instead.
- api: `re_extract` now follows `re.match` behavior. In particular, the `0`th group is now the entire string if there's a match, otherwise the groups are 1-based.
- datatypes: enums are now strings. Likely no action needed since no functionality existed.
- ir: Replace `t[t.x.topk(...)]` with `t.semi_join(t.x.topk(...), "x")`.
- ir: `ir.Analytic.type()` and `ir.TopK.type()` methods are removed.
- api: the default limit for table/column expressions is now `None` (meaning no limit).
- ir: join changes: previously all column names that collided between `left` and `right` tables were renamed with an appended suffix. Now for the case of inner joins with only equality predicates, colliding columns that are known to be equal due to the join predicates aren't renamed.
- impala: kerberos support is no longer installed by default for the `impala` backend. To add support you'll need to install the `kerberos` package separately.
- ir: `ops.DeferredSortKey` is removed. Use `ops.SortKey` directly instead.
- ir: `ibis.common.grounds.Annotable` is mutable by default now
- ir: `node.has_resolved_name()` is removed, use `isinstance(node, ops.Named)` instead; `node.resolve_name()` is removed use `node.name` instead
- ir: removed `ops.Node.flat_args()`, directly use `node.args` property instead
- ir: removed `ops.Node.inputs` property, use the multipledispatched `get_node_arguments()` function in the pandas backend
- ir: `Node.blocks()` method has been removed.
- ir: `HasSchema` mixin class is no longer available, directly subclass `ops.TableNode` and implement schema property instead
- ir: Removed `Node.output_type` property in favor of abstractmethod `Node.to_expr()` which now must be explicitly implemented
- ir: `Expr(Op(Expr(Op(Expr(Op)))))` is now represented as `Expr(Op(Op(Op)))`, so code using ibis internals must be migrated
- pandas: Use timezone conversion functions to compute the original machine localized value
- common: use `ibis.common.validators.{Parameter, Signature}` instead
- ir: `ibis.expr.lineage.lineage()` is now removed
- ir: removed `ir.DestructValue`, `ir.DestructScalar` and `ir.DestructColumn`, use `table.unpack()` instead
- ir: removed `Node.root_tables()` method, use `ibis.expr.analysis.find_immediate_parent_tables()` instead
- impala: use other methods for pinging the database
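The `re_extract` entry in the list above adopts the group numbering of Python's `re` module, where group 0 is the whole match and capture groups start at 1. The short sketch below uses only the standard library (not ibis) to show that numbering; the pattern and input string are hypothetical.

```python
import re

# Hypothetical input and pattern, chosen only for illustration.
m = re.match(r"(\d{4})-(\d{2})", "2023-01-09")

whole = m.group(0)  # everything the pattern matched
year = m.group(1)   # first capture group
month = m.group(2)  # second capture group

print(whole, year, month)
```

Code that previously treated index 0 as the first capture group needs its indices shifted by one.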
Features
- add experimental decorator (791335f)
- add to_pyarrow and to_pyarrow_batches (a059cf9)
- add unbind method to expressions (4b91b0b), closes #4536
- add way to specify sqlglot dialect on backend (f1c0608)
- alchemy: implement json getitem for sqlalchemy backends (7384087)
- api: add `agg` alias for `aggregate` (907583f)
- api: add `agg` alias to `group_by` (6b6367c)
- api: add `ibis.read` top level API function (e67132c)
- api: add JSON `__getitem__` operation (3e2efb4)
- api: implement `__array__` (1402347)
- api: make `drop` variadic (1d69702)
- api: return object from `to_sql` to support notebook syntax highlighting (87c9833)
- api: use `rich` for interactive `__repr__` (04758b8)
- backend: make `ArrayCollect` filterable (1e1a5cf)
- backends/mssql: add backend support for Microsoft Sql Server (fc39323)
- bigquery: add ops.DateFromYMD, ops.TimeFromHMS, ops.TimestampFromYMDHMS (a4a7936)
- bigquery: add ops.ExtractDayOfYear (30c547a)
- bigquery: add support for correlation (4df9f8b)
- bigquery: implement `argmin` and `argmax` (40c5f0d)
- bigquery: implement `pi` and `e` (b91370a)
- bigquery: implement array repeat (09d1e2f)
- bigquery: implement JSON getitem functionality (9c0e775)
- bigquery: implement ops.ArraySlice (49414ef)
- bigquery: implement ops.Capitalize (5757bb0)
- bigquery: implement ops.Clip (5495d6d)
- bigquery: implement ops.Degrees, ops.Radians (5119b93)
- bigquery: implement ops.ExtractWeekOfYear (477d287)
- bigquery: implement ops.RandomScalar (5dc8482)
- bigquery: implement ops.StructColumn, ops.ArrayColumn (2bbf73c)
- bigquery: implement ops.Translate (77a4b3e)
- bigquery: implement ops.NthValue (b43ba28)
- bigquery: move bigquery backend back into the main repo (cd5e881)
- clickhouse: handle more options in `parse_url` implementation (874c5c0)
- clickhouse: implement `INTERSECT ALL`/`EXCEPT ALL` (f65fbc3)
- clickhouse: implement quantile/multiquantile (96d7d1b)
- common: support function annotations with both typehints and rules (7e23f3e)
- dask: implement `mode` aggregation (017f07a)
- dask: implement json getitem (381d805)
- datafusion: convert column expressions to...
3.2.0
3.2.0 (2022-09-15)
Features
- add api to get backend entry points (0152f5e)
- api: add `and_` and `or_` helpers (94bd4df)
- api: add `argmax` and `argmin` column methods (b52216a)
- api: add `distinct` to `Intersection` and `Difference` operations (cd9a34c)
- api: add `ibis.memtable` API for constructing in-memory table expressions (0cc6948)
- api: add `ibis.sql` to easily get a formatted SQL string (d971cc3)
- api: add `Table.unpack()` and `StructValue.lift()` APIs for projecting struct fields (ced5f53)
- api: allow transmute-style select method (d5fc364)
- api: implement all bitwise operators (7fc5073)
- api: promote `psql` to a `show_sql` public API (877a05d)
- clickhouse: add dataframe external table support for memtables (bc86aa7)
- clickhouse: add enum, ipaddr, json, lowcardinality to type parser (8f0287f)
- clickhouse: enable support for working window functions (310a5a8)
- clickhouse: implement `argmin` and `argmax` (ee7c878)
- clickhouse: implement bitwise operations (348cd08)
- clickhouse: implement struct scalars (1f3efe9)
- dask: implement `StringReplace` execution (1389f4b)
- dask: implement ungrouped `argmin` and `argmax` (854aea7)
- deps: support duckdb 0.5.0 (47165b2)
- duckdb: handle query parameters in `ibis.connect` (fbde95d)
- duckdb: implement `argmin` and `argmax` (abf03f1)
- duckdb: implement bitwise xor (ca3abed)
- duckdb: register tables from pandas/pyarrow objects (36e48cc)
- duckdb: support unsigned integer types (2e67918)
- impala: implement bitwise operations (c5302ab)
- implement dropna for SQL backends (8a747fb)
- log: make BaseSQLBackend._log print by default (12de5bb)
- mysql: register BLOB types (1e4fb92)
- pandas: implement `argmin` and `argmax` (bf9b948)
- pandas: implement `NotContains` on grouped data (976dce7)
- pandas: implement `StringReplace` execution (578795f)
- pandas: implement Contains with a group by (c534848)
- postgres: implement bitwise xor (9b1ebf5)
- pyspark: add option to treat nan as null in aggregations (bf47250)
- pyspark: implement `ibis.connect` for pyspark (a191744)
- pyspark: implement `Intersection` and `Difference` (9845a3c)
- pyspark: implement bitwise operators (33cadb1)
- sqlalchemy: implement bitwise operator translation (bd9f64c)
- sqlalchemy: make `ibis.connect` with sqlalchemy backends (b6cefb9)
- sqlalchemy: properly implement `Intersection` and `Difference` (2bc0b69)
- sql: implement `StringReplace` translation (29daa32)
- sqlite: implement bitwise xor and bitwise not (58c42f9)
- support `table.sort_by(ibis.random())` (693005d)
- type-system: infer pandas' string dtype (5f0eb5d)
- ux: add duckdb as the default backend (8ccb81d)
- ux: use `rich` to format `Table.info()` output (67234c3)
- ux: use `sqlglot` for pretty printing SQL (a3c81c5)
- variadic union, intersect, & difference functions (05aca5a)
Bug Fixes
- api: make sure column names that are already inferred are not overwritten (6f1cb16)
- api: support deferred objects in existing API functions (241ce6a)
- backend: ensure that chained limits respect prior limits (02a04f5)
- backends: ensure select after filter works (e58ca73)
- backends: only recommend installing ibis-foo when foo is a known backend (ac6974a)
- base-sql: fix String-generating backend string concat implementation (3cf78c1)
- clickhouse: add IPv4/IPv6 literal inference (0a2f315)
- clickhouse: cast repeat `times` argument to `UInt64` (b643544)
- clickhouse: fix listing tables from databases with no tables (08900c3)
- compilers: make sure memtable rows have names in the SQL string compilers (18e7f95)
- compiler: use `repr` for SQL string `VALUES` data (75af658)
- dask: ensure predicates are computed before projections (5cd70e1)
- dask: implement timestamp-date binary comparisons (48d5058)
- dask: set dask upper bound due to large scale test breakage (796c645), closes #9221
- decimal: add decimal type inference (3fe3fd8)
- deps: update dependency duckdb-engine to >=0.1.8,<0.4.0 (113dc8f)
- deps: update dependency duckdb-engine t...
3.1.0
3.1.0 (2022-07-26)
Features
- add `__getattr__` support to `StructValue` (75bded1)
- allow selection subclasses to define new node args (2a7dc41)
- api: accept `Schema` objects in public `ibis.schema` (0daac6c)
- api: add `.tables` accessor to `BaseBackend` (7ad27f0)
- api: add `e` function to public API (3a07e70)
- api: add `ops.StructColumn` operation (020bfdb)
- api: add cume_dist operation (6b6b185)
- api: add toplevel ibis.connect() (e13946b)
- api: handle literal timestamps with timezone embedded in string (1ae976b)
- api: ibis.connect() default to duckdb for parquet/csv extensions (ff2f088)
- api: make struct metadata more convenient to access (3fd9bd8)
- api: support tab completion for backends (eb75fc5)
- api: underscore convenience api (81716da)
- api: unnest (98ecb09)
- backends: allow column expressions from non-foreign tables on the right side of `isin`/`notin` (e1374a4)
- base-sql: implement trig and math functions (addb2c1)
- clickhouse: add ability to pass arbitrary kwargs to Clickhouse do_connect (583f599)
- clickhouse: implement `ops.StructColumn` operation (0063007)
- clickhouse: implement array collect (8b2577d)
- clickhouse: implement ArrayColumn (1301f18)
- clickhouse: implement bit aggs (f94a5d2)
- clickhouse: implement clip (12dfe50)
- clickhouse: implement covariance and correlation (a37c155)
- clickhouse: implement degrees (7946c0f)
- clickhouse: implement proper type serialization (80f4ab9)
- clickhouse: implement radians (c7b7f08)
- clickhouse: implement strftime (222f2b5)
- clickhouse: implement struct field access (fff69f3)
- clickhouse: implement trig and math functions (c56440a)
- clickhouse: support subsecond timestamp literals (e8698a6)
- compiler: restore `intersect_class` and `difference_class` overrides in base SQL backend (2c46a15)
- dask: implement trig functions (e4086bb)
- dask: implement zeroifnull (38487db)
- datafusion: implement negate (69dd64d)
- datafusion: implement trig functions (16803e1)
- duckdb: add register method to duckdb backend to load parquet and csv files (4ccc6fc)
- duckdb: enable find_in_set test (377023d)
- duckdb: enable group_concat test (4b9ad6c)
- duckdb: implement `ops.StructColumn` operation (211bfab)
- duckdb: implement approx_count_distinct (03c89ad)
- duckdb: implement approx_median (894ce90)
- duckdb: implement arbitrary first and last aggregation (8a500bc)
- duckdb: implement NthValue (1bf2842)
- duckdb: implement strftime (aebc252)
- duckdb: return the `ir.Table` instance from DuckDB's `register` API (0d05d41)
- mysql: implement FindInSet (e55bbbf)
- mysql: implement StringToTimestamp (169250f)
- pandas: implement bitwise aggregations (37ff328)
- pandas: implement degrees (25b4f69)
- pandas: implement radians (6816b75)
- pandas: implement trig functions (1fd52d2)
- pandas: implement zeroifnull (48e8ed1)
- postgres/duckdb: implement covariance and correlation (464d3ef)
- postgres: implement ArrayColumn (7b0a506)
- pyspark: implement approx_count_distinct (1fe1d75)
- pyspark: implement approx_median (07571a9)
- pyspark: implement covariance and correlation (ae818fb)
- pyspark: implement degrees (f478c7c)
- pyspark: implement nth_value (abb559d)
- pyspark: implement nullifzero (640234b)
- pyspark: implement radians (18843c0)
- pyspark: implement trig functions (fd7621a)
- pyspark: implement Where (32b9abb)
- pyspark: implement xor (550b35b)
- pyspark: implement zeroifnull (db13241)
- pyspark: topk support (9344591)
- sqlalchemy: add degrees and radians (8b7415f)
- sqlalchemy: add xor translation rule (2921664)
- sqlalchemy: allow non-primitive arrays ([4e02918](4e02918...