Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 14 changed files with 185 additions and 306 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
::: {.callout-warning} | ||
The Polars backend is experimental and is subject to backwards incompatible changes. | ||
This backend is experimental and is subject to backwards incompatible changes. | ||
::: |
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,71 @@ | ||
# Chain expressions (`ibis._`) | ||
# Chaining expressions | ||
|
||
## Prerequisites | ||
Expressions can easily be chained using the deferred expression API, also known as the Underscore (`_`) API. | ||
|
||
An Ibis table. | ||
In this guide, we use the `_` API to concisely create column expressions and then chain table expressions. | ||
|
||
## Setup | ||
|
||
To get started, import `_` from ibis: | ||
|
||
```{python} | ||
import ibis | ||
from ibis import _ | ||
import pandas as pd | ||
``` | ||
|
||
Let's create two in-memory tables using [`ibis.memtable`](../external-dataframes/memtable_join.qmd), an API introduced in Ibis 3.2: | ||
|
||
```{python} | ||
df1 = pd.DataFrame({'x': range(5), 'y': list('ab')*2 + list('e')}) | ||
t1 = ibis.memtable(df1) | ||
df2 = pd.DataFrame({'x': range(10), 'z': list(reversed(list('ab')*2 + list('e')))*2}) | ||
t2 = ibis.memtable(df2) | ||
``` | ||
|
||
## Creating column expressions | ||
|
||
We can use `_` to create new column expressions without explicit reference to the previous table expression: | ||
|
||
```{python} | ||
# We can pass a deferred expression into a function: | ||
def modf(t): | ||
    return t.x % 3 | ||
xmod = modf(_) | ||
# We can create ColumnExprs like aggregate expressions: | ||
ymax = _.y.max() | ||
zmax = _.z.max() | ||
zct = _.z.count() | ||
``` | ||
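To make the idea of a deferred expression concrete, here is a minimal pure-Python sketch of the pattern. It is a toy illustration only, not how Ibis implements `_`: the `Deferred` class, its `resolve` method, and the `SimpleNamespace` stand-in for a table are all hypothetical names introduced for this example.

```python
import operator
from types import SimpleNamespace


class Deferred:
    """Toy deferred expression: records operations and replays them later."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only called for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        # Record the attribute lookup instead of performing it now.
        return Deferred(self._steps + (operator.attrgetter(name),))

    def __mod__(self, other):
        # Record the modulo operation instead of computing it now.
        return Deferred(self._steps + (lambda value: value % other,))

    def resolve(self, target):
        # Replay every recorded step against a concrete object.
        value = target
        for step in self._steps:
            value = step(value)
        return value


_ = Deferred()


def modf(t):
    return t.x % 3


xmod = modf(_)  # nothing is computed yet; xmod is still a Deferred
row = SimpleNamespace(x=7)  # hypothetical stand-in for a table with an `x` column
result = xmod.resolve(row)  # now the recorded steps run: 7 % 3
```

The point of the sketch is that `modf(_)` builds a recipe rather than a value, so the same recipe can later be applied to whichever table a method such as `mutate` supplies.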
|
||
## Chaining Ibis expressions | ||
|
||
We can also use it to chain Ibis expressions in one Python expression: | ||
|
||
```{python} | ||
join = ( | ||
t1 | ||
# _ is t1 | ||
.join(t2, _.x == t2.x) | ||
# _ is the join result: | ||
.mutate(xmod=xmod) | ||
# _ is the TableExpression after mutate: | ||
.group_by(_.xmod) | ||
# `ymax` and `zmax` are ColumnExpressions derived from deferred expressions: | ||
.aggregate(ymax=ymax, zmax=zmax) | ||
# _ is the aggregation result: | ||
.filter(_.ymax == _.zmax) | ||
# _ is the filtered result, and re-create xmod in t2 using modf: | ||
.join(t2, _.xmod == modf(t2)) | ||
# _ is the second join result: | ||
.join(t1, _.xmod == modf(t1)) | ||
# _ is the third join result: | ||
.select(_.x, _.y, _.z) | ||
# Finally, _ is the selection result: | ||
.order_by(_.x) | ||
) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
# pandas `DataFrame`s | ||
|
||
You might have an in-memory DataFrame that you want to join to an ibis table expression. | ||
|
||
For example, you might have a file on your local machine that you don't want to upload to | ||
your backend, but you need to join it to a table in that backend. | ||
|
||
You can join local data to ibis expressions from your backend using ibis `memtable`s. | ||
|
||
In this guide, you will learn how to effectively use pandas DataFrames with ibis. | ||
|
||
## Setup | ||
|
||
In this example, we will create two DataFrames: | ||
|
||
* One containing events | ||
* One containing event names | ||
|
||
We will save the events to a parquet file and read that as an ibis expression | ||
using the DuckDB backend. | ||
|
||
We will then convert the event names pandas `DataFrame` to an ibis `memtable`, | ||
and join the two expressions together. | ||
|
||
First, we'll start off by working only with pandas DataFrames. | ||
|
||
```{python} | ||
import pandas as pd | ||
from datetime import date | ||
# create a pandas DataFrame that we will convert to a | ||
# PandasInMemoryTable (Ibis MemTable) | ||
events = pd.DataFrame( | ||
{ | ||
'event_id': range(4), | ||
'event_name': [f'e{k}' for k in range(4)], | ||
} | ||
) | ||
``` | ||
|
||
Next, let's create some measurement data that we'll write to an Apache Parquet file. | ||
|
||
```{python} | ||
# Create a parquet file that we will read in using the DuckDB backend | ||
# as a TableExpression | ||
measures = pd.DataFrame({ | ||
"event_id": [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3], | ||
"measured_on": map( | ||
date, | ||
[2021] * 12, | ||
[6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 7, 7], | ||
range(1, 13), | ||
), | ||
"measurement": None | ||
}) | ||
measures.loc[[1, 4, 5, 7], "measurement"] = [5.0, 42.0, 42.0, 11.0] | ||
measures.head() | ||
``` | ||
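The `measured_on` column above relies on `map` accepting several iterables at once. As a small standalone sketch (the shortened `years`, `months`, and `days` inputs are made up for illustration), `map(date, ...)` calls `date(year, month, day)` position by position:

```python
from datetime import date

# Shortened, illustrative inputs; the guide uses twelve values of each.
years = [2021] * 3
months = [6, 6, 5]
days = range(1, 4)

# map() with several iterables passes one item from each to date().
measured_on = list(map(date, years, months, days))

# Equivalent spelling with zip, shown only for comparison.
same_dates = [date(y, m, d) for y, m, d in zip(years, months, days)]
```

Both spellings produce the same list of `date` objects; `map` returns a one-shot iterator, which is why the sketch wraps it in `list` before reusing the result.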
|
||
Let's save `measures` to Parquet. | ||
|
||
```{python} | ||
measures.to_parquet('measures.parquet') | ||
``` | ||
|
||
Now let's create an in-memory DuckDB backend with ibis and turn on [interactive mode](../configure/basics.qmd#interactive-mode). | ||
|
||
```{python} | ||
#| echo: false | ||
import ibis | ||
``` | ||
|
||
```{python} | ||
import ibis | ||
ibis.options.interactive = True | ||
con = ibis.connect('duckdb://') | ||
measures = con.read_parquet("measures.parquet") | ||
measures | ||
``` | ||
|
||
Converting a pandas `DataFrame` to an ibis expression is as simple as feeding it to `ibis.memtable`: | ||
|
||
```{python} | ||
mem_events = ibis.memtable(events) | ||
mem_events | ||
``` | ||
|
||
Joining is then the same as joining any two table expressions: | ||
|
||
```{python} | ||
joined = measures.join(mem_events, "event_id") | ||
joined | ||
``` | ||
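As a rough mental model of what an inner join on `event_id` produces, here is a plain-Python sketch that uses lists of dicts instead of ibis tables. The shortened `measure_rows` data and the helper names are assumptions made for illustration; the real join is executed by the backend, not by code like this.

```python
# Illustrative stand-ins for the two tables (not the full data from the guide).
event_rows = [{"event_id": k, "event_name": f"e{k}"} for k in range(4)]
measure_rows = [
    {"event_id": 0, "measurement": None},
    {"event_id": 0, "measurement": 5.0},
    {"event_id": 1, "measurement": None},
]

# Index the smaller table by the join key.
names_by_id = {row["event_id"]: row["event_name"] for row in event_rows}

# Inner join: keep only measure rows whose key exists on the other side,
# and attach the matching event name to each of them.
joined_rows = [
    {**m, "event_name": names_by_id[m["event_id"]]}
    for m in measure_rows
    if m["event_id"] in names_by_id
]
```

Each measurement row is paired with the event name that shares its `event_id`, which is the shape of result the join in the guide returns.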
|
||
For convenience, you can skip calling `ibis.memtable(events)` and | ||
pass the `events` `DataFrame` directly as the right-hand side of the join: | ||
|
||
```{python} | ||
joined = measures.join(events, "event_id") | ||
joined | ||
``` | ||
|
||
In this case, Ibis is calling `ibis.memtable(events)` for you. |
This file was deleted.