chore: clean up a bunch of content
cpcloud committed Aug 29, 2023
1 parent 0d84e08 commit b0a4401
Showing 14 changed files with 185 additions and 306 deletions.
2 changes: 1 addition & 1 deletion .releaserc.js
@@ -42,7 +42,7 @@ module.exports = {
[
"@semantic-release/changelog",
{
-changelogTitle: "Release Notes\n---",
+changelogTitle: "Release notes\n---",
changelogFile: "docs2/release_notes.qmd",
},
],
2 changes: 1 addition & 1 deletion docs2/_callouts/experimental_backend.qmd
@@ -1,3 +1,3 @@
::: {.callout-warning}
-The Polars backend is experimental and is subject to backwards incompatible changes.
+This backend is experimental and is subject to backwards incompatible changes.
:::
1 change: 1 addition & 0 deletions docs2/_quarto.yml
@@ -98,6 +98,7 @@ website:
- auto: "how-to/input-output"
- auto: "how-to/analytics"
- auto: "how-to/visualization"
+- auto: "how-to/external-dataframes"
- auto: "how-to/old"
- id: community
title: "Community"
18 changes: 0 additions & 18 deletions docs2/backends/memtable-template.md

This file was deleted.

78 changes: 0 additions & 78 deletions docs2/backends/template.md

This file was deleted.

72 changes: 69 additions & 3 deletions docs2/how-to/analytics/chain_expressions.qmd
@@ -1,5 +1,71 @@
-# Chain expressions (`ibis._`)
+# Chaining expressions

-## Prerequisites
+Expressions can easily be chained using the deferred expression API, also known as the Underscore (`_`) API.

-An Ibis table.
+In this guide, we use the `_` API to concisely create column expressions and then chain table expressions.

## Setup

To get started, import `_` from ibis:

```{python}
import ibis
from ibis import _
import pandas as pd
```

Let's create two in-memory tables using [`ibis.memtable`](../external-dataframes/memtable_join.qmd), an API introduced in 3.2:

```{python}
df1 = pd.DataFrame({'x': range(5), 'y': list('ab')*2 + list('e')})
t1 = ibis.memtable(df1)
df2 = pd.DataFrame({'x': range(10), 'z': list(reversed(list('ab')*2 + list('e')))*2})
t2 = ibis.memtable(df2)
```

## Creating column expressions

We can use `_` to create new column expressions without explicit reference to the previous table expression:

```{python}
# We can pass a deferred expression into a function:
def modf(t):
    return t.x % 3

xmod = modf(_)
# We can create ColumnExprs like aggregate expressions:
ymax = _.y.max()
zmax = _.z.max()
zct = _.z.count()
```
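To see why `modf(_)` works without a table, here is a toy sketch of the idea behind a deferred expression. It is not Ibis's implementation: the `Deferred` class, its `resolve` method, and the dict-of-lists "table" are all hypothetical stand-ins. Each operation on `_` only records what to do, and the work happens once a concrete table is supplied.

```python
class Deferred:
    """Toy stand-in (hypothetical, not Ibis's class): records work to do later."""

    def __init__(self, resolve=lambda table: table):
        self._resolve = resolve

    def __getattr__(self, name):
        # Only called for attributes that do not exist, i.e. column names.
        if name.startswith("_"):
            raise AttributeError(name)
        return Deferred(lambda table: self._resolve(table)[name])

    def __mod__(self, other):
        # Element-wise modulo over a column, evaluated lazily.
        return Deferred(lambda table: [v % other for v in self._resolve(table)])

    def max(self):
        # Aggregate over a column, evaluated lazily.
        return Deferred(lambda table: max(self._resolve(table)))

    def resolve(self, table):
        # Bind the recorded operations to a concrete table.
        return self._resolve(table)


_ = Deferred()


def modf(t):
    return t.x % 3


xmod = modf(_)    # nothing is computed yet
ymax = _.y.max()  # nothing is computed yet

# A dict of columns stands in for a table, using the same values as df1.
table = {"x": [0, 1, 2, 3, 4], "y": ["a", "b", "a", "b", "e"]}
print(xmod.resolve(table))  # [0, 1, 2, 0, 1]
print(ymax.resolve(table))  # e
```

In Ibis itself, the binding step happens when the deferred expression is passed to a table method such as `mutate` or `aggregate`.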

## Chaining Ibis expressions

We can also use it to chain Ibis expressions in one Python expression:

```{python}
join = (
    t1
    # _ is t1
    .join(t2, _.x == t2.x)
    # _ is the join result:
    .mutate(xmod=xmod)
    # _ is the TableExpression after mutate:
    .group_by(_.xmod)
    # `ymax` and `zmax` are ColumnExpressions derived from deferred expressions:
    .aggregate(ymax=ymax, zmax=zmax)
    # _ is the aggregation result:
    .filter(_.ymax == _.zmax)
    # _ is the filtered result, and re-create xmod in t2 using modf:
    .join(t2, _.xmod == modf(t2))
    # _ is the second join result:
    .join(t1, _.xmod == modf(t1))
    # _ is the third join result:
    .select(_.x, _.y, _.z)
    # Finally, _ is the selection result:
    .order_by(_.x)
)
```
2 changes: 1 addition & 1 deletion docs2/how-to/configure/basics.qmd
@@ -17,7 +17,7 @@ Ibis configuration happens through the `ibis.options` attribute. Attributes can

## Interactive mode

-Ibis out of the box is in _deffered mode_. Expressions display their internal details when printed to the console.
+Ibis out of the box is in *deferred mode*. Expressions display their internal details when printed to the console.

```{python}
t.head(3)
109 changes: 109 additions & 0 deletions docs2/how-to/external-dataframes/memtable_join.qmd
@@ -0,0 +1,109 @@
# pandas `DataFrame`s

You might have an in-memory DataFrame that you want to join to an ibis table expression.

For example, you might have a file on your local machine that you don't want to upload to
your backend, but you need to join it to a table in that backend.

You can join local data to ibis expressions from your backend using ibis `memtable`s.

In this guide, you will learn how to effectively use pandas DataFrames with ibis.

## Setup

In this example, we will create two DataFrames:

* One containing events
* One containing event names

We will save the events to a parquet file and read that as an ibis expression
using the DuckDB backend.

We will then convert the event names pandas `DataFrame` to an ibis `memtable`,
and join the two expressions together.

First, we'll start off by working only with pandas DataFrames.

```{python}
import pandas as pd
from datetime import date
# create a pandas DataFrame that we will convert to a
# PandasInMemoryTable (Ibis MemTable)
events = pd.DataFrame(
    {
        'event_id': range(4),
        'event_name': [f'e{k}' for k in range(4)],
    }
)
```

Next, let's create some measurement data that we'll write to an Apache Parquet file.

```{python}
# Create a parquet file that we will read in using the DuckDB backend
# as a TableExpression
measures = pd.DataFrame({
    "event_id": [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3],
    "measured_on": map(
        date,
        [2021] * 12,
        [6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 7, 7],
        range(1, 13),
    ),
    "measurement": None
})
measures.loc[[1, 4, 5, 7], "measurement"] = [5.0, 42.0, 42.0, 11.0]
measures.head()
```
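The `map(date, ...)` call above may be easier to read once expanded. As a small standard-library sketch (the `months` and `measured_on` variable names are ours, not part of the guide), `map` with several iterables calls `date(year, month, day)` once per position:

```python
from datetime import date

months = [6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 7, 7]

# map() with several iterables walks them in lockstep, so each element
# is date(2021, months[i], i + 1).
measured_on = list(map(date, [2021] * 12, months, range(1, 13)))

print(measured_on[0])   # 2021-06-01
print(measured_on[4])   # 2021-05-05
print(measured_on[-1])  # 2021-07-12
```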

Let's save `measures` to Parquet.

```{python}
measures.to_parquet('measures.parquet')
```

Now let's create an in-memory DuckDB backend with ibis and turn on [interactive mode](../configure/basics.qmd#interactive-mode).

```{python}
#| echo: false
import ibis
```

```{python}
import ibis
ibis.options.interactive = True
con = ibis.connect('duckdb://')
measures = con.read_parquet("measures.parquet")
measures
```

Converting a pandas `DataFrame` to an ibis expression is as simple as feeding it to `ibis.memtable`:

```{python}
mem_events = ibis.memtable(events)
mem_events
```

and joining is the same as joining any two table expressions:

```{python}
joined = measures.join(mem_events, "event_id")
joined
```

For maximum convenience, you can avoid calling `ibis.memtable(events)` and
pass in the `events` `DataFrame` as the right hand side of the join:

```{python}
joined = measures.join(events, "event_id")
joined
```

In this case, Ibis is calling `ibis.memtable(events)` for you.
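
To make the shape of the join result concrete, here is a plain-Python sketch of an inner join on `event_id`. It only illustrates the semantics and is not how Ibis or DuckDB executes the join. The list-of-dicts layout and the `names_by_id` lookup are assumptions for illustration, and only the first three `measures` rows are used, with the date column left out.

```python
# Event names, one row per event_id (same values as the `events` DataFrame).
events = [{"event_id": k, "event_name": f"e{k}"} for k in range(4)]

# First three rows of `measures`, simplified to two columns.
measures = [
    {"event_id": 0, "measurement": None},
    {"event_id": 0, "measurement": 5.0},
    {"event_id": 1, "measurement": None},
]

# Index the right-hand side by the join key (event_id is unique in events).
names_by_id = {e["event_id"]: e["event_name"] for e in events}

# Inner join: keep only measurement rows whose key exists on the right.
joined = [
    {**m, "event_name": names_by_id[m["event_id"]]}
    for m in measures
    if m["event_id"] in names_by_id
]

for row in joined:
    print(row)
```

Each measurement row picks up the `event_name` for its `event_id`, which matches the columns in the `joined` expression above.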
81 changes: 0 additions & 81 deletions docs2/how-to/old/memtable_join.qmd

This file was deleted.
