docs: clean up extending tutorials
cpcloud committed Sep 7, 2023
1 parent 0bad961 commit 8da58d4
Showing 2 changed files with 44 additions and 52 deletions.
35 changes: 14 additions & 21 deletions docs/how-to/extending/elementwise.qmd
@@ -4,15 +4,15 @@ This notebook will show you how to add a new elementwise operation to an existin

We are going to add `julianday`, a function supported by the SQLite database, to the SQLite Ibis backend.

The Julian day of a date, is the number of days since January 1st, 4713 BC. For more information check the [Julian day](https://en.wikipedia.org/wiki/Julian_day) wikipedia page.
The Julian day of a date, is the number of days since January 1st, 4713 BC. For more information check the [Julian day](https://en.wikipedia.org/wiki/Julian_day) Wikipedia page.

## Step 1: Define the Operation

Let's define the `julianday` operation as a function that takes one string input argument and returns a float.

```python
def julianday(date: str) -> float:
    """Julian date"""
    """Return the Julian day from a date."""
```


@@ -37,15 +37,15 @@ We just defined a `JulianDay` class that takes one argument of type string or bi

Because we know the output type of the operation, to make an expression out of ``JulianDay`` we simply need to construct it and call its `ibis.expr.types.Node.to_expr` method.

We still need to add a method to `StringValue` and `BinaryValue` (this needs to work on both scalars and columns).
We still need to add a method to `StringValue` (this needs to work on both scalars and columns).

When you add a method to any of the expression classes whose name matches `*Value` both the scalar and column child classes will pick it up, making it easy to define operations for both scalars and columns in one place.

We can do this by defining a function and assigning it to the appropriate class
of expressions.

```{python}
from ibis.expr.types import BinaryValue, StringValue
from ibis.expr.types import StringValue
def julianday(string_value):
@@ -55,13 +55,13 @@ def julianday(string_value):
StringValue.julianday = julianday
```

## Interlude: Create some expressions with `sha1`
## Interlude: Create some expressions with `julianday`


```{python}
import ibis
t = ibis.table([('string_col', 'string')], name='t')
t = ibis.table(dict(string_col="string"), name="t")
t.string_col.julianday()
```
@@ -91,46 +91,39 @@ Download the geography database.

```{python}
!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'
```


```{python}
import os
db_fname = 'geography.db'
con = ibis.sqlite.connect(db_fname)
con = ibis.sqlite.connect("geography.db")
```

### Create and execute a `julianday` expression


```{python}
independence = con.table('independence')
independence
ind = con.table("independence")
ind
```


```{python}
day = independence.independence_date.cast('string')
day = ind.independence_date.cast("string")
day
```


```{python}
julianday_expr = day.julianday().name("jday")
julianday_expr
jday_expr = day.julianday().name("jday")
jday_expr
```


```{python}
ibis.to_sql(julianday_expr)
ibis.to_sql(jday_expr)
```

Because we've defined our operation on `StringValue`, and not just on `StringColumn` we get operations on both string scalars *and* string columns for free


```{python}
jday = ibis.literal('2010-03-14').julianday()
jday = ibis.literal("2010-03-14").julianday()
con.execute(jday)
```
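To sanity-check the value that `con.execute(jday)` should return, here is a minimal pure-Python sketch of the same calculation. It is not part of this commit or of Ibis: the name `julianday_py` and the constant `_ORDINAL_TO_JULIAN` are assumptions introduced for illustration, and the sketch only handles plain `YYYY-MM-DD` dates at midnight.

```python
from datetime import date

# Assumption for this sketch: offset between Python's proleptic Gregorian
# ordinal (1 for 0001-01-01) and the Julian day at midnight of the same date.
_ORDINAL_TO_JULIAN = 1721424.5


def julianday_py(day: str) -> float:
    """Return the Julian day at midnight for an ISO `YYYY-MM-DD` string."""
    return date.fromisoformat(day).toordinal() + _ORDINAL_TO_JULIAN


print(julianday_py("2010-03-14"))  # 2455269.5
```

The half day appears because Julian days are counted from noon, so midnight falls halfway through a Julian day. The result should agree with SQLite's `julianday` for the same date-only input.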
61 changes: 30 additions & 31 deletions docs/how-to/extending/reduction.qmd
@@ -13,16 +13,19 @@ We're going to add a **`last_date`** function to ibis. `last_date` simply return
Let's define the `last_date` operation as a function that takes any date column as input and returns a date:

```python
import datetime
import typing
from __future__ import annotations

def last_date(dates: typing.List[datetime.date]) -> datetime.date:
    """Latest date"""
from datetime import date


def last_date(dates: list[date]) -> date:
    """Latest date."""
```


```{python}
from typing import Optional
from __future__ import annotations
import ibis.expr.datatypes as dt
import ibis.expr.datashape as ds
import ibis.expr.rules as rlz
@@ -32,9 +35,9 @@ from ibis.expr.operations import Reduction, Value
class LastDate(Reduction):
    arg: Value[dt.Date, ds.Any]
    where: Optional[Value[dt.Boolean, ds.Any]] = None
    where: Value[dt.Boolean, ds.Any] | None = None
    dtype = rlz.dtype_like('arg')
    dtype = rlz.dtype_like("arg")
    shape = ds.scalar
```

@@ -44,13 +47,13 @@ We just defined a `LastDate` class that takes one date column as input, and retu

## Step 2: Define the API

Because every reduction in ibis has the ability to filter out values during aggregation (a typical feature in databases and analytics tools), to make an expression out of ``LastDate`` we need to pass an additional argument: `where` to our `LastDate` constructor.
Because every reduction in ibis has the ability to filter out values during aggregation, to make an expression out of `LastDate` we need to pass an additional argument `where` to our `LastDate` constructor.

Additionally, reductions should be defined on `Column` classes because reductions are not always well-defined for a scalar value.


```{python}
from ibis.expr.types import (
    DateColumn,  # not DateValue! reductions are only valid on columns
)
from ibis.expr.types import DateColumn
def last_date(date_column, where=None):
@@ -65,8 +68,10 @@ DateColumn.last_date = last_date
```{python}
import ibis
people = ibis.table(
    dict(name='string', country='string', date_of_birth='date'), name='people'
    dict(name="string", country="string", date_of_birth="date"),
    name="people",
)
```

@@ -77,7 +82,7 @@ people.date_of_birth.last_date()


```{python}
people.date_of_birth.last_date(people.country == 'Indonesia')
people.date_of_birth.last_date(people.country == "Indonesia")
```

## Step 3: Turn the Expression into SQL
@@ -90,12 +95,15 @@ import sqlalchemy as sa
@ibis.sqlite.add_operation(LastDate)
def _last_date(translator, expr):
    # pull out the arguments to the expression
    arg, where = expr.op().args
    op = expr.op()
    arg = op.arg
    where = op.where
    # compile the argument
    compiled_arg = translator.translate(arg)
    # call the appropriate SQLite function (`max` for the latest/maximum date)
    # call the appropriate SQLite function (`max` for the latest date)
    agg = sa.func.max(compiled_arg)
    # handle a non-None filter clause
@@ -110,32 +118,23 @@ Download the geography database.

```{python}
!curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'
```


```{python}
import os
import ibis
db_fname = 'geography.db'
con = ibis.sqlite.connect(db_fname)
con = ibis.sqlite.connect("geography.db")
```

### Create and execute a `bitwise_and` expression


```{python}
independence = con.table('independence')
independence
ind = con.table("independence")
ind
```

Last country to gain independence in our database:


```{python}
expr = independence.independence_date.last_date()
expr = ind.independence_date.last_date()
expr
```

@@ -144,12 +143,12 @@ expr
ibis.to_sql(expr)
```

Last country to gain independence from the Spanish Empire, using the `where` parameter:
Show the last country to gain independence from the Spanish Empire, using the `where` parameter:


```{python}
expr = independence.independence_date.last_date(
    where=independence.independence_from == 'Spanish Empire'
expr = ind.independence_date.last_date(
    where=ind.independence_from == "Spanish Empire"
)
expr
```
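The behaviour that `last_date` with a `where` filter is meant to have can be sketched in plain Python, independent of Ibis and SQLite. This is an illustrative model only: `last_date_py` and the sample rows are hypothetical names and values introduced here, and the sketch assumes the usual SQL `max` semantics (nulls are ignored, and an empty input yields null).

```python
from datetime import date
from typing import Optional, Sequence


def last_date_py(
    dates: Sequence[Optional[date]],
    where: Optional[Sequence[bool]] = None,
) -> Optional[date]:
    """Return the latest non-null date, optionally restricted by a mask."""
    if where is None:
        # no filter: keep every row
        where = [True] * len(dates)
    kept = [d for d, keep in zip(dates, where) if keep and d is not None]
    # like SQL `max`, return None when nothing survives the filter
    return max(kept, default=None)


# hypothetical rows, not taken from geography.db
dates = [date(1810, 1, 1), date(1820, 5, 5), None, date(1950, 6, 1)]
sources = ["Spanish Empire", "Spanish Empire", "Spanish Empire", "Other"]

print(last_date_py(dates))
print(last_date_py(dates, [s == "Spanish Empire" for s in sources]))
```

The first call considers every row, while the second only considers rows where the mask is true, which is the role the `where` argument plays in the `LastDate` reduction.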

1 comment on commit 8da58d4

@ibis-squawk-bot
Contributor


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 3.

| Benchmark suite | Current: 8da58d4 | Previous: b37804a | Ratio |
|-|-|-|-|
| `ibis/tests/benchmarks/test_benchmarks.py::test_compile[small-impala]` | `1733.6617205315047` iter/sec (`stddev: 0.008295550509259`) | `12596.366232062164` iter/sec (`stddev: 0.000015222207751772768`) | `7.27` |

This comment was automatically generated by workflow using github-action-benchmark.
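
The reported ratio can be reproduced from the two throughput figures above. The snippet below is a rough check, assuming the ratio is the previous throughput divided by the current one (higher iter/sec being better); that formula is inferred from the numbers in the alert rather than taken from the tool's documentation, and the variable names are illustrative.

```python
current = 1733.6617205315047   # iter/sec at 8da58d4
previous = 12596.366232062164  # iter/sec at b37804a
threshold = 3                  # alert threshold quoted in the comment

# assumed formula: how many times slower the current commit is
ratio = previous / current

print(f"{ratio:.2f}")     # 7.27
print(ratio > threshold)  # True
```

Under that assumption, the alert fires because the computed ratio is well above the threshold of 3.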
