docs(talks): pycon 2024 maintainers talk (#9193)
Maintainers talk for PyCon 2024
cpcloud authored May 15, 2024
1 parent 142c105 commit 77d6cb6
Showing 9 changed files with 328 additions and 0 deletions.
Binary file added docs/presentations/pycon2024/basement-ci.jpeg
Binary file added docs/presentations/pycon2024/bill.png
Binary file added docs/presentations/pycon2024/docker-eye-roll.gif
Binary file added docs/presentations/pycon2024/gha.png
Binary file added docs/presentations/pycon2024/machine.gif
328 changes: 328 additions & 0 deletions docs/presentations/pycon2024/maintainers.qmd
@@ -0,0 +1,328 @@
---
title: "Test 20 databases on every commit"
subtitle: "You can, it's not hyperbole"
author:
  - Phillip Cloud
execute:
  echo: true
format:
  revealjs:
    footer: <https://ibis-project.org>
    chalkboard: true
    # https://quarto.org/docs/presentations/revealjs/themes.html#using-themes
    theme: dark
---

# What

## Maybe this is you

![](./docker-eye-roll.gif){fig-align="center"}

## Or this

![](./wonka.png){fig-align="center"}

## Or maybe even this

![](./basement-ci.jpeg){fig-align="center"}

## Not earth shattering

:::: {.columns}

::: {.column width="50%"}
### Overview

- What we learned about maintenance while building Ibis
- The day-to-day of supporting 20+ databases
- Unique challenges
:::

::: {.column width="50%"}
### Topics

- Some docker stuff
- Some packaging stuff
- Some CI stuff
- Some `pytest` plugins stuff
:::
::::

# Overview of Ibis

## Ibis is a Python library for:

- exploratory data analysis (EDA)
- analytics
- data engineering
- ML preprocessing
- building your own dataframe library

::: {.r-fit-text}
dev to prod with the same API
:::

## One API, 20+ backends {.smaller .scrollable}

```{python}
#| code-fold: true
#| echo: false
import ibis
ibis.options.interactive = True
t = ibis.examples.penguins.fetch()
t.to_parquet("penguins.parquet")
```

::: {.panel-tabset}

## DuckDB

```{python}
con = ibis.connect("duckdb://")
```

```{python}
t = con.read_parquet("penguins.parquet")
t.head(3)
```

```{python}
t.group_by("species", "island").agg(count=t.count()).order_by("count")
```

## Polars

```{python}
con = ibis.connect("polars://")
```

```{python}
t = con.read_parquet("penguins.parquet")
t.head(3)
```

```{python}
t.group_by("species", "island").agg(count=t.count()).order_by("count")
```

## DataFusion

```{python}
con = ibis.connect("datafusion://")
```

```{python}
t = con.read_parquet("penguins.parquet")
t.head(3)
```

```{python}
t.group_by("species", "island").agg(count=t.count()).order_by("count")
```

## PySpark

```{python}
con = ibis.connect("pyspark://")
```

```{python}
t = con.read_parquet("penguins.parquet")
t.head(3)
```

```{python}
t.group_by("species", "island").agg(count=t.count()).order_by("count")
```

## 16+ other things

![](./machine.gif){fig-align="center" width="100%" height="100%"}

:::

## How it works

```{python}
#| echo: false
import os
import sys
sys.path.append(os.path.abspath("../.."))
from backends_sankey import fig
fig.show()
```

# What's in an Ibis?

## By the numbers {.smaller}

:::: {.columns}
::: {.column width="50%"}
### Backends
- **17** SQL
- **3** non-SQL
- **2** cloud
:::

::: {.column width="50%"}
### Engines + APIs
- **9** distributed SQL
- **3** dataframe
- oldest: **~45** years 👀
- youngest: **~2** years
:::
::::

### Other facts

- Latency is variable
- Deployment models vary

::: {.fragment}
::: {.r-fit-text}
_**Feature development**_
:::
:::

## Bit of a pickle

![](./picklerick.png)

# How

## High level

### Goal: fast iteration

- fast env setup (dependency management)
- fast(ish) tests (test-running library)
- high **job** concurrency (ci/provider)
- **easy to run**: dev speed ([`just`](https://github.com/casey/just))

::: {.fragment}
::: {.r-fit-text}
_CI must complete "quickly"_
:::
:::

## Tools: overview

- **deps**: poetry
- **ci**: GitHub Actions
- **wild beasts**: docker
- **house pets**: docker
- cool kids don't get special treatment (duckdb, polars)
- task runner (e.g.: `just up postgres`)

## Tools: poetry

::: {.callout-warning}
## Opinions follow
:::

- **Env setup needs to be _fast_**: avoid constraint solving
- Poetry is one way; there are others
- Get yourself a lockfile

::: {.fragment}
::: {.r-fit-text}
_Are you doing that **now**_❔❓
:::
:::

## This plot

::: {layout="[[-1], [1], [-1]]"}

![](./progress.png){fig-align="center"}

:::

::: {.fragment}
::: {.r-fit-text}
_We've added 3 or 4 new backends since the switch_
:::
:::

## Tools: docker

- Sure, docker
- But do you use it locally?
- Use health checks; "dumb" ones are fine
- Make it easy for devs to use
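
A "dumb" health check can be as simple as "does the port accept a TCP connection yet?". The sketch below is one way to write that in Python with only the standard library. `wait_for_port`, its defaults, and the host and port in the usage note are illustrative assumptions, not what Ibis ships.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect is the whole check: no query, no auth.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or a network hiccup): wait and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 5432)` could gate a local test run on a Postgres container, assuming the container publishes that port.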

## Tools: GitHub Actions {.smaller}

::: {.callout-note}
### I don't work for GitHub
:::

- Pay for [the Team plan](https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits) to get more concurrency
- Automate dependency updates

::: {.columns}
::: {.column width="50%"}
### GHA concurrency limits

![](./gha.png)
:::

::: {.column width="50%"}
### Ibis CI cost

![](./bill.png)
:::
:::

## `pytest` {.smaller}

### Ibis problems

- Backends don't implement the same stuff
- Need to know when a backend passes
- Need to specify the exception type it raises
- Answer questions like: "will it _ever_ blend?"

::: {.fragment}
### Markers + hooks

```python
@pytest.mark.never("duckdb")  # never gonna happen
@pytest.mark.notyet("impala")  # might happen
@pytest.mark.notimpl("snowflake")  # ibis devs: do some work
def test_soundex():
    ...

def pytest_ignore_collect(...):
    # pytest -m duckdb: don't collect things that aren't marked duckdb
    ...
```
:::
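
One way to read the three markers is as a small lookup: given the backend under test, what outcome should the test have, and why? The sketch below models that bookkeeping in plain Python without importing `pytest`. `Mark`, `expected_outcome`, and the outcome strings are hypothetical names for illustration, not Ibis's actual conftest. In a real suite, a hook could translate the `"xfail"` outcome into `pytest.mark.xfail(raises=..., strict=True)`.

```python
from dataclasses import dataclass
from typing import Optional

# Why a backend is expected to fail, per marker name (wording from the slide).
REASONS = {
    "never": "never gonna happen",
    "notyet": "might happen",
    "notimpl": "ibis devs: do some work",
}


@dataclass(frozen=True)
class Mark:
    """Hypothetical stand-in for a pytest marker such as notyet("impala")."""

    name: str  # "never", "notyet", or "notimpl"
    backend: str  # backend the marker applies to
    raises: type = Exception  # exception type the failure should raise


def expected_outcome(marks: list[Mark], backend: str) -> tuple[str, Optional[str]]:
    """Return ("pass", None) or ("xfail", reason) for the backend under test."""
    for mark in marks:
        if mark.backend == backend and mark.name in REASONS:
            return "xfail", f"{REASONS[mark.name]} (raises {mark.raises.__name__})"
    return "pass", None


# The markers from test_soundex above, with one assumed exception type.
soundex_marks = [
    Mark("never", "duckdb"),
    Mark("notyet", "impala"),
    Mark("notimpl", "snowflake", raises=NotImplementedError),
]
```

Under these assumptions, `expected_outcome(soundex_marks, "postgres")` is a plain pass, while `"duckdb"` is an expected failure with a recorded reason and exception type.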

## `pytest` plugins you may like

**`pytest-`**

- `xdist`: try to make this work if you can
- `randomly`: exposes your bogus and stateful assumptions
- `repeat`: great when `randomly` exposes your busted assumptions
- `clarity`: you will hate reading failure diffs less
- `snapshot`: better than that giant `f`-string you just wrote

**hypothesis** 👈 that too, we don't use it enough
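
To make the `randomly` point concrete, here is a standard-library simulation of an order-dependent pair of tests. It does not use `pytest-randomly` itself: `CACHE`, both test functions, and `run` are hypothetical, and the reversed list stands in for one unlucky shuffle.

```python
CACHE: dict[str, str] = {}


def test_writes_cache():
    CACHE["backend"] = "duckdb"
    assert CACHE["backend"] == "duckdb"


def test_reads_cache():
    # Bogus assumption: test_writes_cache has already run.
    assert CACHE["backend"] == "duckdb"


def run(tests):
    """Run tests in the given order from a clean slate; return names that failed."""
    CACHE.clear()
    failed = []
    for test in tests:
        try:
            test()
        except (AssertionError, KeyError):
            failed.append(test.__name__)
    return failed


in_order = run([test_writes_cache, test_reads_cache])
reordered = run([test_reads_cache, test_writes_cache])
```

Definition order hides the bug (`in_order` is empty), while the reordered run fails `test_reads_cache`. That hidden shared state is the kind of assumption a shuffled test order tends to surface.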

# Summary

- Use docker for dev **and** "prod"
- Lock your dependencies (dev only!)
- Auto update stuff
- `pytest` probably has a thing for that
- Spend time on dev ex
- Track CI run durations, look at them too
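
Tracking durations does not need much tooling to start. The sketch below summarizes a list of run durations with the standard library. `summarize` and the sample numbers are made up for illustration, and fetching real durations from a CI provider is left out.

```python
import statistics


def summarize(durations_min: list[float]) -> dict[str, float]:
    """Summarize CI run durations (in minutes): run count, median, and worst case."""
    if not durations_min:
        raise ValueError("need at least one run duration")
    return {
        "runs": len(durations_min),
        "median": statistics.median(durations_min),
        "max": max(durations_min),
    }


# Hypothetical durations for seven recent runs, in minutes.
recent = [18.0, 21.5, 19.0, 20.0, 45.0, 18.5, 22.0]
summary = summarize(recent)
```

A worst case far above the median, as in this made-up sample, is the sort of thing worth looking at: one slow outlier can dominate how long people wait.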

# Questions?
Binary file added docs/presentations/pycon2024/picklerick.png
Binary file added docs/presentations/pycon2024/progress.png
Binary file added docs/presentations/pycon2024/wonka.png
