docs(talks): pycon 2024 maintainers talk (#9193)
Maintainers talk for PyCon 2024
Showing 9 changed files with 328 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,328 @@ | ||
--- | ||
title: "Test 20 databases on every commit" | ||
subtitle: "You can, it's not hyperbole" | ||
author: | ||
  - Phillip Cloud | ||
execute: | ||
  echo: true | ||
format: | ||
  revealjs: | ||
    footer: <https://ibis-project.org> | ||
    chalkboard: true | ||
    # https://quarto.org/docs/presentations/revealjs/themes.html#using-themes | ||
    theme: dark | ||
--- | ||
|
||
# What | ||
|
||
## Maybe this is you | ||
|
||
![](./docker-eye-roll.gif){fig-align="center"} | ||
|
||
## Or this | ||
|
||
![](./wonka.png){fig-align="center"} | ||
|
||
## Or maybe even this | ||
|
||
![](./basement-ci.jpeg){fig-align="center"} | ||
|
||
## Not earth shattering | ||
|
||
:::: {.columns} | ||
|
||
::: {.column width="50%"} | ||
### Overview | ||
|
||
- What we learned about maintenance building Ibis | ||
- Day to day of supporting 20+ databases | ||
- Unique challenges | ||
::: | ||
|
||
::: {.column width="50%"} | ||
### Topics | ||
|
||
- Some docker stuff | ||
- Some packaging stuff | ||
- Some CI stuff | ||
- Some `pytest` plugins stuff | ||
::: | ||
:::: | ||
|
||
# Overview of Ibis | ||
|
||
## Ibis is a Python library for: | ||
|
||
- exploratory data analysis (EDA) | ||
- analytics | ||
- data engineering | ||
- ML preprocessing | ||
- building your own dataframe library | ||
|
||
::: {.r-fit-text} | ||
dev to prod with the same API | ||
::: | ||
|
||
## One API, 20+ backends {.smaller .scrollable} | ||
|
||
```{python} | ||
#| code-fold: true | ||
#| echo: false | ||
import ibis | ||
ibis.options.interactive = True | ||
t = ibis.examples.penguins.fetch() | ||
t.to_parquet("penguins.parquet") | ||
``` | ||
|
||
::: {.panel-tabset} | ||
|
||
## DuckDB | ||
|
||
```{python} | ||
con = ibis.connect("duckdb://") | ||
``` | ||
|
||
```{python} | ||
t = con.read_parquet("penguins.parquet") | ||
t.head(3) | ||
``` | ||
|
||
```{python} | ||
t.group_by("species", "island").agg(count=t.count()).order_by("count") | ||
``` | ||
|
||
## Polars | ||
|
||
```{python} | ||
con = ibis.connect("polars://") | ||
``` | ||
|
||
```{python} | ||
t = con.read_parquet("penguins.parquet") | ||
t.head(3) | ||
``` | ||
|
||
```{python} | ||
t.group_by("species", "island").agg(count=t.count()).order_by("count") | ||
``` | ||
|
||
## DataFusion | ||
|
||
```{python} | ||
con = ibis.connect("datafusion://") | ||
``` | ||
|
||
```{python} | ||
t = con.read_parquet("penguins.parquet") | ||
t.head(3) | ||
``` | ||
|
||
```{python} | ||
t.group_by("species", "island").agg(count=t.count()).order_by("count") | ||
``` | ||
|
||
## PySpark | ||
|
||
```{python} | ||
con = ibis.connect("pyspark://") | ||
``` | ||
|
||
```{python} | ||
t = con.read_parquet("penguins.parquet") | ||
t.head(3) | ||
``` | ||
|
||
```{python} | ||
t.group_by("species", "island").agg(count=t.count()).order_by("count") | ||
``` | ||
|
||
## 16+ other things | ||
|
||
![](./machine.gif){fig-align="center" width="100%" height="100%"} | ||
|
||
::: | ||
|
||
## How it works | ||
|
||
```{python} | ||
#| echo: false | ||
import os | ||
import sys | ||
sys.path.append(os.path.abspath("../..")) | ||
from backends_sankey import fig | ||
fig.show() | ||
``` | ||
|
||
# What's in an Ibis? | ||
|
||
## By the numbers {.smaller} | ||
|
||
:::: {.columns} | ||
::: {.column width="50%"} | ||
### Backends | ||
- **17** SQL | ||
- **3** non-SQL | ||
- **2** cloud | ||
::: | ||
|
||
::: {.column width="50%"} | ||
### Engines + APIs | ||
- **9** distributed SQL | ||
- **3** dataframe | ||
- oldest: **~45** years 👀 | ||
- youngest: **~2** years | ||
::: | ||
:::: | ||
|
||
### Other facts | ||
|
||
- Latency is variable | ||
- Deployment models vary | ||
|
||
::: {.fragment} | ||
::: {.r-fit-text} | ||
_… **Feature development**_❓ | ||
::: | ||
::: | ||
|
||
## Bit of a pickle | ||
|
||
![](./picklerick.png) | ||
|
||
# How | ||
|
||
## High level | ||
|
||
### Goal: fast iteration | ||
|
||
- fast env setup (dependency management) | ||
- fast(ish) tests (test-running library) | ||
- high **job** concurrency (ci/provider) | ||
- **easy to run**: dev speed ([`just`](https://github.com/casey/just)) | ||
|
||
::: {.fragment} | ||
::: {.r-fit-text} | ||
_CI must complete "quickly"_ | ||
::: | ||
::: | ||
|
||
## Tools: overview | ||
|
||
- **deps**: poetry | ||
- **ci**: GitHub Actions | ||
- **wild beasts**: docker | ||
- **house pets**: docker | ||
- cool kids don't get special tx (duckdb, polars) | ||
- task runner (e.g.: `just up postgres`) | ||
|
||
## Tools: poetry | ||
|
||
::: {.callout-warning} | ||
## Opinions follow | ||
::: | ||
|
||
- **Env setup needs to be _fast_**: avoid constraint solving | ||
- Poetry is one way; there are others | ||
- Get yourself a lockfile | ||
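
One way to see why a lockfile keeps env setup fast: the installer only has to confirm that the lockfile still matches the declared dependencies, instead of solving constraints again. The sketch below is not how Poetry is implemented; `spec_fingerprint` and `needs_relock` are hypothetical helpers that illustrate the idea of a recorded content hash.

```python
import hashlib
import json


def spec_fingerprint(dependencies: dict[str, str]) -> str:
    """Hash the declared constraints in a key-order-independent way."""
    canonical = json.dumps(dependencies, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_relock(dependencies: dict[str, str], recorded_fingerprint: str) -> bool:
    """True when the declared constraints changed since the lock was written."""
    return spec_fingerprint(dependencies) != recorded_fingerprint
```

If `needs_relock` returns `False`, the pinned versions can be installed as-is, which is the cheap path you want on every CI run.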
|
||
::: {.fragment} | ||
::: {.r-fit-text} | ||
… _Are you doing that **now**_❔❓ | ||
::: | ||
::: | ||
|
||
## This plot | ||
|
||
::: {layout="[[-1], [1], [-1]]"} | ||
|
||
![](./progress.png){fig-align="center"} | ||
|
||
::: | ||
|
||
::: {.fragment} | ||
::: {.r-fit-text} | ||
_We've added 3 or 4 new backends since the switch_ | ||
::: | ||
::: | ||
|
||
## Tools: docker | ||
|
||
- Sure, docker | ||
- But, do you use it locally? | ||
- Use health checks; "dumb" ones are fine | ||
- Make it easy for devs to use | ||
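
A "dumb" health check can be as simple as polling a TCP port until something accepts a connection. The sketch below assumes that is good enough for local development; `wait_for_port` and its parameters are hypothetical names, not part of Ibis or docker.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Return True once a TCP connection succeeds, False if `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves something is listening,
            # not that the database is ready to answer queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Something like `wait_for_port("localhost", 5432)` could run right after `just up postgres`, before the test suite starts.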
|
||
## Tools: GitHub Actions {.smaller} | ||
|
||
::: {.callout-note} | ||
### I don't work for GitHub | ||
::: | ||
|
||
- Pay for [the Teams plan](https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits) to get more concurrency | ||
- Automate dependency updates | ||
|
||
::: {.columns} | ||
::: {.column width="50%"} | ||
### GHA concurrency limits | ||
|
||
![](./gha.png) | ||
::: | ||
|
||
::: {.column width="50%"} | ||
### Ibis CI cost | ||
|
||
![](./bill.png) | ||
::: | ||
::: | ||
|
||
## `pytest` {.smaller} | ||
|
||
### Ibis problems | ||
|
||
- Backends don't implement the same stuff | ||
- Need to know when backend passes | ||
- Need to specify exception type it raises | ||
- Answer questions like: "will it _ever_ blend?" | ||
|
||
::: {.fragment} | ||
### Markers + hooks | ||
|
||
```python | ||
@pytest.mark.never("duckdb") # never gonna happen | ||
@pytest.mark.notyet("impala") # might happen | ||
@pytest.mark.notimpl("snowflake") # ibis devs: do some work | ||
def test_soundex(): | ||
    ... | ||
|
||
def pytest_ignore_collect(...): | ||
    # pytest -m duckdb: don't collect things that aren't marked duckdb | ||
    ... | ||
``` | ||
::: | ||
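
The markers above can be turned into expected failures with a collection hook. This is a minimal sketch, not the Ibis `conftest.py`: `xfail_reason` is a hypothetical helper, and reading the backend from a `SKETCH_BACKEND` environment variable is an assumption made to keep the example short. Custom markers would also need to be registered in the pytest configuration to avoid unknown-marker warnings.

```python
import os

# Marker names from the slide; each means "expected to fail on these backends".
XFAIL_MARKERS = ("never", "notyet", "notimpl")


def xfail_reason(backend, marks):
    """Return a reason string if `backend` is expected to fail, else None.

    `marks` is an iterable of (marker_name, backend_names) pairs.
    """
    for name, backends in marks:
        if name in XFAIL_MARKERS and backend in backends:
            return f"{name} on {backend}"
    return None


def pytest_collection_modifyitems(config, items):
    import pytest  # lazy import: the helper above works without pytest

    # Assumption: the backend under test comes from a hypothetical env var.
    backend = os.environ.get("SKETCH_BACKEND", "duckdb")
    for item in items:
        marks = [(m.name, m.args) for m in item.iter_markers()]
        reason = xfail_reason(backend, marks)
        if reason is not None:
            # strict=True turns an unexpected pass into a failure, which is
            # one way to learn that a backend has started passing.
            item.add_marker(pytest.mark.xfail(reason=reason, strict=True))
```

Keeping the decision in a plain function makes it easy to test without running pytest itself.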
|
||
## `pytest` plugins you may like | ||
|
||
**`pytest-`** | ||
|
||
- `xdist`: try to make this work if you can | ||
- `randomly`: exposes your bogus and stateful assumptions | ||
- `repeat`: great when `randomly` exposes your busted assumptions | ||
- `clarity`: you will hate reading failure diffs less | ||
- `snapshot`: better than that giant `f`-string you just wrote | ||
|
||
**hypothesis** 👈 that too, we don't use it enough | ||
|
||
# Summary | ||
|
||
- Use docker for dev **and** "prod" | ||
- Lock your dependencies (dev only!) | ||
- Auto update stuff | ||
- `pytest` probably has a thing for that | ||
- Spend time on dev ex | ||
- Track CI run durations, look at them too | ||
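
Tracking CI durations does not need much tooling to start. The sketch below assumes you can export start and finish timestamps for each run as ISO 8601 strings; `run_minutes` and `summarize` are hypothetical helpers, and how the timestamps are fetched from your CI provider is left out.

```python
from datetime import datetime
from statistics import median


def run_minutes(started: str, finished: str) -> float:
    """Duration in minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(finished) - datetime.fromisoformat(started)
    return delta.total_seconds() / 60


def summarize(runs: list[tuple[str, str]]) -> dict[str, float]:
    """Median and worst-case duration for a non-empty list of (start, finish)."""
    durations = sorted(run_minutes(s, f) for s, f in runs)
    return {"median": median(durations), "max": durations[-1]}
```

Watching the median alongside the maximum helps separate a general slowdown from a single flaky, slow job.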
|
||
# Questions? |