docs: some how-to updates
lostmygithubaccount authored and cpcloud committed Sep 6, 2023
1 parent c20d3ee commit 6627016
Showing 5 changed files with 143 additions and 114 deletions.
2 changes: 0 additions & 2 deletions docs/_quarto.yml
@@ -102,8 +102,6 @@ website:
- auto: "how-to/input-output"
- auto: "how-to/analytics"
- auto: "how-to/visualization"
- auto: "how-to/external-dataframes"
- auto: "how-to/old"
- id: contribute
title: "Contribute"
style: "docked"
77 changes: 76 additions & 1 deletion docs/how-to/analytics/basics.qmd
@@ -1,3 +1,78 @@
# Basic analytics

TODO
Assuming you have a table:

{{< include /_code/setup_penguins.qmd >}}

You can perform basic analytics by selecting, grouping, aggregating, filtering, sorting, mutating, and joining data.

## Selecting

Use the `.select()` method to select columns:
```{python}
t.select("species", "island", "year")
```

## Filtering

Use the `.filter()` method to filter rows:
```{python}
t.filter(t["species"] != "Adelie")
```

## Aggregating

Use the `.aggregate()` method to aggregate data:
```{python}
t.aggregate(avg_bill_length=t["bill_length_mm"].mean())
```

## Grouping

Use the `.group_by()` method to group data:

```{python}
t.group_by(["species", "island"]).aggregate(avg_bill_length=t["bill_length_mm"].mean())
```

## Ordering

Use the `.order_by()` method to order data:

```{python}
t.order_by(t["bill_length_mm"].desc())
```

## Mutating

Use the `.mutate()` method to create new columns:

```{python}
t.mutate(bill_length_cm=t["bill_length_mm"] / 10).relocate(
    t.columns[0:2], "bill_length_cm"
)
```

## Joining

Use the `.join()` method to join data:

```{python}
t.join(t, t["species"] == t["species"], how="left_semi")
```

## Combining it all together

We can use [the underscore to chain expressions together](./chain_expressions.qmd).

```{python}
t.join(t, t["species"] == t["species"], how="left_semi").filter(
    ibis._["species"] != "Adelie"
).group_by(["species", "island"]).aggregate(
    avg_bill_length=ibis._["bill_length_mm"].mean()
).order_by(
    ibis._["avg_bill_length"].desc()
)
```
```

Since we've turned on interactive mode here, this executes the query and displays the result.
2 changes: 1 addition & 1 deletion docs/how-to/analytics/chain_expressions.qmd
@@ -15,7 +15,7 @@ from ibis import _
import pandas as pd
```

Let's create two in-memory tables using [`ibis.memtable`](../external-dataframes/memtable_join.qmd), an API introduced in 3.2:
Let's create two in-memory tables using `ibis.memtable`, an API introduced in 3.2:

```{python}
df1 = pd.DataFrame({'x': range(5), 'y': list('ab')*2 + list('e')})
109 changes: 0 additions & 109 deletions docs/how-to/external-dataframes/memtable_join.qmd

This file was deleted.

67 changes: 66 additions & 1 deletion docs/how-to/input-output/multiple-backends.qmd
@@ -1,3 +1,68 @@
# Work with multiple backends

You can...
You can work with multiple backends by creating and using separate connections.

## Local example

We'll use some of the local backends to demonstrate, but this applies to any backends.

```{python}
import ibis
ibis.options.interactive = True
t = ibis.examples.penguins.fetch()
t.to_parquet("penguins.parquet")
t.head(3)
```

You can create a connection or several:

```{python}
ddb_con = ibis.duckdb.connect()
ddb_con2 = ibis.duckdb.connect()
```

You can use the connection to create a table:

```{python}
ddb_con.read_parquet("penguins.parquet")
```

```{python}
ddb_con2.read_parquet("penguins.parquet")
```

Or use a different backend, such as Polars:

```{python}
pl_con = ibis.polars.connect()
pl_con2 = ibis.polars.connect()
```

```{python}
pl_con.read_parquet("penguins.parquet")
```

```{python}
pl_con2.read_parquet("penguins.parquet")
```

Or another backend, such as DataFusion:

```{python}
df_con = ibis.datafusion.connect()
df_con2 = ibis.datafusion.connect()
```

```{python}
df_con.read_parquet("penguins.parquet")
```

```{python}
df_con2.read_parquet("penguins.parquet")
```

## Next steps

After connecting to multiple backends, use them like normal! You can check out [input and output formats, including other Python dataframes](./basics.qmd) for more information on how to get data in and out of backends.
