docs: some how-to updates

ibis-project · Sep 6, 2023 · 6627016
1 parent c20d3ee

Showing 5 changed files with 143 additions and 114 deletions.
diff --git a/docs/_quarto.yml b/docs/_quarto.yml
@@ -102,8 +102,6 @@ website:
         - auto: "how-to/input-output"
         - auto: "how-to/analytics"
         - auto: "how-to/visualization"
-        - auto: "how-to/external-dataframes"
-        - auto: "how-to/old"
     - id: contribute
       title: "Contribute"
       style: "docked"

diff --git a/docs/how-to/analytics/basics.qmd b/docs/how-to/analytics/basics.qmd
@@ -1,3 +1,78 @@
 # Basic analytics
 
-TODO
+Assuming you have a table:
+
+{{< include /_code/setup_penguins.qmd >}}
+
+You can perform basic analytics by selecting, grouping, aggregating, filtering, sorting, mutating, and joining data.
+
+## Selecting
+
+Use the `.select()` method to select columns:
+```{python}
+t.select("species", "island", "year")
+```
+
+## Filtering
+
+Use the `.filter()` method to filter rows:
+```{python}
+t.filter(t["species"] != "Adelie")
+```
+
+## Aggregating
+
+Use the `.aggregate()` method to aggregate data:
+```{python}
+t.aggregate(avg_bill_length=t["bill_length_mm"].mean())
+```
+
+## Grouping
+
+Use the `.group_by()` method to group data:
+
+```{python}
+t.group_by(["species", "island"]).aggregate(avg_bill_length=t["bill_length_mm"].mean())
+```
+
+## Ordering
+
+Use the `.order_by()` method to order data:
+
+```{python}
+t.order_by(t["bill_length_mm"].desc())
+```
+
+## Mutating
+
+Use the `.mutate()` method to create new columns:
+
+```{python}
+t.mutate(bill_length_cm=t["bill_length_mm"] / 10).relocate(
+    t.columns[0:2], "bill_length_cm"
+)
+```
+
+## Joining
+
+Use the `.join()` method to join data:
+
+```{python}
+t2 = t.view()  # a distinct reference to t, so each side of the self-join is unambiguous
+t.join(t2, t["species"] == t2["species"], how="left_semi")
+```
+
+## Combining it all together
+
+We can use [the underscore to chain expressions together](./chain_expressions.qmd):
+
+```{python}
+t2 = t.view()  # a distinct reference to t for the self-join
+t.join(t2, t["species"] == t2["species"], how="left_semi").filter(
+    ibis._["species"] != "Adelie"
+).group_by(["species", "island"]).aggregate(
+    avg_bill_length=ibis._["bill_length_mm"].mean()
+).order_by(
+    ibis._["avg_bill_length"].desc()
+)
+```
+
+Since we've turned on interactive mode here, this executes the query and displays the result.
diff --git a/docs/how-to/analytics/chain_expressions.qmd b/docs/how-to/analytics/chain_expressions.qmd
@@ -15,7 +15,7 @@ from ibis import _
 import pandas as pd
 ```
 
-Let's create two in-memory tables using [`ibis.memtable`](../external-dataframes/memtable_join.qmd), an API introduced in 3.2:
+Let's create two in-memory tables using `ibis.memtable`, an API introduced in 3.2:
 
 ```{python}
 df1 = pd.DataFrame({'x': range(5), 'y': list('ab')*2 + list('e')})

diff --git a/docs/how-to/external-dataframes/memtable_join.qmd b/docs/how-to/external-dataframes/memtable_join.qmd
diff --git a/docs/how-to/input-output/multiple-backends.qmd b/docs/how-to/input-output/multiple-backends.qmd
@@ -1,3 +1,68 @@
 # Work with multiple backends
 
-You can...
+You can work with multiple backends by creating and using separate connections.
+
+## Local example
+
+We'll use some of the local backends to demonstrate, but the same approach applies to any backend.
+
+```{python}
+import ibis
+
+ibis.options.interactive = True
+
+t = ibis.examples.penguins.fetch()
+t.to_parquet("penguins.parquet")
+t.head(3)
+```
+
+You can create one connection or several:
+
+```{python}
+ddb_con = ibis.duckdb.connect()
+ddb_con2 = ibis.duckdb.connect()
+```
+
+Use each connection to create a table from the Parquet file:
+
+```{python}
+ddb_con.read_parquet("penguins.parquet")
+```
+
+```{python}
+ddb_con2.read_parquet("penguins.parquet")
+```
+
+Or use a different backend, such as Polars:
+
+```{python}
+pl_con = ibis.polars.connect()
+pl_con2 = ibis.polars.connect()
+```
+
+```{python}
+pl_con.read_parquet("penguins.parquet")
+```
+
+```{python}
+pl_con2.read_parquet("penguins.parquet")
+```
+
+Or another backend, such as DataFusion:
+
+```{python}
+df_con = ibis.datafusion.connect()
+df_con2 = ibis.datafusion.connect()
+```
+
+```{python}
+df_con.read_parquet("penguins.parquet")
+```
+
+```{python}
+df_con2.read_parquet("penguins.parquet")
+```
+
+## Next steps
+
+After connecting to multiple backends, use each connection as you normally would. See [input and output formats, including other Python dataframes](./basics.qmd) for more information on getting data in and out of backends.