observablehq · allisonhorst · Jun 14, 2024 · Jun 13, 2024
diff --git a/examples/README.md b/examples/README.md
@@ -48,6 +48,7 @@
 - [`loader-julia-to-txt`](https://observablehq.observablehq.cloud/framework-example-loader-julia-to-txt/) - Generating TXT from Julia
 - [`loader-parquet`](https://observablehq.observablehq.cloud/framework-example-loader-parquet/) - Generating Apache Parquet files
 - [`loader-postgres`](https://observablehq.observablehq.cloud/framework-example-loader-postgres/) - Loading data from PostgreSQL
+- [`loader-python-to-csv`](https://observablehq.observablehq.cloud/framework-example-penguin-classification/) - Generating CSV from Python
 - [`loader-python-to-png`](https://observablehq.observablehq.cloud/framework-example-loader-python-to-png/) - Generating PNG from Python
 - [`loader-python-to-zip`](https://observablehq.observablehq.cloud/framework-example-loader-python-to-zip/) - Generating ZIP from Python
 - [`loader-r-to-csv`](https://observablehq.observablehq.cloud/framework-example-loader-r-to-csv/) - Generating CSV from R
@@ -75,7 +76,6 @@
 - [`google-analytics`](https://observablehq.observablehq.cloud/framework-example-google-analytics/) - A Google Analytics dashboard with numbers and charts
 - [`hello-world`](https://observablehq.observablehq.cloud/framework-example-hello-world/) - A minimal Framework project
 - [`intersection-observer`](https://observablehq.observablehq.cloud/framework-example-intersection-observer/) - Scrollytelling with IntersectionObserver
-- [`penguin-classification`](https://observablehq.observablehq.cloud/framework-example-penguin-classification/) - Logistic regression in Python; validating models with Observable Plot
 - [`responsive-iframe`](https://observablehq.observablehq.cloud/framework-example-responsive-iframe/) - Adjust the height of an embedded iframe to fit its content
 
 ## About these examples

diff --git a/examples/penguin-classification/.gitignore → examples/loader-python-to-csv/.gitignore b/examples/penguin-classification/.gitignore → examples/loader-python-to-csv/.gitignore
diff --git a/examples/loader-python-to-csv/README.md b/examples/loader-python-to-csv/README.md
@@ -0,0 +1,9 @@
+[Framework examples →](../)
+
+# Python data loader to generate CSV
+
+View live: <https://observablehq.observablehq.cloud/framework-example-penguin-classification/>
+
+This Observable Framework example demonstrates how to write a data loader in Python to generate a CSV file. The data loader uses scikit-learn’s [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier to predict the species of [penguins](https://journal.r-project.org/articles/RJ-2022-020/) from body mass, culmen, and flipper measurements. A chart made with Observable Plot highlights which penguins are misclassified.
+
+The data loader lives in [`src/data/predictions.csv.py`](./src/data/predictions.csv.py).
diff --git a/...uin-classification/observablehq.config.js → ...ader-python-to-csv/observablehq.config.js b/...uin-classification/observablehq.config.js → ...ader-python-to-csv/observablehq.config.js
diff --git a/examples/penguin-classification/package.json → examples/loader-python-to-csv/package.json b/examples/penguin-classification/package.json → examples/loader-python-to-csv/package.json
diff --git a/...s/penguin-classification/requirements.txt → ...les/loader-python-to-csv/requirements.txt b/...s/penguin-classification/requirements.txt → ...les/loader-python-to-csv/requirements.txt
diff --git a/...les/penguin-classification/src/.gitignore → examples/loader-python-to-csv/src/.gitignore b/...les/penguin-classification/src/.gitignore → examples/loader-python-to-csv/src/.gitignore
diff --git a/...guin-classification/src/data/penguins.csv → ...oader-python-to-csv/src/data/penguins.csv b/...guin-classification/src/data/penguins.csv → ...oader-python-to-csv/src/data/penguins.csv
diff --git a/...lassification/src/data/predictions.csv.py → ...python-to-csv/src/data/predictions.csv.py b/...lassification/src/data/predictions.csv.py → ...python-to-csv/src/data/predictions.csv.py
diff --git a/examples/loader-python-to-csv/src/index.md b/examples/loader-python-to-csv/src/index.md
@@ -0,0 +1,85 @@
+# Python data loader to generate CSV
+
+Here’s a Python data loader that performs logistic regression to classify penguin species based on bill and body size measurements, then outputs a CSV file to standard out.
+
+```python
+import sys
+import pandas as pd
+from sklearn.linear_model import LogisticRegression
+
+# Read the CSV
+df = pd.read_csv("src/data/penguins.csv")
+
+# Features: culmen length/depth, flipper length, and body mass; target: species
+X = df[["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "body_mass_g"]]
+Y = df["species"]
+
+# Create a logistic regression classifier and fit it to the data
+logreg = LogisticRegression()
+logreg.fit(X, Y)
+
+# Copy the input data and add a column of predicted species
+results = df.copy()
+results["species_predicted"] = logreg.predict(X)
+
+# Write CSV to standard output, omitting the pandas index column
+results.to_csv(sys.stdout, index=False)
+```
+
+<div class="note">
+
+To run this data loader, you’ll need `python3` available on your `$PATH`, plus the pandas and scikit-learn modules installed (`sys` is part of the Python standard library). We recommend setting up a virtual environment.
+
+</div>
+
+To create and activate a virtual Python environment, run the following commands:
+
+```
+$ python3 -m venv .venv
+$ source .venv/bin/activate
+```
+
+Then install the required modules from `requirements.txt` using:
+
+```
+$ pip install -r requirements.txt
+```
+
+The above data loader lives in `data/predictions.csv.py`, so we can load the data using `data/predictions.csv` with `FileAttachment`:
+
+```js echo
+const predictions = FileAttachment("data/predictions.csv").csv({typed: true});
+```
+
+We can create a quick chart of predicted species, highlighting cases where penguins are misclassified, using Observable Plot:
+
+```js echo
+Plot.plot({
+  grid: true,
+  height: 400,
+  caption: "Incorrect predictions highlighted with diamonds. Actual species encoded with color and predicted species encoded with symbols.",
+  // Legends: color = actual species, symbol = predicted species
+  color: {legend: true},
+  symbol: {legend: true},
+  x: {label: "Culmen length (mm)"},
+  y: {label: "Culmen depth (mm)"},
+  marks: [
+    Plot.dot(predictions, {
+      x: "culmen_length_mm",
+      y: "culmen_depth_mm",
+      stroke: "species",
+      symbol: "species_predicted",
+      r: 3,
+      tip: {channels: {"mass": "body_mass_g"}}
+    }),
+    Plot.dot(predictions, {
+      filter: (d) => d.species !== d.species_predicted,
+      x: "culmen_length_mm",
+      y: "culmen_depth_mm",
+      r: 7,
+      symbol: "diamond",
+      stroke: "currentColor"
+    })
+  ],
+})
+```
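The chart shows misclassified penguins visually; the same information can also be tabulated in Python. Below is a minimal sketch, not part of this example: the function names `confusion_table`, `misclassified`, and `accuracy` are hypothetical, and the only assumption about the data is the `species` and `species_predicted` columns that the loader writes. Because the loader fits and predicts on the same rows, the accuracy computed here is training accuracy and is likely optimistic.

```python
import pandas as pd


def confusion_table(results: pd.DataFrame) -> pd.DataFrame:
    """Count penguins by actual species (rows) and predicted species (columns)."""
    return pd.crosstab(results["species"], results["species_predicted"])


def misclassified(results: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows where the predicted species differs from the actual one."""
    return results[results["species"] != results["species_predicted"]]


def accuracy(results: pd.DataFrame) -> float:
    """Fraction of rows whose predicted species matches the actual species."""
    return float((results["species"] == results["species_predicted"]).mean())
```

One way to try it against the real loader output is to pipe it into a script, e.g. `python3 src/data/predictions.csv.py | python3 summarize.py`, where the hypothetical `summarize.py` builds a DataFrame with `pd.read_csv(sys.stdin)` and prints the results of these functions.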
diff --git a/examples/penguin-classification/README.md b/examples/penguin-classification/README.md
diff --git a/examples/penguin-classification/src/index.md b/examples/penguin-classification/src/index.md