Commit

Merge pull request #2 from opendatastudio/1-datapackage-datakit

dataflow -> datakit

JamesWilmot authored Oct 20, 2024
2 parents 5ec72e9 + 972f91f commit 275a6f7

Showing 12 changed files with 81 additions and 81 deletions.
6 changes: 3 additions & 3 deletions astro.config.mjs

@@ -16,9 +16,9 @@ export default defineConfig({
     },
     sidebar: [
       {
-        label: "Introduction to dataflows",
+        label: "Introduction to datakits",
         items: [
-          { label: "What are dataflows?", slug: "intro/intro" },
+          { label: "What are datakits?", slug: "intro/intro" },
           { label: "Hello world!", slug: "intro/helloworld" },
           { label: "Working with tabular data", slug: "intro/tabulardata" },
           { label: "Handling multiple runs", slug: "intro/multipleruns" },
@@ -33,7 +33,7 @@
         label: "Advanced tutorials",
         items: [
           {
-            label: "Creating a model fitting dataflow",
+            label: "Creating a model fitting datakit",
             slug: "advanced/modelfit",
           },
           { label: "Using metaschemas", slug: "advanced/metaschemas" },
10 changes: 5 additions & 5 deletions src/content/docs/advanced/metaschemas.mdx

@@ -13,11 +13,11 @@ column and an unknown number of Y columns.
 To handle this, we can use metaschema definitions. Metaschemas describe the
 allowable structure of a tabular data resource.

-In this tutorial, we'll build on the dataflow from the
+In this tutorial, we'll build on the datakit from the
 [tabular data tutorial](/intro/tabulardata). You can find it in the
-[helloworld-dataflow](https://github.com/opendatastudio/helloworld-dataflow)
+[helloworld-datakit](https://github.com/opendatastudio/helloworld-datakit)
 repository, under the
-["tabulardata" algorithm folder](https://github.com/opendatastudio/helloworld-dataflow/tree/main/tabulardata).
+["tabulardata" algorithm folder](https://github.com/opendatastudio/helloworld-datakit/tree/main/tabulardata).

 ## Algorithm overview

@@ -253,7 +253,7 @@ generated schema:
 ...
 ```

-Using the metaschema, the dataflow has automatically generated a 7-column schema
+Using the metaschema, the datakit has automatically generated a 7-column schema
 for this dataset.

 ## Executing the algorithm

@@ -283,4 +283,4 @@ opends show result
 ```

 Now you know how to handle dynamic input datasets by using metaschemas in a
-dataflow.
+datakit.
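
The metaschema expansion this file documents (a pattern with one fixed X column and a repeatable Y column) can be pictured with a short sketch. To be clear, this is not the opends metaschema format: the field names and the `repeat` rule below are invented for illustration only.

```python
# Illustrative sketch of metaschema-style expansion. The structure and
# "repeat" rule here are invented; see the opends docs for the real format.
metaschema = {
    "fields": [
        {"name": "x", "type": "number"},                     # fixed column
        {"name": "y{n}", "type": "number", "repeat": True},  # repeatable column
    ]
}


def expand_schema(metaschema: dict, column_names: list[str]) -> dict:
    """Expand a metaschema-like pattern into a concrete schema:
    fixed fields first, then one field per remaining data column."""
    fields = [f for f in metaschema["fields"] if not f.get("repeat")]
    template = next(f for f in metaschema["fields"] if f.get("repeat"))
    for i, _ in enumerate(column_names[len(fields):], start=1):
        fields.append({"name": template["name"].format(n=i),
                       "type": template["type"]})
    return {"fields": fields}


# A 7-column dataset (x plus six y columns) yields a 7-field schema,
# mirroring the automatically generated schema mentioned above.
print(expand_schema(metaschema, ["x", "a", "b", "c", "d", "e", "f"]))
```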
24 changes: 12 additions & 12 deletions src/content/docs/advanced/modelfit.mdx

@@ -1,6 +1,6 @@
 ---
-title: Creating a model fitting dataflow
-description: Creating a model fitting dataflow
+title: Creating a model fitting datakit
+description: Creating a model fitting datakit
 ---

 import { FileTree } from "@astrojs/starlight/components";
@@ -10,25 +10,25 @@ import linearGraph from "./modelfit_linear.png";
 import quadraticGraph from "./modelfit_quadratic.png";

 In this tutorial, we'll explore a practical example of creating a model fitting
-dataflow.
+datakit.

-By the end, we'll have created a dataflow that can fit a linear or quadratic
+By the end, we'll have created a datakit that can fit a linear or quadratic
 model to an x/y input dataset using `scipy`. We'll add a **view** to graph the
 resulting fit curve against the data, and use **relationships** to handle
 different parameters for each model.

 <Aside>
   The complete code for this tutorial is available
-  [here](https://github.com/opendatastudio/helloworld-dataflow/tree/main/modelfit).
+  [here](https://github.com/opendatastudio/helloworld-datakit/tree/main/modelfit).
 </Aside>

-## Creating a new dataflow
+## Creating a new datakit

-First, we need to create a new dataflow:
+First, we need to create a new datakit:

 ```bash
 opends new modelfit
-cd modelfit-dataflow
+cd modelfit-datakit
 ```

 ## Creating the algorithm configuration
@@ -497,7 +497,7 @@ opends show fit
 In order to be able to analyse whether our fit was a good one, we need to be
 able to graph the calculated fit curve.

-We can create a graph visualisation by defining a `view` on our dataflow.
+We can create a graph visualisation by defining a `view` on our datakit.

 First, create a views directory under the algorithm folder:

@@ -517,7 +517,7 @@ Now we can write a view configuration for our fit graph:
 }
 ```

-Here we are telling our dataflow that we want to use a matplotlib script,
+Here we are telling our datakit that we want to use a matplotlib script,
 `fitGraph.py`, to generate this view, and we want to execute this graph script
 in the `python-run-base` container.

@@ -569,7 +569,7 @@ In order to fit to a quadratic model, we need a way to dynamically add an extra
 parameter whenever the value of the `model` algorithm variable is set to
 `quadratic`.

-We can do this using **relationships**. Relationships in a dataflow describe
+We can do this using **relationships**. Relationships in a datakit describe
 relationships between variables. Whenever a variable changes, the CLI checks if
 there are any relationships that apply to that variable value, and if so updates
 any associated variables to the required values.
@@ -847,4 +847,4 @@ opends view fitGraph
   width="600"
 />

-Congratulations, you've written and executed your own model fitting dataflow.
+Congratulations, you've written and executed your own model fitting datakit.
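
The fitting technique this tutorial builds on is standard `scipy`. As a rough standalone sketch of that technique (not the tutorial's actual `algorithm.py` or `fitGraph.py`; the function names, sample data, and output filename are assumptions), fitting and graphing a quadratic looks like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Candidate models, keyed by the tutorial's `model` variable.
def linear(x, a, b):
    return a * x + b

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

MODELS = {"linear": linear, "quadratic": quadratic}

# Sample data: a noisy quadratic. Switching `model` to "quadratic" brings in
# the extra parameter `c`, which is what the relationships mechanism manages
# inside the datakit.
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 50)
y = 2 * x**2 - 3 * x + 1 + rng.normal(scale=0.5, size=x.size)

model = MODELS["quadratic"]
params, _covariance = curve_fit(model, x, y)
print(params)  # approximately [2, -3, 1]

# Graph the fit curve against the data, as the fitGraph view does.
plt.scatter(x, y, label="data")
plt.plot(x, model(x, *params), color="red", label="fit")
plt.legend()
plt.savefig("fit.png")
```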
2 changes: 1 addition & 1 deletion src/content/docs/index.md

@@ -7,7 +7,7 @@ hero:
   # image:
   #   file: ../../assets/logo_dark.svg
   actions:
-    - text: Introduction to dataflows
+    - text: Introduction to datakits
       link: /intro/intro/
       icon: right-arrow
     - text: The opendata.studio project
2 changes: 1 addition & 1 deletion src/content/docs/intro/containers.mdx

@@ -1,6 +1,6 @@
 ---
 title: Using custom Docker containers
-description: Using custom Docker containers with dataflows
+description: Using custom Docker containers with datakits
 ---

 import { FileTree } from "@astrojs/starlight/components";
34 changes: 17 additions & 17 deletions src/content/docs/intro/helloworld.mdx

@@ -1,16 +1,16 @@
 ---
 title: Hello world!
-description: A very simple dataflow tutorial
+description: A very simple datakit tutorial
 ---

 import { FileTree } from "@astrojs/starlight/components";
 import { Aside } from "@astrojs/starlight/components";

-Let's create a simple dataflow to add some numbers together.
+Let's create a simple datakit to add some numbers together.

 <Aside type="tip">
   All of the examples covered in this tutorial are available at our
-  [helloworld-dataflow](https://github.com/opendatastudio/helloworld-dataflow)
+  [helloworld-datakit](https://github.com/opendatastudio/helloworld-datakit)
   repository.
 </Aside>

@@ -45,50 +45,50 @@ cd python-run-base
 ./build.sh
 ```

-## Creating a new dataflow
+## Creating a new datakit

-Let's create a new dataflow. The `opends` CLI tool provides a convenient command
+Let's create a new datakit. The `opends` CLI tool provides a convenient command
 to do this:

 ```bash
 opends new helloworld
 ```

-This will create a new dataflow inside a directory called `helloworld-dataflow`.
+This will create a new datakit inside a directory called `helloworld-datakit`.

-This is what your new dataflow should look like:
+This is what your new datakit should look like:

 {/* prettier-ignore */}
 <FileTree>
-- helloworld-dataflow/
+- helloworld-datakit/
   - helloworld/
     - algorithm.json
     - algorithm.py
-  - dataflow.json
+  - datakit.json
 </FileTree>

-This simple starter dataflow contains an algorithm that takes a single numerical
+This simple starter datakit contains an algorithm that takes a single numerical
 input and multiplies it by 2.

 ## Your first run

-Let's run your new dataflow. First, initialise the default run:
+Let's run your new datakit. First, initialise the default run:

 ```bash
-cd helloworld-dataflow
+cd helloworld-datakit
 opends init
 ```

-This will create a directory called `helloworld.run` in the root of your
-dataflow directory.
+This will create a directory called `helloworld.run` in the root of your datakit
+directory.

 {/* prettier-ignore */}
 <FileTree>
-- helloworld-dataflow/
+- helloworld-datakit/
   - helloworld/
   - **helloworld.run/**
     - run.json
-  - dataflow.json
+  - datakit.json
 </FileTree>

 This directory stores all the information about your run so that others can
@@ -207,7 +207,7 @@ input definition to the `inputs` list:
 ```

 Save and close `helloworld/algorithm.json`. We will need to initialise the
-dataflow again to add this new variable to the run configuration:
+datakit again to add this new variable to the run configuration:

 ```bash
 opends reset # This deletes any existing runs
 ```
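
The starter algorithm this tutorial describes (take one numerical input, multiply it by 2) is tiny. A minimal sketch of what such an `algorithm.py` could contain follows; the entry point name and return convention are assumptions for illustration, since the real interface is defined by `helloworld/algorithm.json`:

```python
# Minimal sketch of a "multiply by 2" algorithm. The entry point name and
# the input/output convention are assumed for illustration; the datakit's
# actual interface is declared in helloworld/algorithm.json.
def main(x: float) -> dict:
    """Return the single numerical input multiplied by 2."""
    return {"result": x * 2}


if __name__ == "__main__":
    print(main(21))  # {'result': 42}
```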
14 changes: 7 additions & 7 deletions src/content/docs/intro/intro.md

@@ -1,19 +1,19 @@
 ---
-title: What are dataflows?
-description: An introduction to opendata.studio dataflows
+title: What are datakits?
+description: An introduction to opendata.studio datakits
 ---

-In opendata.studio, a dataflow is a structured way to organise and bundle a data
+In opendata.studio, a datakit is a structured way to organise and bundle a data
 analysis in a reusable and reproducible format.

-A dataflow contains:
+A datakit contains:

 - The analysis algorithm and its execution environment
 - Input and output data, along with configurable options
 - Saved run states from algorithm executions
 - User interface definitions

-These elements are defined by individual components inside each dataflow:
+These elements are defined by individual components inside each datakit:

 - **resources**: Store tabular data
 - **algorithms** and **containers**: Define the algorithm code and execution
@@ -29,5 +29,5 @@ tracked, creating a reproducible record of the analysis process. Once an
 analysis is completed, the results and process can be easily shared or
 published, ensuring transparency and allowing others to build upon your work.

-This tutorial will introduce you to working with dataflows. To begin with, let's
-create a simple dataflow containing an algorithm that adds two numbers together.
+This tutorial will introduce you to working with datakits. To begin with, let's
+create a simple datakit containing an algorithm that adds two numbers together.
10 changes: 5 additions & 5 deletions src/content/docs/intro/multipleruns.mdx

@@ -100,14 +100,14 @@ opends set function "median"
 Let's create a new run to find the median of this dataset.

 ```bash
-opends reset # Reset the dataflow to a clean state again
+opends reset # Reset the datakit to a clean state again
 opends init helloworld.median
 ```

 This command tells the CLI tool that we want to create a new run named
 `helloworld.median`. `helloworld` specifies the algorithm that we want to use
 for this run, and `median` is a user-chosen name for the run. The algorithm name
-before the period must match an algorithm listed in `dataflow.json`. The string
+before the period must match an algorithm listed in `datakit.json`. The string
 following the period can be anything you like.

 You can check which run is currently active by running:
@@ -143,11 +143,11 @@ opends show result

 ## The run folder structure

-Your dataflow should now look like this:
+Your datakit should now look like this:

 {/* prettier-ignore */}
 <FileTree>
-- helloworld-dataflow/
+- helloworld-datakit/
   - helloworld/
     - algorithm.json
     - algorithm.py
@@ -160,7 +160,7 @@ Your dataflow should now look like this:
     - data.json
     - result.json
     - views/
-  - dataflow.json
+  - datakit.json
 </FileTree>

 As you can see, the resources appear in both the run directory
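
Because each run directory in the tree above carries its own `run.json`, `data.json`, and `result.json`, results from several runs can be compared straight from disk. A small sketch, assuming `result.json` is plain JSON (its exact structure isn't shown in this diff):

```python
import json
from pathlib import Path

# Compare results across runs by reading each run directory's result.json.
# Assumes result.json is plain JSON; its exact shape isn't shown here.
for run_dir in sorted(Path("helloworld-datakit").glob("helloworld.*")):
    result_file = run_dir / "result.json"
    if result_file.exists():
        result = json.loads(result_file.read_text())
        print(f"{run_dir.name}: {result}")
```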
22 changes: 11 additions & 11 deletions src/content/docs/intro/repositories.mdx

@@ -1,21 +1,21 @@
 ---
 title: Tracking with Git
-description: Tracking dataflow runs in a repository
+description: Tracking datakit runs in a repository
 ---

 import { FileTree } from "@astrojs/starlight/components";
 import { Aside } from "@astrojs/starlight/components";

-This guide explains how to track dataflow runs in a version control repository
+This guide explains how to track datakit runs in a version control repository
 using Git.

-One of the key advantages of using dataflows is the ability to version control
+One of the key advantages of using datakits is the ability to version control
 your entire analysis process. By tracking these changes with Git, every step of
 your analysis becomes reproducible and shareable, allowing others with access to
 your repository to replicate your work.

 This is a brief introduction for those who are new to Git to get started with
-tracking dataflow changes.
+tracking datakit changes.

 ## Installing Git

@@ -24,10 +24,10 @@ Before getting started, you'll need to install Git. Follow the instructions

 ## Initialising your repository

-Once Git is installed, you can initialise your dataflow as a Git repository:
+Once Git is installed, you can initialise your datakit as a Git repository:

 ```bash
-cd helloworld-dataflow # Navigate to your dataflow root folder
+cd helloworld-datakit # Navigate to your datakit root folder
 git init # Initialise a Git repository
 ```

@@ -47,12 +47,12 @@ git add --all
 git commit -m "Initial commit"
 ```

-Now, all files in your dataflow are tracked. You can revert to this state at any
+Now, all files in your datakit are tracked. You can revert to this state at any
 time if needed.

 ## Tracking changes

-After running an analysis in your dataflow, some files will be modified.
+After running an analysis in your datakit, some files will be modified.

 For example, if you run the following commands:

@@ -75,7 +75,7 @@ On branch main
 Changes not staged for commit:
   (use "git add <file>..." to update what will be committed)
   (use "git restore <file>..." to discard changes in working directory)
-	modified:   dataflow.json
+	modified:   datakit.json
 Untracked files:
   (use "git add <file>..." to include in what will be committed)
@@ -97,7 +97,7 @@ runs.

 ## Publishing to GitHub

-To share your dataflow or make it available publicly, you can upload your
+To share your datakit or make it available publicly, you can upload your
 repository to GitHub.

 First, create a new repository on GitHub and copy its URL.
@@ -114,5 +114,5 @@ Push your changes to GitHub:
 git push origin main
 ```

-Your dataflow is now published to GitHub and can be accessed by others if your
+Your datakit is now published to GitHub and can be accessed by others if your
 repository is set to public.