Commit

Merge pull request #2 from opendatastudio/1-datapackage-datakit

dataflow -> datakit

JamesWilmot authored Oct 20, 2024
2 parents 5ec72e9 + 972f91f commit 275a6f7

Showing 12 changed files with 81 additions and 81 deletions.
6 changes: 3 additions & 3 deletions astro.config.mjs

@@ -16,9 +16,9 @@ export default defineConfig({
     },
     sidebar: [
       {
-        label: "Introduction to dataflows",
+        label: "Introduction to datakits",
         items: [
-          { label: "What are dataflows?", slug: "intro/intro" },
+          { label: "What are datakits?", slug: "intro/intro" },
           { label: "Hello world!", slug: "intro/helloworld" },
           { label: "Working with tabular data", slug: "intro/tabulardata" },
           { label: "Handling multiple runs", slug: "intro/multipleruns" },
@@ -33,7 +33,7 @@
         label: "Advanced tutorials",
         items: [
           {
-            label: "Creating a model fitting dataflow",
+            label: "Creating a model fitting datakit",
             slug: "advanced/modelfit",
           },
           { label: "Using metaschemas", slug: "advanced/metaschemas" },
10 changes: 5 additions & 5 deletions src/content/docs/advanced/metaschemas.mdx

@@ -13,11 +13,11 @@ column and an unknown number of Y columns.
 To handle this, we can use metaschema definitions. Metaschemas describe the
 allowable structure of a tabular data resource.

-In this tutorial, we'll build on the dataflow from the
+In this tutorial, we'll build on the datakit from the
 [tabular data tutorial](/intro/tabulardata). You can find it in the
-[helloworld-dataflow](https://github.com/opendatastudio/helloworld-dataflow)
+[helloworld-datakit](https://github.com/opendatastudio/helloworld-datakit)
 repository, under the
-["tabulardata" algorithm folder](https://github.com/opendatastudio/helloworld-dataflow/tree/main/tabulardata).
+["tabulardata" algorithm folder](https://github.com/opendatastudio/helloworld-datakit/tree/main/tabulardata).

 ## Algorithm overview

@@ -253,7 +253,7 @@ generated schema:
 ...
 ```

-Using the metaschema, the dataflow has automatically generated a 7-column schema
+Using the metaschema, the datakit has automatically generated a 7-column schema
 for this dataset.

 ## Executing the algorithm

@@ -283,4 +283,4 @@ opends show result
 ```

 Now you know how to handle dynamic input datasets by using metaschemas in a
-dataflow.
+datakit.
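
The metaschema expansion this file documents (a pattern with one fixed X column and a repeatable Y column) can be pictured with a short sketch. To be clear, this is not the opends metaschema format: the field names and the `repeat` rule below are invented for illustration only.

```python
# Illustrative sketch of metaschema-style expansion. The structure and
# "repeat" rule here are invented; see the opends docs for the real format.
metaschema = {
    "fields": [
        {"name": "x", "type": "number"},                     # fixed column
        {"name": "y{n}", "type": "number", "repeat": True},  # repeatable column
    ]
}


def expand_schema(metaschema: dict, column_names: list[str]) -> dict:
    """Expand a metaschema-like pattern into a concrete schema:
    fixed fields first, then one field per remaining data column."""
    fields = [f for f in metaschema["fields"] if not f.get("repeat")]
    template = next(f for f in metaschema["fields"] if f.get("repeat"))
    for i, _ in enumerate(column_names[len(fields):], start=1):
        fields.append({"name": template["name"].format(n=i),
                       "type": template["type"]})
    return {"fields": fields}


# A 7-column dataset (x plus six y columns) yields a 7-field schema,
# mirroring the automatically generated schema mentioned above.
print(expand_schema(metaschema, ["x", "a", "b", "c", "d", "e", "f"]))
```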
24 changes: 12 additions & 12 deletions src/content/docs/advanced/modelfit.mdx

@@ -1,6 +1,6 @@
 ---
-title: Creating a model fitting dataflow
-description: Creating a model fitting dataflow
+title: Creating a model fitting datakit
+description: Creating a model fitting datakit
 ---

 import { FileTree } from "@astrojs/starlight/components";
@@ -10,25 +10,25 @@ import linearGraph from "./modelfit_linear.png";
 import quadraticGraph from "./modelfit_quadratic.png";

 In this tutorial, we'll explore a practical example of creating a model fitting
-dataflow.
+datakit.

-By the end, we'll have created a dataflow that can fit a linear or quadratic
+By the end, we'll have created a datakit that can fit a linear or quadratic
 model to an x/y input dataset using `scipy`. We'll add a **view** to graph the
 resulting fit curve against the data, and use **relationships** to handle
 different parameters for each model.

 <Aside>
   The complete code for this tutorial is available
-  [here](https://github.com/opendatastudio/helloworld-dataflow/tree/main/modelfit).
+  [here](https://github.com/opendatastudio/helloworld-datakit/tree/main/modelfit).
 </Aside>

-## Creating a new dataflow
+## Creating a new datakit

-First, we need to create a new dataflow:
+First, we need to create a new datakit:

 ```bash
 opends new modelfit
-cd modelfit-dataflow
+cd modelfit-datakit
 ```

 ## Creating the algorithm configuration
@@ -497,7 +497,7 @@ opends show fit
 In order to be able to analyse whether our fit was a good one, we need to be
 able to graph the calculated fit curve.

-We can create a graph visualisation by defining a `view` on our dataflow.
+We can create a graph visualisation by defining a `view` on our datakit.

 First, create a views directory under the algorithm folder:

@@ -517,7 +517,7 @@ Now we can write a view configuration for our fit graph:
 }
 ```

-Here we are telling our dataflow that we want to use a matplotlib script,
+Here we are telling our datakit that we want to use a matplotlib script,
 `fitGraph.py`, to generate this view, and we want to execute this graph script
 in the `python-run-base` container.

@@ -569,7 +569,7 @@ In order to fit to a quadratic model, we need a way to dynamically add an extra
 parameter whenever the value of the `model` algorithm variable is set to
 `quadratic`.

-We can do this using **relationships**. Relationships in a dataflow describe
+We can do this using **relationships**. Relationships in a datakit describe
 relationships between variables. Whenever a variable changes, the CLI checks if
 there are any relationships that apply to that variable value, and if so updates
 any associated variables to the required values.
@@ -847,4 +847,4 @@ opends view fitGraph
   width="600"
 />

-Congratulations, you've written and executed your own model fitting dataflow.
+Congratulations, you've written and executed your own model fitting datakit.
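
The fitting technique this tutorial builds on is standard `scipy`. As a rough standalone sketch of that technique (not the tutorial's actual `algorithm.py` or `fitGraph.py`; the function names, sample data, and output filename are assumptions), fitting and graphing a quadratic looks like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Candidate models, keyed by the tutorial's `model` variable.
def linear(x, a, b):
    return a * x + b

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

MODELS = {"linear": linear, "quadratic": quadratic}

# Sample data: a noisy quadratic. Switching `model` to "quadratic" brings in
# the extra parameter `c`, which is what the relationships mechanism manages
# inside the datakit.
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 50)
y = 2 * x**2 - 3 * x + 1 + rng.normal(scale=0.5, size=x.size)

model = MODELS["quadratic"]
params, _covariance = curve_fit(model, x, y)
print(params)  # approximately [2, -3, 1]

# Graph the fit curve against the data, as the fitGraph view does.
plt.scatter(x, y, label="data")
plt.plot(x, model(x, *params), color="red", label="fit")
plt.legend()
plt.savefig("fit.png")
```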
2 changes: 1 addition & 1 deletion src/content/docs/index.md

@@ -7,7 +7,7 @@ hero:
   # image:
   #   file: ../../assets/logo_dark.svg
   actions:
-    - text: Introduction to dataflows
+    - text: Introduction to datakits
       link: /intro/intro/
       icon: right-arrow
     - text: The opendata.studio project
2 changes: 1 addition & 1 deletion src/content/docs/intro/containers.mdx

@@ -1,6 +1,6 @@
 ---
 title: Using custom Docker containers
-description: Using custom Docker containers with dataflows
+description: Using custom Docker containers with datakits
 ---

 import { FileTree } from "@astrojs/starlight/components";
34 changes: 17 additions & 17 deletions src/content/docs/intro/helloworld.mdx

@@ -1,16 +1,16 @@
 ---
 title: Hello world!
-description: A very simple dataflow tutorial
+description: A very simple datakit tutorial
 ---

 import { FileTree } from "@astrojs/starlight/components";
 import { Aside } from "@astrojs/starlight/components";

-Let's create a simple dataflow to add some numbers together.
+Let's create a simple datakit to add some numbers together.

 <Aside type="tip">
   All of the examples covered in this tutorial are available at our
-  [helloworld-dataflow](https://github.com/opendatastudio/helloworld-dataflow)
+  [helloworld-datakit](https://github.com/opendatastudio/helloworld-datakit)
   repository.
 </Aside>

@@ -45,50 +45,50 @@ cd python-run-base
 ./build.sh
 ```

-## Creating a new dataflow
+## Creating a new datakit

-Let's create a new dataflow. The `opends` CLI tool provides a convenient command
+Let's create a new datakit. The `opends` CLI tool provides a convenient command
 to do this:

 ```bash
 opends new helloworld
 ```

-This will create a new dataflow inside a directory called `helloworld-dataflow`.
+This will create a new datakit inside a directory called `helloworld-datakit`.

-This is what your new dataflow should look like:
+This is what your new datakit should look like:

 {/* prettier-ignore */}
 <FileTree>
-- helloworld-dataflow/
+- helloworld-datakit/
   - helloworld/
     - algorithm.json
     - algorithm.py
-  - dataflow.json
+  - datakit.json
 </FileTree>

-This simple starter dataflow contains an algorithm that takes a single numerical
+This simple starter datakit contains an algorithm that takes a single numerical
 input and multiplies it by 2.

 ## Your first run

-Let's run your new dataflow. First, initialise the default run:
+Let's run your new datakit. First, initialise the default run:

 ```bash
-cd helloworld-dataflow
+cd helloworld-datakit
 opends init
 ```

-This will create a directory called `helloworld.run` in the root of your
-dataflow directory.
+This will create a directory called `helloworld.run` in the root of your datakit
+directory.

 {/* prettier-ignore */}
 <FileTree>
-- helloworld-dataflow/
+- helloworld-datakit/
   - helloworld/
   - **helloworld.run/**
     - run.json
-  - dataflow.json
+  - datakit.json
 </FileTree>

 This directory stores all the information about your run so that others can
@@ -207,7 +207,7 @@ input definition to the `inputs` list:
 ```

 Save and close `helloworld/algorithm.json`. We will need to initialise the
-dataflow again to add this new variable to the run configuration:
+datakit again to add this new variable to the run configuration:

 ```bash
 opends reset # This deletes any existing runs
 ```
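
The starter algorithm this tutorial describes (take one numerical input, multiply it by 2) is tiny. A minimal sketch of what such an `algorithm.py` could contain follows; the entry point name and return convention are assumptions for illustration, since the real interface is defined by `helloworld/algorithm.json`:

```python
# Minimal sketch of a "multiply by 2" algorithm. The entry point name and
# the input/output convention are assumed for illustration; the datakit's
# actual interface is declared in helloworld/algorithm.json.
def main(x: float) -> dict:
    """Return the single numerical input multiplied by 2."""
    return {"result": x * 2}


if __name__ == "__main__":
    print(main(21))  # {'result': 42}
```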
14 changes: 7 additions & 7 deletions src/content/docs/intro/intro.md

@@ -1,19 +1,19 @@
 ---
-title: What are dataflows?
-description: An introduction to opendata.studio dataflows
+title: What are datakits?
+description: An introduction to opendata.studio datakits
 ---

-In opendata.studio, a dataflow is a structured way to organise and bundle a data
+In opendata.studio, a datakit is a structured way to organise and bundle a data
 analysis in a reusable and reproducible format.

-A dataflow contains:
+A datakit contains:

 - The analysis algorithm and its execution environment
 - Input and output data, along with configurable options
 - Saved run states from algorithm executions
 - User interface definitions

-These elements are defined by individual components inside each dataflow:
+These elements are defined by individual components inside each datakit:

 - **resources**: Store tabular data
 - **algorithms** and **containers**: Define the algorithm code and execution
@@ -29,5 +29,5 @@ tracked, creating a reproducible record of the analysis process. Once an
 analysis is completed, the results and process can be easily shared or
 published, ensuring transparency and allowing others to build upon your work.

-This tutorial will introduce you to working with dataflows. To begin with, let's
-create a simple dataflow containing an algorithm that adds two numbers together.
+This tutorial will introduce you to working with datakits. To begin with, let's
+create a simple datakit containing an algorithm that adds two numbers together.
10 changes: 5 additions & 5 deletions src/content/docs/intro/multipleruns.mdx

@@ -100,14 +100,14 @@ opends set function "median"
 Let's create a new run to find the median of this dataset.

 ```bash
-opends reset # Reset the dataflow to a clean state again
+opends reset # Reset the datakit to a clean state again
 opends init helloworld.median
 ```

 This command tells the CLI tool that we want to create a new run named
 `helloworld.median`. `helloworld` specifies the algorithm that we want to use
 for this run, and `median` is a user-chosen name for the run. The algorithm name
-before the period must match an algorithm listed in `dataflow.json`. The string
+before the period must match an algorithm listed in `datakit.json`. The string
 following the period can be anything you like.

 You can check which run is currently active by running:
@@ -143,11 +143,11 @@ opends show result

 ## The run folder structure

-Your dataflow should now look like this:
+Your datakit should now look like this:

 {/* prettier-ignore */}
 <FileTree>
-- helloworld-dataflow/
+- helloworld-datakit/
   - helloworld/
     - algorithm.json
     - algorithm.py
@@ -160,7 +160,7 @@ Your dataflow should now look like this:
     - data.json
     - result.json
     - views/
-  - dataflow.json
+  - datakit.json
 </FileTree>

 As you can see, the resources appear in both the run directory
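
Because each run directory in the tree above carries its own `run.json`, `data.json`, and `result.json`, results from several runs can be compared straight from disk. A small sketch, assuming `result.json` is plain JSON (its exact structure isn't shown in this diff):

```python
import json
from pathlib import Path

# Compare results across runs by reading each run directory's result.json.
# Assumes result.json is plain JSON; its exact shape isn't shown here.
for run_dir in sorted(Path("helloworld-datakit").glob("helloworld.*")):
    result_file = run_dir / "result.json"
    if result_file.exists():
        result = json.loads(result_file.read_text())
        print(f"{run_dir.name}: {result}")
```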
22 changes: 11 additions & 11 deletions src/content/docs/intro/repositories.mdx

@@ -1,21 +1,21 @@
 ---
 title: Tracking with Git
-description: Tracking dataflow runs in a repository
+description: Tracking datakit runs in a repository
 ---

 import { FileTree } from "@astrojs/starlight/components";
 import { Aside } from "@astrojs/starlight/components";

-This guide explains how to track dataflow runs in a version control repository
+This guide explains how to track datakit runs in a version control repository
 using Git.

-One of the key advantages of using dataflows is the ability to version control
+One of the key advantages of using datakits is the ability to version control
 your entire analysis process. By tracking these changes with Git, every step of
 your analysis becomes reproducible and shareable, allowing others with access to
 your repository to replicate your work.

 This is a brief introduction for those who are new to Git to get started with
-tracking dataflow changes.
+tracking datakit changes.

 ## Installing Git

@@ -24,10 +24,10 @@ Before getting started, you'll need to install Git. Follow the instructions

 ## Initialising your repository

-Once Git is installed, you can initialise your dataflow as a Git repository:
+Once Git is installed, you can initialise your datakit as a Git repository:

 ```bash
-cd helloworld-dataflow # Navigate to your dataflow root folder
+cd helloworld-datakit # Navigate to your datakit root folder
 git init # Initialise a Git repository
 ```

@@ -47,12 +47,12 @@ git add --all
 git commit -m "Initial commit"
 ```

-Now, all files in your dataflow are tracked. You can revert to this state at any
+Now, all files in your datakit are tracked. You can revert to this state at any
 time if needed.

 ## Tracking changes

-After running an analysis in your dataflow, some files will be modified.
+After running an analysis in your datakit, some files will be modified.

 For example, if you run the following commands:

@@ -75,7 +75,7 @@ On branch main
 Changes not staged for commit:
   (use "git add <file>..." to update what will be committed)
   (use "git restore <file>..." to discard changes in working directory)
-	modified:   dataflow.json
+	modified:   datakit.json
 Untracked files:
   (use "git add <file>..." to include in what will be committed)
@@ -97,7 +97,7 @@ runs.

 ## Publishing to GitHub

-To share your dataflow or make it available publicly, you can upload your
+To share your datakit or make it available publicly, you can upload your
 repository to GitHub.

 First, create a new repository on GitHub and copy its URL.
@@ -114,5 +114,5 @@ Push your changes to GitHub:
 git push origin main
 ```

-Your dataflow is now published to GitHub and can be accessed by others if your
+Your datakit is now published to GitHub and can be accessed by others if your
 repository is set to public.