Add sample data structure, local validation scripts, and documentation
Also - update errors/clarity in datapackage.json template
e-lo committed Jul 14, 2023
1 parent 879ae69 commit f026881
Showing 24 changed files with 1,393 additions and 646 deletions.
31 changes: 31 additions & 0 deletions README.md
@@ -12,6 +12,37 @@ Human-friendlier documentation is auto-generated and available at:
- [Architecture](https://tides-transit.github.io/TIDES/main/architecture)
- [Table Schemas](https://tides-transit.github.io/TIDES/main/tables)

## Example Data

Sample data can be found in the `/samples` directory, with one directory for each example.

## Validating TIDES data

TIDES data with a valid [`datapackage.json`](#data-package) can be validated using the [frictionless framework](https://framework.frictionlessdata.io/), which can be installed and invoked as follows:

```bash
pip install frictionless
frictionless validate path/to/your/datapackage.json
```

### Data Package

To validate a package of TIDES data, you must add a frictionless-compliant [`datapackage.json`](https://specs.frictionlessdata.io/data-package/) alongside your data that describes which files should be validated against which schemas. Most of this can be copied from [`/data/template/datapackage.json`](https://raw.githubusercontent.com/TIDES-transit/TIDES/main/data/template/datapackage.json).

Once this file is created and maps the data files to their schemas, run:

```sh
frictionless validate datapackage.json
```
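
As a rough illustration of what such a descriptor contains, the sketch below writes a minimal frictionless-style `datapackage.json` with a single resource. The function name `write_datapackage`, the package name, and the `vehicles.csv` resource are assumptions made for illustration; the TIDES template remains the authoritative list of required fields, so a file like this may need more properties before it validates.

```bash
# Hypothetical helper: write a minimal frictionless-style datapackage.json
# into the directory given as the first argument (default: current directory).
# The package name and the single "vehicles" resource are illustrative only.
write_datapackage() {
  local target_dir=${1:-.}
  cat > "$target_dir/datapackage.json" <<'EOF'
{
  "name": "example-tides-package",
  "resources": [
    {
      "name": "vehicles",
      "path": "vehicles.csv",
      "schema": "https://raw.githubusercontent.com/TIDES-transit/TIDES/main/spec/schema.vehicles.json"
    }
  ]
}
EOF
}
```

Calling `write_datapackage path/to/data` and then running `frictionless validate path/to/data/datapackage.json` would check `vehicles.csv` against the referenced schema.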

### Specific files

Specific files can be validated by running the frictionless framework against them and their corresponding schemas as follows:

```sh
frictionless validate vehicles.csv --schema https://raw.githubusercontent.com/TIDES-transit/TIDES/main/spec/schema.vehicles.json
```
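
To validate several files in one go, the per-file command can be wrapped in a small loop. This is a sketch only: `tides_schema_url` and `validate_tides_files` are hypothetical helpers, and the `spec/schema.<table>.json` naming is inferred from the single `vehicles` example above, so check it against the `spec` directory before relying on it.

```bash
# Build the schema URL for a TIDES table file such as vehicles.csv.
# Assumption: schemas follow the spec/schema.<table>.json pattern.
# The optional second argument is the git ref (default: main).
tides_schema_url() {
  local file=$1 ref=${2:-main}
  local table
  table=$(basename "$file" .csv)
  echo "https://raw.githubusercontent.com/TIDES-transit/TIDES/$ref/spec/schema.$table.json"
}

# Validate each CSV passed as an argument against its matching schema.
# Returns non-zero if any file fails validation.
validate_tides_files() {
  local file status=0
  for file in "$@"; do
    frictionless validate "$file" --schema "$(tides_schema_url "$file")" || status=1
  done
  return "$status"
}
```

For example, `validate_tides_files vehicles.csv` runs the same command shown above, and the non-zero return value makes the helper usable in CI scripts.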

## Contributing to TIDES

Those who want to help with the development of the TIDES specification should review the guidance in [CONTRIBUTING.md](CONTRIBUTING.md).
68 changes: 68 additions & 0 deletions bin/validate-data-package
@@ -0,0 +1,68 @@
#!/usr/bin/env bash

# Script: validate-data-package
# Description: Bash script to validate a Frictionless Data Package using the Frictionless CLI.
# Usage: validate-data-package [-v tides_version | -l local_schema_location] [-d dataset_location]
#   -v tides_version: Optional. Specify the version of the TIDES specification or 'local' to
#       use a local schema. Default is to use the schema specified in the datapackage.
#   -l local_schema_location: Optional. Specify the location of the local schema directory.
#       Default is '../spec'. Is only used if tides_version = local.
#   -d dataset_location: Optional. Specify the directory containing the TIDES datapackage.json.
#       Default is the current directory.

# Set default values
tides_version=""
local_schema_location="../spec"
dataset_location="."

# Parse command-line arguments
while getopts ":v:l:d:" opt; do
  case $opt in
    v)
      tides_version=$OPTARG
      ;;
    l)
      local_schema_location=$OPTARG
      ;;
    d)
      dataset_location=$OPTARG
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
  esac
done

# Create a temporary data package if using a different schema reference or a local schema
tmp_datapackage=""
if [ "$tides_version" != "" ] then
tmp_datapackage=$(mktemp)
cp "$dataset_location/datapackage.json" "$tmp_datapackage"
fi

# Set the schema path prefix based on the option chosen
if [ "$tides_version" == "local" ]; then
  schema_path_prefix="$local_schema_location"
else
  schema_path_prefix="https://raw.githubusercontent.com/TIDES-transit/TIDES/$tides_version/spec"
fi

# Update the 'schema' property in the temporary copy of the datapackage.json file, if applicable.
# '|' is the sed delimiter because the prefix contains slashes; \1 keeps the schema file name.
if [ "$tmp_datapackage" != "" ]; then
  sed -E -i "s|\"schema\": \"([^\"/]+\.schema\.json)\"|\"schema\": \"$schema_path_prefix/\1\"|g" "$tmp_datapackage"
  dataset_location="$tmp_datapackage"
fi

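# Validate the data package JSON against the TIDES data package schema.
# The companion script accepts -r (ref), -l (local schema directory), and -f (datapackage file).
datapackage_file="$dataset_location"
[ -d "$datapackage_file" ] && datapackage_file="$datapackage_file/datapackage.json"
if [ "$tides_version" == "local" ]; then
  "$(dirname "$0")/validate-data-package-json" -l "$local_schema_location" -f "$datapackage_file"
else
  "$(dirname "$0")/validate-data-package-json" -r "${tides_version:-main}" -f "$datapackage_file"
fi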

# Validate the Frictionless Data Package using the Frictionless CLI
frictionless validate "$dataset_location"

# Remove the temporary data package file, if applicable
if [ "$tmp_datapackage" != "" ]; then
  rm "$tmp_datapackage"
fi
64 changes: 64 additions & 0 deletions bin/validate-data-package-json
@@ -0,0 +1,64 @@
#!/usr/bin/env bash

# Script to validate a local datapackage.json file against the TIDES data package schema,
# taken either from the TIDES GitHub repository or from a local schema directory.
# Usage: validate-data-package-json [-r ref | -l local_schema_location] [-f datapackage_file]
#   -r ref: Optional. Specify the ref name of the GitHub repository. Default is 'main'.
#   -l local_schema_location: Optional. Specify the location of the local schema directory.
#   -f datapackage_file: Optional. Specify the location of the datapackage.json file. Default is 'datapackage.json' in the execution directory.

# Check if jsonschema-cli is installed
command -v jsonschema-cli >/dev/null 2>&1 || {
  echo >&2 "jsonschema-cli is required but not found. You can install it using 'pip install jsonschema-cli'. Aborting."
  exit 1
}

# Set default values
ref="main"
local_schema_location=""
datapackage_file="datapackage.json"

# Parse command-line arguments
while getopts ":r:l:f:" opt; do
  case $opt in
    r)
      ref=$OPTARG
      ;;
    l)
      local_schema_location=$OPTARG
      ;;
    f)
      datapackage_file=$OPTARG
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
  esac
done

echo "Validating data package file: $datapackage_file"

# Set the temporary directory path
temp_dir=$(mktemp -d)

# Set the schema file path based on the option chosen
schema_file=""
if [ "$local_schema_location" != "" ]; then
  schema_file="$local_schema_location/tides-data-package.json"
else
  # Download the schema file to the temporary directory
  schema_url="https://raw.githubusercontent.com/TIDES-transit/TIDES/$ref/spec/tides-data-package.json"
  schema_file="$temp_dir/data-package.json"

  # --fail makes curl exit non-zero on HTTP errors such as 404
  if ! curl -s --head --fail "$schema_url" >/dev/null; then
    echo "Schema file not found on GitHub for the specified ref: $ref" >&2
    rm -rf "$temp_dir"
    exit 1
  fi
  curl -o "$schema_file" "$schema_url"
fi

# Validate the datapackage against the selected (local or downloaded) schema
jsonschema-cli validate "$schema_file" "$datapackage_file"

# Clean up the temporary directory
rm -rf "$temp_dir"
59 changes: 59 additions & 0 deletions docs/datapackage.md
@@ -0,0 +1,59 @@
# Data Package

TIDES data must include a `datapackage.json` in the format specified by the [`tides-data-package` json schema](https://raw.githubusercontent.com/TIDES-transit/TIDES/main/spec/tides-data-package.json), which is an extension of the [frictionless data package](https://specs.frictionlessdata.io/data-package/) schema.

You may create your own `datapackage.json` based on the documentation or start with the provided [template](#template), but don't forget to [validate](#validation) it to make sure it is in the correct format!

## Data Package Format

{{ frictionless_data_package('spec/tides-data-package.json') }}

## Tabular Data Resource

Required and recommended fields for each `tabular-data-resource` are as follows:

{{ frictionless_data_package('spec/tides-data-package.json',sub_schema="resources") }}

## Template

The canonical `datapackage.json` template is available at [`/samples/template/TIDES/datapackage.json`](https://raw.githubusercontent.com/TIDES-transit/TIDES/main/samples/template/TIDES/datapackage.json).

!!! warning
    This version of the `tides-data-package` template depends on the version of the documentation you are viewing. It is the canonical template only if you are viewing the `main` documentation version.

{{ include_file('samples/template/TIDES/datapackage.json',code_type='json') }}

## Validation

There are several options for validating your `datapackage.json` file, including:

- [Command Line Interface (CLI) Script](#cli)
- [Online JSON Schema validators](#point-and-drool)

### CLI

You can validate your data package file with the script provided in [`/bin/validate-data-package-json`](https://raw.githubusercontent.com/TIDES-transit/TIDES/main/bin/validate-data-package-json).

??? tip "installation requirements"

    Make sure you have jsonschema-cli installed. You can install it on its own or together with the other suggested tools using one of the commands below:

    ```sh
    pip install jsonschema-cli
    pip install -r requirements.txt
    ```

```sh title="usage"
validate-data-package-json -f my-datapackage.json
```
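
Before running the script, it can help to confirm that the tools it calls are on your `PATH`. The sketch below mirrors the `command -v` check the script performs for `jsonschema-cli`; the helper name `missing_tools` is an assumption for illustration and is not part of the TIDES repository.

```bash
# Hypothetical helper: print the name of every tool in the argument list
# that cannot be found on PATH. Prints nothing when all tools are present.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For instance, `missing_tools jsonschema-cli curl` prints nothing when both commands are installed, and otherwise lists the ones to install first.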

{{ include_file('bin/validate-data-package-json',code_type='sh') }}

### Point-and-Drool

Because `tides-data-package` is just a JSON Schema, you can use any of the many JSON Schema validators available on the web. Use the [canonical `tides-data-package`](https://raw.githubusercontent.com/TIDES-transit/TIDES/main/spec/tides-data-package.json) or copy and paste the version below.

!!! warning
    This version of `tides-data-package` depends on the version of the documentation you are viewing. It is the canonical `tides-data-package` only if you are viewing the `main` documentation version.

{{ include_file('spec/tides-data-package.json',code_type='json') }}
9 changes: 9 additions & 0 deletions docs/samples.md
@@ -0,0 +1,9 @@
# Sample Data

Sample data can be found in the `/samples` directory, with one directory for each data sample.

{{ include_file('samples/README.md')}}

## Data List

{{ list_samples('samples') }}