Requirements:
Docker compose environment (based on pycsw) for development and testing with CKAN Open Data portals. [1]
Tip
It can be easily tested with a CKAN-type Open Data portal deployment: mjanez/ckan-docker. [2]
Available components:
- pycsw: The pycsw app. An OARec and OGC CSW server implementation written in Python.
- ckan2pycsw: Software to achieve interoperability with open data portals based on CKAN. To do this, ckan2pycsw reads data from an instance using the CKAN API, generates INSPIRE ISO-19115/ISO-19139 [3] metadata using pygeometa (or another custom schema), and populates a pycsw instance that exposes the metadata using CSW and OAI-PMH.
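As a rough illustration of the first step (reading datasets through the CKAN API), the sketch below pages through CKAN's `package_search` action. It is not the project's actual code: `iter_ckan_datasets`, `fetch_json` and the page size are hypothetical names and values chosen for this example. Only the `/api/3/action/package_search` endpoint and its `rows` and `start` parameters come from the CKAN API.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download a URL and decode its JSON body (hypothetical helper)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_ckan_datasets(
    ckan_url: str,
    page_size: int = 100,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every dataset returned by CKAN's package_search action."""
    start = 0
    while True:
        query = urlencode({"rows": page_size, "start": start})
        url = f"{ckan_url.rstrip('/')}/api/3/action/package_search?{query}"
        result = fetch(url)["result"]
        datasets = result["results"]
        if not datasets:
            break
        yield from datasets
        # Advance by what was actually received and stop at the reported total.
        start += len(datasets)
        if start >= result["count"]:
            break
```

Passing `fetch` as a parameter keeps the paging logic testable without a running CKAN instance; in normal use the default `fetch_json` performs the HTTP request against `CKAN_URL`.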
Copy the .env.example template to .env and configure it by editing that file. Change PYCSW_URL and CKAN_URL, as well as the published port PYCSW_PORT, if needed.
cp .env.example .env
Select the CKAN schema (PYCSW_CKAN_SCHEMA) and the pycsw output schema (PYCSW_OUPUT_SCHEMA):
- Default:
  PYCSW_CKAN_SCHEMA=iso19139_geodcatap
  PYCSW_OUPUT_SCHEMA=iso19139_inspire
  ...
  SSL_UNVERIFIED_MODE=True
- Available:
  - CKAN metadata schema (PYCSW_CKAN_SCHEMA):
    - iso19139_geodcatap (default): [WIP] Schema based on the GeoDCAT-AP custom dataset schema.
    - iso19139_base: [WIP] Base schema.
  - pycsw metadata schema (PYCSW_OUPUT_SCHEMA):
    - iso19139_inspire (default): Customised schema based on the ISO 19139 INSPIRE metadata schema. [4]
    - iso19139: Standard pycsw schema based on ISO 19139.
Change SSL_UNVERIFIED_MODE to avoid SSL errors when using a self-signed certificate in CKAN development.
- Default:
  SSL_UNVERIFIED_MODE=True
Warning
Enabling SSL_UNVERIFIED_MODE can expose your application to security risks by allowing unverified SSL certificates. Use this setting only in a trusted development environment and never in production.
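To make the effect of `SSL_UNVERIFIED_MODE` concrete, here is a minimal sketch of how such a flag could be turned into a Python `ssl` context. How ckan2pycsw applies the setting internally is not described in this document, so `env_flag` and `build_ssl_context` are hypothetical helpers rather than the project's code.

```python
import os
import ssl


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean such as SSL_UNVERIFIED_MODE=True from the environment."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def build_ssl_context(unverified: bool) -> ssl.SSLContext:
    """Return a default context, optionally with certificate checks disabled."""
    context = ssl.create_default_context()
    if unverified:
        # Development only: accept self-signed certificates.
        # check_hostname must be switched off before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

A context built this way can be passed to `urllib.request.urlopen(url, context=...)`. With the flag unset or false, the default verifying context is returned unchanged.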
To deploy the environment, docker compose will build the latest source in the repo.
If you want a quick deployment (about 5 minutes), use the prebuilt stable image (ghcr.io/mjanez/ckan-pycsw:main) with docker-compose.ghcr.yml.
git clone https://github.com/mjanez/ckan-pycsw
cd ckan-pycsw
docker compose up --build
# GitHub main registry image
docker compose -f docker-compose.ghcr.yml up --build
# Or detached mode
docker compose up -d --build
Tip
Deploy the dev (multistage build) docker-compose.dev.yml with:
docker compose -f docker-compose.dev.yml up --build
If needed, to build a specific container simply run:
docker build -t target_name xxxx/
Requirements:
- Python >= 3.9
Dependencies:
python3 -m pip install --user pipx
python3 -m pipx ensurepath --force
# You will need to open a new terminal or re-login for the PATH changes to take effect.
pipx install pdm
pdm install --no-self --group prod
Configuration:
PYCSW_URL=http://localhost:8000 envsubst < ckan-pycsw/conf/pycsw.conf.template > pycsw.conf
# Or update pycsw.conf vars manually
vi pycsw.conf
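Where `envsubst` is not available, the same substitution can be approximated in Python. This is a sketch under the assumption that the template uses shell-style `${PYCSW_URL}` placeholders, as `envsubst` expects. `render_template` is a hypothetical helper and the sample template text is invented for the example.

```python
import os
from string import Template
from typing import Mapping, Optional


def render_template(text: str, variables: Optional[Mapping[str, str]] = None) -> str:
    """Replace $VAR and ${VAR} placeholders, leaving unknown ones untouched."""
    values = os.environ if variables is None else variables
    return Template(text).safe_substitute(values)


# Invented sample; the real template is ckan-pycsw/conf/pycsw.conf.template.
sample = "[server]\nurl=${PYCSW_URL}\n"
print(render_template(sample, {"PYCSW_URL": "http://localhost:8000"}))
```

One difference worth knowing: `envsubst` replaces undefined variables with an empty string, whereas `safe_substitute` leaves the placeholder in place, which makes missing settings easier to spot in the generated file.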
Generate the database and add the environment variables:
rm -f cite.db
# Remember to create and update the .env vars first. Then add them to the environment:
bash doc/scripts/00_ennvars.sh
Run ckan2pycsw:
PYCSW_CONFIG=pycsw.conf pdm run python3 ckan2pycsw/ckan2pycsw.py
User-defined metadata schemas can be added, both for CKAN metadata input (ckan2pycsw/schemas/ckan/*) and for output schemas in pycsw (ckan2pycsw/schemas/pygeometa/*).
You can customise and extend the metadata schemas that serve as templates to import as many metadata elements as possible from a custom schema into CKAN, e.g. based on a custom schema from ckanext-scheming.
- Create a new folder in schemas/ckan/ with the name intended for the schema, e.g. iso19139_spain.
- Create the main.j2 with the Jinja template to render the metadata. Examples in: schemas/ckan/iso19139_geodcatap
- Add all needed mappings (.yaml) to a new folder in ckan2pycsw/mappings/, e.g. iso19139_spain
- Update ckan2pycsw/mappings/ckan-pycsw_assigments.yaml to include the pycsw and ckan schema mapping, e.g.

      iso19139_geodcatap: ckan_geodcatap
      iso19139_base: ckan_base
      iso19139_inspire: inspire
      ...
      iso19139_spain: iso19139_spain

- Modify .env to select the new PYCSW_CKAN_SCHEMA:

      PYCSW_CKAN_SCHEMA=iso19139_spain
      PYCSW_OUPUT_SCHEMA=iso19139
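The steps above amount to a small file-layout checklist, which can be verified before running ckan2pycsw. The sketch below is a hypothetical helper (`missing_schema_parts` is not part of the project). It assumes the directory names quoted in the steps, taken relative to the `ckan2pycsw/` folder, and only checks that the expected files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List


def missing_schema_parts(base_dir: Path, schema: str, kind: str = "ckan") -> List[str]:
    """List the pieces a custom schema still lacks (empty list means complete).

    base_dir: folder holding schemas/ and mappings/ (assumed: ckan2pycsw/).
    kind: "ckan" for input schemas, "pygeometa" for output schemas.
    """
    problems = []

    # The Jinja template that renders the metadata.
    template = base_dir / "schemas" / kind / schema / "main.j2"
    if not template.is_file():
        problems.append(f"missing template: {template}")

    # At least one .yaml mapping in the schema's mappings folder.
    mapping_dir = base_dir / "mappings" / schema
    if not mapping_dir.is_dir() or not any(mapping_dir.glob("*.yaml")):
        problems.append(f"no .yaml mappings in: {mapping_dir}")

    # The schema name must appear in the assignments file (plain text check).
    assignments = base_dir / "mappings" / "ckan-pycsw_assigments.yaml"
    if not assignments.is_file() or schema not in assignments.read_text(encoding="utf-8"):
        problems.append(f"{schema} not listed in: {assignments}")

    return problems
```

For example, `missing_schema_parts(Path("ckan2pycsw"), "iso19139_spain")` returns an empty list once the folder, template, mappings and assignment entry are all in place.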
New metadata schemas can be extended or added to convert elements extracted from CKAN into standard metadata profiles that can be exposed in the pycsw CSW Catalogue.
- Create a new folder in schemas/pygeometa/ with the name intended for the schema, e.g. iso19139_spain.
- Add a __init__.py file with the extended pygeometa schema class, e.g.

      import ast
      import logging
      import os
      from typing import Union

      from lxml import etree
      from owslib.iso import CI_OnlineResource, CI_ResponsibleParty, MD_Metadata
      from pygeometa.schemas.base import BaseOutputSchema

      from model.template import render_j2_template

      LOGGER = logging.getLogger(__name__)
      THISDIR = os.path.dirname(os.path.realpath(__file__))


      class ISO19139_spainOutputSchema(BaseOutputSchema):
          """ISO 19139 - Spain output schema"""

          def __init__(self):
              """
              Initialize object

              :returns: pygeometa.schemas.base.BaseOutputSchema
              """

              super().__init__('iso19139_spain', 'xml', THISDIR)

          ...

- Create the main.j2 with the Jinja template to render the metadata. Macros can be added for more specific templates, for example iso19139_inspire-regulation.j2 or contact.j2. More examples in: schemas/pygeometa/iso19139_inspire
- Add the Python class and the schema identifier to ckan2pycsw.py, e.g.

      from schemas.pygeometa.iso19139_inspire import ISO19139_inspireOutputSchema, ISO19139_spainOutputSchema
      ...
      OUPUT_SCHEMA = {
          'iso19139_inspire': ISO19139_inspireOutputSchema,
          'iso19139': ISO19139OutputSchema,
          'iso19139_spain': ISO19139_spainOutputSchema
      }

- Add all mappings (.yaml) to a new folder in ckan2pycsw/mappings/, e.g. iso19139_spain
- Update ckan2pycsw/mappings/ckan-pycsw_assigments.yaml to include the pycsw and ckan schema mapping, e.g.

      iso19139_geodcatap: ckan_geodcatap
      iso19139_base: ckan_base
      iso19139_inspire: inspire
      ...
      iso19139_spain: iso19139_spain

- Modify .env to select the new PYCSW_OUPUT_SCHEMA:

      PYCSW_CKAN_SCHEMA=iso19139_geodcatap
      PYCSW_OUPUT_SCHEMA=iso19139_spain
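The registration step boils down to a dictionary lookup keyed by the value of `PYCSW_OUPUT_SCHEMA`. The following sketch shows that pattern in isolation. The two classes are empty stand-ins for the real pygeometa schema classes and `select_output_schema` is a hypothetical helper, so treat it as an illustration rather than the code in `ckan2pycsw.py`.

```python
import os


class ISO19139OutputSchema:
    """Stand-in for the real pygeometa output schema class."""


class ISO19139_spainOutputSchema:
    """Stand-in for the custom schema class from the steps above."""


# Same shape as the OUPUT_SCHEMA dictionary: identifier -> schema class.
OUPUT_SCHEMA = {
    'iso19139': ISO19139OutputSchema,
    'iso19139_spain': ISO19139_spainOutputSchema,
}


def select_output_schema(name=None):
    """Return the class registered for name (default: PYCSW_OUPUT_SCHEMA)."""
    name = name or os.environ.get('PYCSW_OUPUT_SCHEMA', '')
    try:
        return OUPUT_SCHEMA[name]
    except KeyError:
        known = ', '.join(sorted(OUPUT_SCHEMA))
        raise ValueError(f"Unknown output schema {name!r}; known: {known}") from None
```

Failing with the list of known identifiers makes a missing registry entry or a typo in `.env` visible at start-up instead of surfacing later as a bare `KeyError`.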
Perform a GetRecords request and return all records:
{PYCSW_URL}?request=GetRecords&service=CSW&version=3.0.0&typeNames=gmd:MD_Metadata&outputSchema=http://www.isotc211.org/2005/gmd&elementSetName=full
- The ckan-pycsw logs will be created in the /log folder.
- Metadata records in XML format (ISO 19139) are stored in the /metadata folder.
Note
The GetRecords operation allows clients to discover resources (datasets). The response is an XML document and the output schema can be specified.
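For scripted tests, the same GetRecords URL can be assembled programmatically. A minimal sketch using only the standard library follows. `build_getrecords_url` is a hypothetical helper, `http://localhost:8000` is simply the example `PYCSW_URL` used in the configuration step, and the parameters are the ones from the request above.

```python
from urllib.parse import urlencode


def build_getrecords_url(pycsw_url: str, element_set: str = "full") -> str:
    """Build a CSW 3.0.0 GetRecords KVP request for ISO 19139 (gmd) records."""
    params = {
        "request": "GetRecords",
        "service": "CSW",
        "version": "3.0.0",
        "typeNames": "gmd:MD_Metadata",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": element_set,
    }
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return f"{pycsw_url}?{urlencode(params)}"


print(build_getrecords_url("http://localhost:8000"))
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen`; the response body is the XML document described in the note.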
- Build and run the container.
- Attach Visual Studio Code to the container.
- Start debugging the ckan2pycsw.py Python file (Debug the currently active Python file) in the container.
- Update the previously created .env file in the root of the ckan-ogc repo and move it to: /ckan2pycsw
- Open ckan2pycsw.py.
- Start debugging the ckan2pycsw.py Python file (Debug the currently active Python file).
Note
By default, the Python extension looks for and loads a file named .env in the current workspace folder. See the Python debugger and environment variables documentation for more information.
List of containers:
Repository | Type | Docker tag | Size | Notes |
---|---|---|---|---|
python 3.11 | base image | python/python:3.11-slim-bullseye | 45.57 MB | - |
Repository | Type | Docker tag | Size | Notes |
---|---|---|---|---|
mjanez/ckan-pycsw | custom image | mjanez/ckan-pycsw:latest | 175 MB | Dev & Test latest version. |
mjanez/ckan-pycsw | custom image | mjanez/ckan-pycsw:main | 175 MB | Stable version. |
Note
The GHCR and Dev Dockerfiles use the main images as base.
Ports | Container |
---|---|
0.0.0.0:8000->8000/tcp | pycsw |
0.0.0.0:5678->5678/tcp | ckan-pycsw debug (debugpy) |
Footnotes
[1] Extends the @frafra coat2pycsw package.
[2] A custom installation of Docker Compose with specific extensions for spatial data and GeoDCAT-AP/INSPIRE metadata profiles.
[3] INSPIRE dataset and service metadata based on ISO/TS 19139:2007.
[4] The output pycsw schema (iso19139_inspire), intended to comply with INSPIRE ISO 19139, is WIP. The validation of the dataset/series is complete and conforms to the INSPIRE reference validator datasets and dataset series (Conformance Class 1, 2, 2b and 2c). In contrast, spatial data services still fail in only 1 dimension [WIP].