pycsw endpoint for spatial open data portals (GeoDCAT-AP/INSPIRE)

mjanez/ckan-pycsw

 
 

pycsw CKAN harvester ISO19139

pycsw version · License: Unlicense

Overview · Quick start · Schema development · Test · Debug · Containers

Requirements:

Overview

Docker compose environment (based on pycsw) for development and testing with CKAN Open Data portals. [1]

Tip

It can be easily tested with a CKAN-type Open Data portal deployment: mjanez/ckan-docker [2].

Available components:

  • pycsw: The pycsw app. An OARec and OGC CSW server implementation written in Python.
  • ckan2pycsw: Software for interoperability with CKAN-based open data portals. It reads data from a CKAN instance through the CKAN API, generates INSPIRE ISO-19115/ISO-19139 [3] metadata using pygeometa (or another custom schema), and populates a pycsw instance that exposes the metadata through CSW and OAI-PMH.
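
As a rough illustration of the first stage of that flow (reading datasets from CKAN), the sketch below builds a request URL for the CKAN Action API `package_search` endpoint and extracts the dataset list from a JSON response. It is not taken from ckan2pycsw: the function names are hypothetical, and whether the tool uses `package_search` or another API action is an assumption.

```python
import json
from urllib.parse import urlencode, urljoin


def build_package_search_url(ckan_url: str, rows: int = 100, start: int = 0) -> str:
    """Build a CKAN Action API package_search URL for one page of datasets."""
    base = ckan_url.rstrip("/") + "/"
    query = urlencode({"rows": rows, "start": start})
    return urljoin(base, "api/3/action/package_search") + "?" + query


def extract_datasets(response_body: str) -> list:
    """Return the list of dataset dicts from a package_search JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise ValueError("CKAN API reported failure")
    return payload["result"]["results"]


if __name__ == "__main__":
    # Hypothetical local CKAN instance; replace with your CKAN_URL.
    print(build_package_search_url("http://localhost:5000"))
```

Paging with `rows` and `start` keeps each request small, which can matter for portals with many datasets.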

Quick start

With docker compose

Copy the .env.example template to .env and edit the copy. Change PYCSW_URL and CKAN_URL, as well as the published port PYCSW_PORT, if needed.

cp .env.example .env

Select the CKAN Schema (PYCSW_CKAN_SCHEMA), and the pycsw output schema (PYCSW_OUPUT_SCHEMA):

  • Default:
    PYCSW_CKAN_SCHEMA=iso19139_geodcatap
    PYCSW_OUPUT_SCHEMA=iso19139_inspire
    
    ...
    
    SSL_UNVERIFIED_MODE=True
  • Available:
    • CKAN metadata schema (PYCSW_CKAN_SCHEMA):

    • pycsw metadata schema (PYCSW_OUPUT_SCHEMA):

      • iso19139_inspire, default: Customised schema based on ISO 19139 INSPIRE metadata schema. [4]
      • iso19139: Standard pycsw schema based on ISO 19139.

Change SSL_UNVERIFIED_MODE to avoid SSL errors when using a self-signed certificate in CKAN development.

  • Default:
    SSL_UNVERIFIED_MODE=True

Warning

Enabling SSL_UNVERIFIED_MODE can expose your application to security risks by allowing unverified SSL certificates. Use this setting only in a trusted development environment and never in production.
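
To make the effect of this setting concrete, here is a minimal sketch of how a boolean flag such as SSL_UNVERIFIED_MODE could be turned into a Python SSL context. It only illustrates the general technique with the standard `ssl` module; the helper names are hypothetical, and how ckan2pycsw actually applies the flag is not shown in this document.

```python
import os
import ssl


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag such as SSL_UNVERIFIED_MODE from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


def build_ssl_context(unverified: bool) -> ssl.SSLContext:
    """Return a default SSL context, optionally with verification disabled."""
    ctx = ssl.create_default_context()
    if unverified:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    context = build_ssl_context(env_flag("SSL_UNVERIFIED_MODE"))
    print(context.verify_mode)
```

With the flag off, the context keeps certificate and hostname checks enabled, which is the behaviour you want outside a development environment.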

To deploy the environment, docker compose will build the latest source in the repo.

To skip the local build and deploy in about 5 minutes, use the stable image (ghcr.io/mjanez/ckan-pycsw:main) with docker-compose.ghcr.yml.

git clone https://github.com/mjanez/ckan-pycsw
cd ckan-pycsw

docker compose up --build

# GitHub Container Registry image (main)
docker compose -f docker-compose.ghcr.yml up --build

# Or detached mode
docker compose up -d --build

Tip

Deploy the dev (multistage build) docker-compose.dev.yml with:

docker compose -f docker-compose.dev.yml up --build

If needed, to build a specific container simply run:

 docker build -t target_name xxxx/

Without Docker

Requirements:

Dependencies:

python3 -m pip install --user pipx
python3 -m pipx ensurepath --force

# You will need to open a new terminal or re-login for the PATH changes to take effect.
pipx install pdm
pdm install --no-self --group prod

Configuration:

PYCSW_URL=http://localhost:8000 envsubst < ckan-pycsw/conf/pycsw.conf.template > pycsw.conf

# Or update pycsw.conf vars manually
vi pycsw.conf
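
If envsubst is not available, the same substitution can be approximated in Python. The sketch below uses `string.Template` from the standard library; the function name is hypothetical, and it assumes the template only uses `$VAR` or `${VAR}` placeholders. Unlike envsubst, `safe_substitute` leaves unknown placeholders untouched instead of replacing them with an empty string.

```python
import os
from string import Template


def render_config(template_text: str, variables: dict) -> str:
    """Substitute $VAR and ${VAR} placeholders; unknown ones are left as-is."""
    return Template(template_text).safe_substitute(variables)


if __name__ == "__main__":
    # Inline sample text standing in for the contents of the config template.
    sample = "url=${PYCSW_URL}"
    variables = dict(os.environ, PYCSW_URL="http://localhost:8000")
    print(render_config(sample, variables))
```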

Generate the database and load the environment variables:

rm -f cite.db

# Remember to create and update the .env vars first. Then load them into the environment:
bash doc/scripts/00_ennvars.sh

Run ckan2pycsw:

PYCSW_CONFIG=pycsw.conf pdm run python3 ckan2pycsw/ckan2pycsw.py

Schema development

User-defined metadata schemas can be added, both for CKAN metadata input: ckan2pycsw/schemas/ckan/* and for output schemas in pycsw: ckan2pycsw/schemas/pygeometa/*.

New input Metadata schema (CKAN)

You can customise and extend the metadata schemas that serve as templates, so that as many metadata elements as possible are imported from a custom CKAN schema, for example one defined with ckanext-scheming.

Sample workflow

  1. Create a new folder in schemas/ckan/ with the name intended for the schema. e.g. iso19139_spain.

  2. Create the main.j2 with the Jinja template to render the metadata. Examples in: schemas/ckan/iso19139_geodcatap

  3. Add all needed mappings (.yaml) to a new folder in ckan2pycsw/mappings/. e.g. iso19139_spain

  4. Update ckan2pycsw/mappings/ckan-pycsw_assigments.yaml to include the pycsw and ckan schema mapping. e.g.

    iso19139_geodcatap: ckan_geodcatap
    iso19139_base: ckan_base
    iso19139_inspire: inspire
    ...
    iso19139_spain: iso19139_spain
  5. Modify .env to select the new PYCSW_CKAN_SCHEMA:

    PYCSW_CKAN_SCHEMA=iso19139_spain
    PYCSW_OUPUT_SCHEMA=iso19139

New output CSW Metadata schema (pycsw/pygeometa)

New metadata schemas can be extended or added to convert elements extracted from CKAN into standard metadata profiles that can be exposed in the pycsw CSW Catalogue.

Sample workflow

  1. Create a new folder in schemas/pygeometa/ with the name intended for the schema. e.g. iso19139_spain.

  2. Add a __init__.py file with the extended pygeometa schema class. e.g.

    import ast
    import logging
    import os
    from typing import Union
    
    from lxml import etree
    from owslib.iso import CI_OnlineResource, CI_ResponsibleParty, MD_Metadata
    
    from pygeometa.schemas.base import BaseOutputSchema
    from model.template import render_j2_template
    
    LOGGER = logging.getLogger(__name__)
    THISDIR = os.path.dirname(os.path.realpath(__file__))
    
    
    class ISO19139_spainOutputSchema(BaseOutputSchema):
        """ISO 19139 - Spain output schema"""
    
        def __init__(self):
            """
            Initialize object
    
            :returns: pygeometa.schemas.base.BaseOutputSchema
            """
    
            super().__init__('iso19139_spain', 'xml', THISDIR)
    ...
  3. Create the main.j2 with the Jinja template to render the metadata. Macros can be added for more specific templates, for example iso19139_inspire-regulation.j2 or contact.j2. More examples in: schemas/pygeometa/iso19139_inspire

  4. Add the Python class and the schema identifier to ckan2pycsw.py, e.g.

    from schemas.pygeometa.iso19139_inspire import ISO19139_inspireOutputSchema
    from schemas.pygeometa.iso19139_spain import ISO19139_spainOutputSchema
    
    ...
    
    OUPUT_SCHEMA = {
        'iso19139_inspire': ISO19139_inspireOutputSchema,
        'iso19139': ISO19139OutputSchema,
        'iso19139_spain': ISO19139_spainOutputSchema
    }
  5. Add all mappings (.yaml) to a new folder in ckan2pycsw/mappings/. e.g. iso19139_spain

  6. Update ckan2pycsw/mappings/ckan-pycsw_assigments.yaml to include the pycsw and ckan schema mapping. e.g.

    iso19139_geodcatap: ckan_geodcatap
    iso19139_base: ckan_base
    iso19139_inspire: inspire
    ...
    iso19139_spain: iso19139_spain
  7. Modify .env to select the new PYCSW_OUPUT_SCHEMA:

    PYCSW_CKAN_SCHEMA=iso19139_geodcatap
    PYCSW_OUPUT_SCHEMA=iso19139_spain
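
The registry pattern from step 4 can be sketched in isolation as follows. The classes here are empty placeholders (not the real pygeometa schema classes), and `select_output_schema` is a hypothetical helper; the point is only to show how a schema name from .env could be resolved against the dictionary, with a clear error when the new schema has not been registered.

```python
class ISO19139OutputSchema:
    """Placeholder for the standard schema class (illustrative only)."""


class ISO19139_spainOutputSchema:
    """Placeholder for the custom schema class from step 2 (illustrative only)."""


# Same shape as the dictionary in step 4, reduced to two entries.
OUPUT_SCHEMA = {
    "iso19139": ISO19139OutputSchema,
    "iso19139_spain": ISO19139_spainOutputSchema,
}


def select_output_schema(name: str, registry: dict) -> type:
    """Return the schema class registered under name, or fail with a clear message."""
    try:
        return registry[name]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(f"Unknown output schema {name!r}; known: {known}") from None


if __name__ == "__main__":
    schema_class = select_output_schema("iso19139_spain", OUPUT_SCHEMA)
    print(schema_class.__name__)
```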

Test

Perform a GetRecords request and return all:

{PYCSW_URL}?request=GetRecords&service=CSW&version=3.0.0&typeNames=gmd:MD_Metadata&outputSchema=http://www.isotc211.org/2005/gmd&elementSetName=full
  • The ckan-pycsw logs will be created in the /log folder.
  • Metadata records in XML format (ISO 19139) are stored in the /metadata folder.

Note

The GetRecords operation allows clients to discover resources (datasets). The response is an XML document and the output schema can be specified.
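
The same GetRecords request can be assembled programmatically. This is a minimal sketch using only the standard library; the function name is hypothetical and the parameters are copied from the request above. `urlencode` percent-encodes the colon and slashes in the values, which a server decodes back to the original strings.

```python
from urllib.parse import urlencode


def build_getrecords_url(pycsw_url: str) -> str:
    """Build the GetRecords URL shown above for a given pycsw endpoint."""
    params = {
        "request": "GetRecords",
        "service": "CSW",
        "version": "3.0.0",
        "typeNames": "gmd:MD_Metadata",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": "full",
    }
    return pycsw_url.rstrip("?") + "?" + urlencode(params)


if __name__ == "__main__":
    # Replace with your PYCSW_URL.
    print(build_getrecords_url("http://localhost:8000"))
```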

Debug

VSCode

Python debugger with Docker

  1. Build and run container.
  2. Attach Visual Studio Code to container.
  3. Start debugging on ckan2pycsw.py Python file (Debug the currently active Python file) in the container.

Python debugger without Docker

  1. Update the previously created .env file in the root of the ckan-pycsw repo and move it to: /ckan2pycsw
  2. Open ckan2pycsw.py.
  3. Start debugging on ckan2pycsw.py Python file (Debug the currently active Python file).

Note

By default, the Python extension looks for and loads a file named .env in the current workspace folder. More info about the Python debugger and the use of environment variables.
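
For readers unfamiliar with the .env format, the sketch below shows a simplified parser for KEY=VALUE lines. It is illustrative only: the function name is hypothetical, and it does not reproduce the exact parsing rules of the VS Code Python extension or of docker compose (for example, it does not handle multi-line values or variable expansion).

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    sample = "# schemas\nPYCSW_CKAN_SCHEMA=iso19139_geodcatap\nSSL_UNVERIFIED_MODE=True\n"
    print(parse_dotenv(sample))
```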

Containers

List of containers:

Base images

Repository  | Type       | Docker tag                       | Size     | Notes
python 3.11 | base image | python/python:3.11-slim-bullseye | 45.57 MB | -

Built images

Repository        | Type         | Docker tag               | Size   | Notes
mjanez/ckan-pycsw | custom image | mjanez/ckan-pycsw:latest | 175 MB | Dev & Test latest version.
mjanez/ckan-pycsw | custom image | mjanez/ckan-pycsw:main   | 175 MB | Stable version.

Note

The GHCR and Dev Dockerfiles use the main images as base.

Network ports settings

Ports                  | Container
0.0.0.0:8000->8000/tcp | pycsw
0.0.0.0:5678->5678/tcp | ckan-pycsw debug (debugpy)

Footnotes

  1. Extends the @frafra coat2pycsw package.

  2. A custom installation of Docker Compose with specific extensions for spatial data and GeoDCAT-AP/INSPIRE metadata profiles.

  3. INSPIRE dataset and service metadata based on ISO/TS 19139:2007.

  4. The pycsw output schema (iso19139_inspire), intended to comply with INSPIRE ISO 19139, is a work in progress (WIP). Validation of datasets and dataset series is complete and conforms to the INSPIRE reference validator for datasets and dataset series (Conformance Classes 1, 2, 2b and 2c). Spatial data services still fail in one dimension [WIP].
