This harvester script helps synchronize an external catalog with the Open Canada catalog using CSW.

federal-geospatial-platform/harvester-2.0

Open Maps - Federal Geospatial Platform Harvester

This is the Open Government Secretariat's (OGS) Open Maps (OM) harvester, which pulls from the Federal Geospatial Platform (FGP), developed and maintained at Statistics Canada (StatCan).

Harvester - FGP - Diagram

globals.py

Here you can switch the environment mode between production and staging. To run in production, uncomment OPERATION_ENV = 'production' and comment out OPERATION_ENV = 'staging', like so:

...  
OPERATION_ENV = 'production' 
### or ### 
#OPERATION_ENV = 'staging'
...  

To run in staging, comment out OPERATION_ENV = 'production' and uncomment OPERATION_ENV = 'staging', like so:

... 
#OPERATION_ENV = 'production' 
### or ### 
OPERATION_ENV = 'staging'
...  
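
As a rough illustration of how a switch like this is typically consumed, the sketch below selects per-environment settings from a dictionary keyed on OPERATION_ENV. The `SETTINGS` table, the `get_settings` helper, and the `example.invalid` URLs are hypothetical placeholders, not values taken from globals.py.

```python
# Hypothetical sketch: how a module might consume OPERATION_ENV.
# SETTINGS and its URLs are illustrative placeholders, not taken from globals.py.

OPERATION_ENV = 'staging'
# OPERATION_ENV = 'production'

SETTINGS = {
    'production': {'csw_url': 'https://csw.production.example.invalid/csw'},
    'staging': {'csw_url': 'https://csw.staging.example.invalid/csw'},
}


def get_settings(env=OPERATION_ENV):
    """Return the settings for env, failing loudly on an unknown value."""
    try:
        return SETTINGS[env]
    except KeyError:
        raise ValueError(
            "OPERATION_ENV must be one of %s, got %r" % (sorted(SETTINGS), env)
        )
```

Failing on an unrecognized value, rather than silently falling back to one environment, guards against a typo in the switch sending staging data to production.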

harvest_hnap.py

Extracts HNAP XML from the CSW source. Prints the XML to standard output so it can be piped to another command or redirected to a file.

./harvest_hnap.py [options] [options]... > hnap.xml
or
./harvest_hnap.py [options] [options]... | parsing_command

Presently extracts everything but will eventually extract a window of data (e.g. metadata records updated in the last two weeks). The alternate, time-filtered request is available in the script, commented out.
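
To make the idea of a time window concrete, here is a minimal sketch of how a two-week cut-off could be computed and wrapped in an OGC Filter fragment. It is not the request that is commented out in the script. The function names are hypothetical, and the `Modified` property name is an assumption; the queryable exposed by the actual CSW endpoint may differ.

```python
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape


def window_start(days=14, now=None):
    """Return the UTC timestamp (ISO 8601) marking the start of the window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - timedelta(days=days)).strftime('%Y-%m-%dT%H:%M:%SZ')


def build_time_filter(since, property_name='Modified'):
    """Build an OGC Filter fragment for records changed on or after `since`.

    property_name is an assumption, not confirmed against the real endpoint.
    """
    return (
        '<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">'
        '<ogc:PropertyIsGreaterThanOrEqualTo>'
        '<ogc:PropertyName>{name}</ogc:PropertyName>'
        '<ogc:Literal>{since}</ogc:Literal>'
        '</ogc:PropertyIsGreaterThanOrEqualTo>'
        '</ogc:Filter>'
    ).format(name=escape(property_name), since=escape(since))
```

Passing `now` explicitly keeps the cut-off reproducible, which helps when re-running a harvest for a known period.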

This process runs in a few seconds depending on network latency.

hnap2cc-json.py

Converts an HNAP XML file to a Common Core mapped, CKAN-compliant JSON Lines file. Accepts input either streamed on standard input or as a file path argument, and prints JSON Lines output.

./harvest_hnap.py [options] [options]... | ./hnap2cc-json.py [options] [options]... > CommonCore_CKAN.jsonl
or
cat hnap.xml | ./hnap2cc-json.py [options] [options]... > CommonCore_CKAN.jsonl
or
./hnap2cc-json.py hnap.xml [options] [options]... > CommonCore_CKAN.jsonl 

This process runs in a couple of seconds.
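
The "streamed in or file path" behaviour and the JSON Lines output described above can be sketched as follows. This is an illustration of the general pattern, not the code in hnap2cc-json.py; `open_input` and `write_json_lines` are hypothetical names, and the argument handling is deliberately simplified (any argument starting with `-` is treated as an option without a value).

```python
import io
import json
import sys


def open_input(argv, stdin=sys.stdin):
    """Return a readable text stream: the file named by the first
    non-option argument, or stdin when no path is given."""
    paths = [arg for arg in argv[1:] if not arg.startswith('-')]
    if paths:
        return io.open(paths[0], 'r', encoding='utf-8')
    return stdin


def write_json_lines(records, out=sys.stdout):
    """Write one JSON object per line, the layout JSON Lines requires."""
    for record in records:
        out.write(json.dumps(record, ensure_ascii=False) + '\n')
```

Writing each record on its own line lets a downstream loader process records one at a time instead of parsing a single large JSON document.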

Import to CKAN

Uploading the JSON Lines file has been tested with the ckanapi CLI:

ckanapi load datasets -I CommonCore_CKAN.jsonl -r http://target.ckan.instance.ca/ -a <user api key>

This process runs, depending on how much data is being pushed, in under 20 seconds.

Timing

Since these commands together run in under a minute, this process could safely cycle every 5 minutes. However, considering that GeoNetwork uploads in batches (and other departments might too), we should be more careful.

From a process standpoint, a daily or weekly cycle is reasonable for R1. We'll start by assuming weekly until we hear otherwise.
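
For a scheduled run, the three steps could be chained in a single script that a scheduler such as cron invokes weekly. The sketch below is one possible arrangement, not part of this repository. `build_commands` and `run_once` are hypothetical names, the scripts are called without options because the options are not documented here, and the CKAN URL and API key are supplied by the caller.

```python
import subprocess


def build_commands(ckan_url, api_key, jsonl_path='CommonCore_CKAN.jsonl'):
    """Return the argv lists for the harvest, convert, and load steps."""
    harvest = ['./harvest_hnap.py']
    convert = ['./hnap2cc-json.py']
    load = ['ckanapi', 'load', 'datasets',
            '-I', jsonl_path, '-r', ckan_url, '-a', api_key]
    return harvest, convert, load


def run_once(ckan_url, api_key, jsonl_path='CommonCore_CKAN.jsonl'):
    """Run harvest | convert > jsonl_path, then load the result into CKAN."""
    harvest, convert, load = build_commands(ckan_url, api_key, jsonl_path)
    with open(jsonl_path, 'wb') as out:
        p1 = subprocess.Popen(harvest, stdout=subprocess.PIPE)
        p2 = subprocess.Popen(convert, stdin=p1.stdout, stdout=out)
        p1.stdout.close()  # lets the harvester stop if the converter exits early
        rc2 = p2.wait()
        rc1 = p1.wait()
    if rc1 != 0 or rc2 != 0:
        # Do not push a partial file to CKAN.
        raise RuntimeError('harvest or conversion failed; skipping CKAN load')
    subprocess.check_call(load)
```

Checking both exit codes before the load step means a failed harvest never pushes a partial file to the target instance.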
