This repository contains an archived 2016 proof of concept for creating a checklist of alien species in Belgium from different sources. Many of the concepts tried here are used for the Global Register of Introduced and Invasive Species - Belgium, an open, reproducible checklist of alien species in Belgium created as part of the TrIAS project. See https://github.com/trias-project/unified-checklist for more information.
-
Choose and download source datasets
-
Format the data to tab-delimited values with Open Refine
-
Define common terms for all source datasets
-
Map the source datasets to the common terms schema, using this mapping file.
-
Concatenate all source datasets using:
csvcat --skip-headers data/interim/fishes/data-with-common-terms.tsv data/interim/harmonia/data-with-common-terms.tsv data/interim/macroinvertebrates/data-with-common-terms.tsv data/interim/plants/data-with-common-terms.tsv data/interim/rinse/data-with-common-terms.tsv data/interim/rinse-annex-b/data-with-common-terms.tsv data/interim/t0/data-with-common-terms.tsv data/interim/wrims/data-with-common-terms.tsv > data/interim/concatenated-checklist.tsv
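For readers without the command-line tool, the intent of the concatenation step can be sketched in Python. This is a minimal sketch, not the method used in this repository: it assumes every `data-with-common-terms.tsv` file starts with the same header row, and the function name `concatenate_tsv` is hypothetical.

```python
def concatenate_tsv(sources, destination):
    """Concatenate tab-delimited files that share one header row.

    The header of the first non-empty file is written once. Every later
    file must start with the same header, which is then skipped.
    """
    header = None
    with open(destination, "w", encoding="utf-8", newline="") as out:
        for source in sources:
            with open(source, encoding="utf-8", newline="") as handle:
                first_line = handle.readline().rstrip("\r\n")
                if not first_line:
                    continue  # skip empty files
                if header is None:
                    header = first_line
                    out.write(header + "\n")
                elif first_line != header:
                    raise ValueError(f"{source} has a different header")
                for line in handle:
                    # Normalise line endings so rows never run together.
                    out.write(line.rstrip("\r\n") + "\n")
```

Calling `concatenate_tsv([...], "data/interim/concatenated-checklist.tsv")` with the eight source paths listed above would write one header followed by all data rows. A mismatching header raises an error rather than silently misaligning columns.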
Up to this point, all steps are repeatable. The remaining steps are not.
-
Copy concatenated file to data/interim/verified-checklist.tsv.
-
Add a number of columns.
-
Define controlled vocabularies for the terms we're interested in.
-
Map the current values to controlled vocabularies, using the mapping files in the vocabularies directory.
-
Match scientific names to the GBIF backbone taxonomy (assuming inbo-pyutils is locally available):
python ../inbo-pyutils/gbif/gbif_name_match/gbif_species_name_match.py data/interim/verified-checklist.tsv data/interim/verified-checklist.tsv --update --namecol scientificName --kingdomcol kingdom --strict --api_terms usageKey scientificName canonicalName status rank matchType
-
Automatically update the nameMatchValidation column for synonyms that have been verified:
python ../inbo-pyutils/gbif/verify_synonyms/verify_synonyms.py data/interim/verified-checklist.tsv data/interim/verified-checklist.tsv --synonym_file data/vocabularies/verified-synonyms.tsv --usagekeycol gbifapi_usageKey --acceptedkeycol gbifapi_acceptedKey --taxonomicstatuscol gbifapi_status --outputcol nameMatchValidation
-
Review any remaining issues (see this procedure for updating names).
-
Aggregate the checklist with this notebook to create the final checklist.
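The step of mapping current values to controlled vocabularies, listed earlier, could look roughly like the sketch below. It is an illustration under assumptions, not the procedure used here: the layout of the mapping files is not described in this README, so the column names `original` and `mapped`, the example column `habitat`, and both function names are hypothetical.

```python
import csv


def load_mapping(path, source_col="original", target_col="mapped"):
    """Read a tab-delimited mapping file into a dict.

    The column names are assumptions; adjust them to the actual file.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[source_col]: row[target_col] for row in reader}


def apply_mapping(rows, column, mapping):
    """Replace values in one column with their controlled-vocabulary term.

    Returns the mapped rows (as new dicts) and the set of values that had
    no entry in the mapping, so they can be reviewed by hand.
    """
    mapped_rows = []
    unmapped = set()
    for row in rows:
        value = row[column]
        if value in mapping:
            row = {**row, column: mapping[value]}
        else:
            unmapped.add(value)
        mapped_rows.append(row)
    return mapped_rows, unmapped
```

Returning the unmapped values separately keeps the manual part of the workflow visible: anything in that set still needs either a new mapping entry or a correction in the source data.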