KB/LIBRIS Definitions

This repository is a collection of vocabulary and scheme definitions and mappings to external resources. These define the foundation of linked library data used by the National Library of Sweden.

Dependencies

Requires Python 3.7+. (Use PyPy for a general speed improvement.)

Preferably set up a virtualenv:

$ python3 -m venv PATH_TO_VENV_OF_YOUR_CHOICE
$ source PATH_TO_VENV_OF_YOUR_CHOICE/bin/activate

Install the Python-based dependencies:

$ pip install -r requirements.txt

Usage

See the files in `source/datasets/` for definitions of what is included in each dataset (or set of datasets).

Run the following to build the full set of datasets:

$ python datasets.py -l

This is often not needed, as not all datasets are updated all the time. Instead, prefer to set up and use the following to produce a load file only for what has been worked on.

This builds the system core dataset:

$ python syscore.py -l

Run the following to build the full set of common datasets for id.kb.se:

$ python common.py -l

You can also pass dataset names to generate the different parts in isolation. Pass -h or --help to the script for details.

Finally, this builds a set of documentation articles for id.kb.se:

$ python docs.py -l

Contents

The vocabulary is split into the formally decided "vocab" terms (which we call the KBV namespace) and the legacy (often unstable) "marc" terms. The latter stem from MARC21 constructs not yet interpreted according to the new modelling principles, which are based on RDF and linked data (see source/doc/model.en.mkd).

Vocabulary Source Files

These files are special:

source/vocab/bf-to-kbv-base.rq and source/vocab/bf-map.ttl are used to automatically wire up the base BF2 mappings and term hierarchies.
source/vocab/display.jsonld defines lenses used to display data (as "cards" or "chips").
source/vocab/platform.ttl and source/vocab/services.ttl map various technical terms to public vocabularies.
source/vocab/enums.ttl, source/vocab/construct-enum-restrictions.rq, and source/marc/enums.ttl define the terms (properties and classes) for controlled, "enumerable" values. A lot of these stem from controlled values for columns in fixed fields in MARC21. Some come from RDA, and some from cleaned-up definitions in BibFrame 2, or our own vocabulary. (See links in the data for references.)

These files also contain certain instances of these classes. Specifically, these correspond to the domain of the properties defined as @type: @vocab in source/vocab-overlay.jsonld. These are special values defined within the vocabulary (often because they are very "type-like"). A prime example is IssuanceType, whose values are kept together with the vocabulary itself.
source/marc/construct-enums.rq is used to create all other "enumerable" values, which may or may not become merged with other controlled lists in the future. (When that is done, the definition here must be removed and its URI placed in a sameAs relation in whatever term is replacing it.)

Note: the file source/vocab/check-bases.rq is used to sanity-check the generated structures. Heed any warnings it gives by correcting the relevant sources.

Maintenance

Tip: During vocab development, regularly run just:

$ python syscore.py

which generates the vocab build file.

Look at it as Turtle by running:

$ rdfpipe -ijson-ld:context=sys/context/kbv.jsonld build/vocab.jsonld

Alternatively, or in addition, make a nice, digested tree view by running:

$ python scripts/misc/vocab-summary.py build/vocab.jsonld -c sys/context/kbv.jsonld -v

When bigger changes are made, you can generate a more predictable output by calling e.g.:

$ PYTHONHASHSEED=1 python datasets.py -l

Use this in conjunction with switching between a stable branch and a feature, backing up the build directory when doing so, then using e.g.:

$ diff -qr /tmp/build-develop-bak build

to see the resulting differences.
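The same comparison can be sketched in Python, which may be handy if you want to post-process the list of changed files. This is a minimal illustration using only the standard library; the function name `changed_files` is hypothetical and not part of this repository, and the directory names are simply those from the example above.

```python
import filecmp
from pathlib import Path


def changed_files(old_dir, new_dir):
    """Return sorted relative paths that differ between two build trees."""
    results = []

    def walk(cmp, prefix):
        # Entries that exist on one side only (files or whole directories).
        for name in cmp.left_only + cmp.right_only:
            results.append(str(prefix / name))
        # Files on both sides: compare actual contents, not just stat info.
        for name in cmp.common_files:
            left = Path(cmp.left) / name
            right = Path(cmp.right) / name
            if not filecmp.cmp(left, right, shallow=False):
                results.append(str(prefix / name))
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(old_dir, new_dir), Path("."))
    return sorted(results)


if __name__ == "__main__":
    # Paths taken from the diff example above; adjust to your own backup.
    for path in changed_files("/tmp/build-develop-bak", "build"):
        print(path)
```

Like `diff -qr`, this only reports which paths differ, not how; a directory that exists on one side only is reported once, without listing its contents.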

Term Categories

To categorize classes and properties, we use our own kbv:category property, which links to various terms we've defined for various purposes, such as :pending.

We do not use vs:term_status for this, since:

We have a broader set of categories than "status" implies. Categories are defined for various application-specific purposes, e.g. to state that a term is a shorthand term, or that a class belongs to a group of classes mappable to MARC bibliographic records.
Its use of string literals is poor practice, since out-of-band definitions are then needed to discover applicable values and their meanings. With linked data, this is handled naturally by minting a URI for the status item and defining it with labels and definition texts (in any languages needed).

We have put vs:term_status "unstable" to use in some places, to indicate that status clearly using a common colloquialism. But for our application purposes, we use :category :pending.

For deprecation we use owl:deprecated true, to facilitate any eventual tooling requiring this exact form.

We also mark terms using ptg:abstract true if they are not supposed to be used for resources directly (and thus choosable e.g. in an editing interface), but to represent a point in a class or property hierarchy defined for structuring the vocabulary.
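As a rough illustration of how these markers might be consumed, the sketch below filters a list of term descriptions held as plain Python dicts. Everything in it is an assumption made for the example: the key names (`category`, `owl:deprecated`, `ptg:abstract`) depend on the JSON-LD context actually in use, the example terms are invented, and excluding deprecated terms from selection is a choice of this sketch rather than a documented rule.

```python
def as_list(value):
    """Normalize a JSON-LD value that may be missing, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def category_ids(term):
    """Return the set of category URIs (or compact IRIs) linked from a term."""
    return {
        item["@id"]
        for item in as_list(term.get("category"))
        if isinstance(item, dict) and "@id" in item
    }


def is_directly_usable(term):
    """True if a term could be offered for direct use, e.g. in an editor.

    Abstract terms only structure the hierarchy. Skipping deprecated
    terms too is an assumption of this sketch.
    """
    return not term.get("ptg:abstract", False) and not term.get(
        "owl:deprecated", False
    )


# Hypothetical term descriptions, invented for this example.
terms = [
    {"@id": ":ExampleClass"},
    {"@id": ":ExampleBase", "ptg:abstract": True},
    {"@id": ":exampleOld", "owl:deprecated": True},
    {"@id": ":exampleNew", "category": {"@id": ":pending"}},
]

usable = [t["@id"] for t in terms if is_directly_usable(t)]
pending = [t["@id"] for t in terms if ":pending" in category_ids(t)]
```

Because categories are linked resources rather than string literals, `category_ids` returns identifiers that can be looked up for labels and definitions.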

Cleaning Up Terms

In principle, we should keep any published terms indefinitely. Everything at id.kb.se is potentially used externally (even without us knowing so), as we're an official agency tasked with ensuring long term stability and promoting data reuse.

If we consider a certain term ill-defined and detrimental to use, do not expect anyone else to be using it, and consider keeping it along with an owl:deprecated true as potentially problematic, it is OK to comment it out along with a note like:

# Dropped at 2021-09-08. Feel free to delete this after 5 years.

If its disappearance prompts any complaints, this gives us an easy way of seeing that we've removed it, and provides a window for restoring it.
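Notes in this form are regular enough to be found mechanically. The following is a small sketch, not an existing script in this repository, that scans text for such notes and reports those whose window has passed. The function name `expired_drop_notes` is hypothetical, and the pattern assumes the note follows the wording of the example above exactly.

```python
import re
from datetime import date

# Matches notes worded like the example above.
DROPPED = re.compile(
    r"#\s*Dropped at (\d{4})-(\d{2})-(\d{2})\.\s*"
    r"Feel free to delete this after (\d+) years?"
)


def expired_drop_notes(text, today):
    """Return (line_number, dropped_date) for notes whose window has passed."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = DROPPED.search(line)
        if not match:
            continue
        year, month, day, keep_years = map(int, match.groups())
        dropped = date(year, month, day)
        try:
            expires = dropped.replace(year=year + keep_years)
        except ValueError:
            # A note dated 29 February whose target year is not a leap year.
            expires = date(year + keep_years, 3, 1)
        if today >= expires:
            found.append((number, dropped))
    return found


if __name__ == "__main__":
    sample = "# Dropped at 2021-09-08. Feel free to delete this after 5 years.\n"
    print(expired_drop_notes(sample, date.today()))
```

Passing `today` explicitly keeps the function easy to check against fixed dates.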

KBV

This is a public application vocabulary. As such, we have no contract in terms of stability or officiality, other than that all terms we use in our data are to be defined within it. In general, this holds even if our data for certain resources is deleted, since their descriptions may have been kept in other systems. We do not guarantee this indefinitely though, and especially we might drop terms if they are deemed incorrect. Other than that, we will use owl:deprecated true to signal intended disappearance of a term.

MARC

All of these terms are implicitly owl:deprecated true and can in theory be dropped at any time (after removing any use of them from our datasets). No external use should depend on them. Any long-term use of these which indicates meaningful requirements should be reworked into proper KBV terms.
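Before dropping such terms it helps to know where they are still used. The sketch below walks nested JSON data and collects property keys and @type values that start with a given prefix. It assumes, purely for illustration, that the legacy terms appear as compact IRIs prefixed with `marc:`; the function name and the example record are hypothetical.

```python
def find_prefixed_terms(node, prefix="marc:"):
    """Collect property keys and @type values that start with the prefix."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith(prefix):
                found.add(key)
            if key == "@type":
                # @type may hold a single string or a list of strings.
                types = value if isinstance(value, list) else [value]
                found.update(
                    t for t in types
                    if isinstance(t, str) and t.startswith(prefix)
                )
            found |= find_prefixed_terms(value, prefix)
    elif isinstance(node, list):
        for item in node:
            found |= find_prefixed_terms(item, prefix)
    return found


# Hypothetical record, invented for this example.
record = {
    "@type": "Instance",
    "marc:exampleProperty": [{"@type": "marc:ExampleClass", "label": "x"}],
}
used = find_prefixed_terms(record)
```

Terms used as plain values of other properties are not detected by this sketch; only keys and @type values are inspected.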

marcframe & legacy mappings

By using utilities in the whelk-core repository, you can generate a SPARQL construct file from the marcframe.json mappings, from which you can in turn generate a basic vocab file:

$ cd ../whelk-core/ && gradle -q vocabFromMarcFrame #.rq

To generate RDF descriptions from legacy MARC definitions, use:

$ python scripts/marcframe-skeleton-from-marcmap.py scripts/marc/marcmap.json --enums

See that script for other options.

Pipe the output to rdfpipe -ijson-ld:base=source/ -oturtle - to get it as Turtle.
