
Principles and concepts

der edited this page Nov 18, 2012 · 51 revisions


Design principles

Don’t repeat yourself

We attempt to minimise duplication of information within the registry to simplify data maintenance. Specific impacts of this principle are:

  • Where there are two ways to link resources (via a pair of inverse properties, such as dct:isVersionOf and dct:hasVersion) we standardise on one direction for internal use.
  • Some properties are “virtual”: they are derived from the stored information on request rather than explicitly maintained. In particular, non-monotonic properties such as version:currentVersion are avoided in the internal representation to simplify merging of updates.
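
As a minimal sketch of a virtual property, the following derives dct:hasVersion values on request from stored dct:isVersionOf triples. Which direction the registry actually stores is an assumption here, and the ex: resource names and the has_version helper are hypothetical.

```python
# Stored triples keep a single direction of the inverse pair.
# Assumption: dct:isVersionOf is the stored direction; all ex: names are made up.
STORED = [
    ("ex:slush-v1", "dct:isVersionOf", "ex:slush"),
    ("ex:slush-v2", "dct:isVersionOf", "ex:slush"),
    ("ex:snow-v1", "dct:isVersionOf", "ex:snow"),
]


def has_version(resource, triples=STORED):
    """Derive the virtual dct:hasVersion values of `resource` on request."""
    return sorted(s for (s, p, o) in triples
                  if p == "dct:isVersionOf" and o == resource)


print(has_version("ex:slush"))
```

Because the inverse is computed rather than stored, an update only ever has to touch one triple per link.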

Common cases should be simple

The registry supports complex machinery for representing the metadata about the status of an entry within a register and for recording the history of changes to entries. The majority of register users are not expected to be interested in such details and simply wish to know what things are registered where. The API design provides simplified default views of the information model in which details of metadata and versioning are hidden. Register users who need access to these details have to explicitly request higher-fidelity views.

The simplified views involve a more radical transformation on the stored data than just omitting some properties. This means that there is a mismatch between the simplified information model presented by the default view and the full model exposed to users who directly access the internal store via SPARQL. We accept the extra burden on such advanced users in return for the simpler operation for the majority of users.

Predictable URIs

The API should, where reasonable, follow REST principles. Specifically, any resource in the system should be identified by a URI and be manipulable by standard HTTP verbs (GET, PUT, DELETE, PATCH).

The URIs for these resources should follow a simple pattern with as few restrictions as possible. The URI patterns should be predictable (“hackable”) to enable developers to understand the structure of the registry and to more easily identify errors. However, hacking of URIs should not be required in order to operate the registry.

Register, RegisterItem, Entity

Fundamentally the registry is a service to enable organizations to maintain controlled lists.

The essential concept here is the notion of a register.

A register is a single controlled collection (list). Each register is operated on behalf of some owner organization which provides the authority for the collection. For example, the WMO might wish to provide a list of approved codes to represent “runway deposits”, entries in which might represent “slush” or “snow”. Critically there is some governance regime which defines what can be registered and the process to be followed. The user of such a register is assured by the owner that the set of codes within that register is complete and definitive for the declared usage – any code which is missing is not approved.

The owner may delegate the running of the register and enforcement of the governance policy to a manager who performs the actual validation and maintenance of register entries.

The type of things that can be entered in a register is completely open. They include codes, ontology concepts, complete ontologies, spatial objects, coordinate reference systems, units of measure, organizations, licenses etc. We use the term entity [1] to describe these things, but there is no explicit reg:Entity class: anything which can be given a URI can be registered.

A register can itself be registered as an entry in another register, allowing us to create a hierarchy of registers.

As well as simply enumerating the entities which have been registered, the registry records information such as the status of an entry, the category of the entity in a classification scheme, aliases for the entity and so forth. This metadata is not intrinsic to the entity but is an aspect of how the organization regards the entity. The same entity might be entered in several registers and have a different status and classification in each, even within the same registry service. To achieve this the register maintains a set of metadata records called register items (represented by reg:RegisterItem), each of which describes one entry.

[1] We use entity rather than thing so as to avoid confusion with owl:Thing, which under DL semantics (as opposed to RDF semantics) would preclude registering classes and properties.

Information model

The full internal information model is described by the reg ontology http://purl.org/linked-data/registry# which is summarised in the diagram below:

We regard a register as being a container of the entities registered in it. This is represented in the information model by making reg:Register a sub-class of the Linked Data Platform container class ldp:Container. In an LDP Container there is a direct membership property linking a container to its members. This property can be declared explicitly using ldp:membershipPredicate or defaults to rdfs:member. In keeping with our principle of minimising duplication we infer this membership predicate on demand as illustrated in the figure below.

This enables us to present a simple view of a register as a direct container of entities. In keeping with our principles that common cases should be simple this simplified container view is the default view returned when a register instance is fetched. More advanced use of the API is required in order to retrieve the full internal view.
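
The on-demand inference of container membership might look roughly like the sketch below. The flattened record layout (a register field and an entity field per register item) is purely illustrative and does not use terms from the reg ontology; membership_triples is a hypothetical helper.

```python
# Hypothetical, flattened register-item records; the field names are
# illustrative and are not terms from the reg ontology.
ITEMS = [
    {"register": "http://registry/deposits", "entity": "http://registry/deposits/slush"},
    {"register": "http://registry/deposits", "entity": "http://registry/deposits/snow"},
    {"register": "http://registry/units", "entity": "http://registry/units/metre"},
]

RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"


def membership_triples(register, items, predicate=RDFS_MEMBER):
    """Infer container membership triples for one register on demand."""
    return [(register, predicate, item["entity"])
            for item in items if item["register"] == register]


for triple in membership_triples("http://registry/deposits", ITEMS):
    print(triple)
```

The membership triples are never stored; a register that declares a different ldp:membershipPredicate would simply pass it as the predicate argument.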

A register is also a VoID dataset (void:Dataset), enabling VoID-aware clients to discover information on the register contents.

Referenced and Managed entities

As well as maintaining registers of entities the registry service provides a repository function where it stores descriptions of entities itself.

An entity whose authoritative definition is stored in the registry is called a managed entity. It is possible for a suitably authorised maintainer to update the definition of a managed entity through the API (a PUT or PATCH operation on the entity URI).

An entity whose master definition is held elsewhere can still be entered into a register in which case it is called a referenced entity. This may be simply an external resource, created and managed by another organization in a different DNS domain, or it may be a resource in a different register within the same registry. When such an entity is registered the submitter still needs to provide a minimal description of the referenced entity. By default this minimal description requires a label (rdfs:label) and a type (rdf:type) though a particular register may impose additional constraints. The register will record the submitted description of the referenced entity. It is possible to update this recorded description if the entity changes.

Status and Life cycle

TODO: This section has been substantially revised following the addition of a different sort of “not valid” code to ISO 19135. Needs review.

A registered item has an associated status within the register. This status is not an intrinsic attribute of the entity itself but rather a statement of how the entity is regarded by the register’s authority (its owner).

The status codes are arranged in a hierarchy which reflects how groups of codes are treated.

    reg:statusHidden
        reg:statusProposed
        reg:statusNotValid
    reg:statusVisible
        reg:statusValid
            reg:statusExperimental
        reg:statusDeprecated
            reg:statusSuperseded
            reg:statusRetired

At the top level status codes are arranged in two groups, hidden and visible. Items with a code which is a specialization of hidden will not be included in the normal listing of the register members. They are either entries which have been submitted but not yet approved (proposed) or have been deemed flawed (not valid).

The visible items either have a code of valid (meaning they are suitable for use) or deprecated (meaning they should not be used for new applications though they may still be employed in existing applications). Only valid entries are used in response to a validation request.

An item may become deprecated in one of two ways. It may be simply withdrawn (retired) or it may be replaced by an alternative (superseded). In the case of a superseded item it is possible to discover the item that succeeded it (via ^reg:predecessor).
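
Discovering the successor means following reg:predecessor in the inverse direction. A minimal sketch over plain triples, with hypothetical item URIs and a hypothetical successors helper:

```python
# reg:predecessor points from the newer register item to the one it replaced.
# The item URIs below are hypothetical.
TRIPLES = [
    ("http://registry/deposits/_wet-snow", "reg:predecessor",
     "http://registry/deposits/_slush"),
]


def successors(item, triples):
    """Follow reg:predecessor in the inverse direction (^reg:predecessor)."""
    return [s for (s, p, o) in triples
            if p == "reg:predecessor" and o == item]


print(successors("http://registry/deposits/_slush", TRIPLES))
```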

A valid item may also be marked as experimental to communicate the intention that the item is being trialed and might be withdrawn or replaced.

TODO: Discuss whether a reg:statusStable as a sibling to reg:statusExperimental would be useful. I originally put it in and then felt that simply using reg:statusValid for this is more in keeping with ISO 19135.

In the normal lifecycle a new entry in a register is given status proposed and thus is not shown in the list of register members. Once the entry is approved it becomes valid and normally remains in the list of register members permanently. If the entry is later deprecated (either by being withdrawn from use or being superseded) then it remains visible but is not used for validation.

Sometimes an entry is deemed too flawed to approve, or is approved and then a serious problem is discovered, and the item should be removed from the list of members – not even appearing as retired. This is supported by the not valid status code. The DELETE API call sets this status.

The status code hierarchy means that an item with a narrow code can be inferred to also have the corresponding set of broader codes. For example a retired item is also deprecated and visible. The registry service implementation may choose whether to materialize these inferred codes in the internal data store or perform the inference at query time.
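
The inference of broader codes can be sketched as a walk up the hierarchy given earlier in this section. The helper names are hypothetical, and treating experimental items as usable for validation (because they specialise valid) is an assumption rather than something stated above.

```python
# Each status code mapped to its immediate broader code.
BROADER = {
    "reg:statusProposed": "reg:statusHidden",
    "reg:statusNotValid": "reg:statusHidden",
    "reg:statusValid": "reg:statusVisible",
    "reg:statusExperimental": "reg:statusValid",
    "reg:statusDeprecated": "reg:statusVisible",
    "reg:statusSuperseded": "reg:statusDeprecated",
    "reg:statusRetired": "reg:statusDeprecated",
}


def inferred_codes(status):
    """Return the code itself followed by all broader codes, narrowest first."""
    codes = [status]
    while codes[-1] in BROADER:
        codes.append(BROADER[codes[-1]])
    return codes


def is_visible(status):
    """True if the item appears in the normal listing of register members."""
    return "reg:statusVisible" in inferred_codes(status)


def used_for_validation(status):
    """Assumption: any specialisation of valid counts as valid."""
    return "reg:statusValid" in inferred_codes(status)


print(inferred_codes("reg:statusRetired"))
```

Performing this walk at query time corresponds to the second implementation option; materializing the result of inferred_codes in the store corresponds to the first.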

History and versioning

The registry service maintains an accessible history of changes to registers and registered entities. This is to enable registry users to check whether a registration was valid at a particular time, for example to enable an organization to require use of a code list as defined on a particular date.

This versioning is independent of any audit trail maintained by the registry implementation for the purposes of maintenance and security checking.

Versioning of entities

If the definition of an entity changes in a way that affects its semantics then it should be treated as a new entity. The new definition should be registered as a new item. Once the new entry has been approved it can be recorded as superseding the previous item (the new reg:RegisterItem links via reg:predecessor to the old item and the old item should be marked as status reg:statusSuperseded).

However, it is permissible to make some inessential changes to an entity (for example, to correct a mistake in a label) and still maintain the same identity and registration. The register’s governance policy defines what constitutes an inessential change and in the general case human decision making may be required to implement the policy. When such a change is made then the new definition will be stored and a new version of the reg:RegisterItem is created to reference the updated definition.

Similarly the metadata about an entry including its status, category and alias links may be changed. This again will result in a new version of reg:RegisterItem being created but in that case the reference to the entity itself is unchanged.

Versioning of registers

It is important for some use cases of the registry service that it be possible to refer to the state of a register at a particular time. The essential characteristics of a register include the list of valid entries, so if the list of entries changes then the register state has changed and a new version of the reg:Register instance will be created. Similarly the register metadata may be updated and this change will also result in a new version of the reg:Register instance. An inessential change to a particular register item does not change the state of the register, unless it results in a change of visibility level. This avoids the need to cascade changes all the way up the register hierarchy.

Versioned types

For recording the history of version changes the registry service adopts a “hub/spoke” model using the version vocabulary http://purl.org/linked-data/version#

Each resource whose history trail is to be explicitly maintained is an instance of version:VersionedThing. A particular version is an instance of version:Version and is annotated with a version string (owl:versionInfo) and a validity interval (version:interval). Each version is linked to the versioned thing by dct:isVersionOf and to the previous version by dct:replaces.
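
To illustrate how validity intervals support "as of a given date" lookups, here is a small sketch. The record layout, the ex: version names and the half-open interval convention (start inclusive, end exclusive, open-ended for the current version) are all assumptions, not part of the version vocabulary.

```python
from datetime import date

# Hypothetical version records for one versioned thing, oldest first.
# "end" is None while a version is still current.
VERSIONS = [
    {"version": "ex:item-v1", "start": date(2012, 1, 1), "end": date(2012, 6, 1)},
    {"version": "ex:item-v2", "start": date(2012, 6, 1), "end": None},
]


def version_at(versions, when):
    """Return the version whose validity interval contains `when`, if any."""
    for v in versions:
        if v["start"] <= when and (v["end"] is None or when < v["end"]):
            return v["version"]
    return None


print(version_at(VERSIONS, date(2012, 3, 15)))
```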

Properties of a versioned thing that are essential to its nature (e.g. its type) are termed rigid and the version vocabulary provides a version:rigidProperty annotation to declare the rigid properties of a class of versioned things. The rigid properties may be stored on the base version:VersionedThing since they don’t change.

We refer to types which are maintained using this versioning model as Versioned types. An instance of the registry service includes an explicit register (/system/VersionedTypes) which enumerates the versioned types for the service instance. By default reg:Register and reg:RegisterItem are versioned types.

When a registered entity is changed it is not always possible to use this explicit version strategy. An external referenced entity may change without its URI changing. The registry service maintains a historical copy of each version of what it knows about external referenced entities as separate logical named graphs [2]. The register item refers to the correct version of such entities using reg:EntityReference, which identifies the entity resource and the graph in which it occurs.

In this way it is possible to trace the history of any entity, even an external one, by walking the version trail of the register item and following the link to the appropriate entity reference.

In some registry implementations it may be permissible to add new types to the VersionedTypes register. This would cause the registry to handle any managed entities which match those types using the hub/spoke model. This means that it is possible to externally reference a particular version of such an entity rather than just a version of the corresponding register item. This complexity is deemed unnecessary for the common use cases and is not a required feature of a registry implementation.

[2] Typically we would expect this to be implemented using a quad store with each named graph being part of the registry’s SPARQL dataset. However, implementations are permitted to record history using a low-cost external mechanism such as a key-value store or a source control system.

URI patterns

Registers and managed entities are assigned URIs within the DNS domain of the registry service.

Register                  http://registry/{register}/../{subregister}
Registered entity         http://registry/{register}/../{subregister}/{entity}
RegisterItem              http://registry/{register}/../{subregister}/_{item}
System reserved           http://registry/system
VersionedTypes register   http://registry/system/VersionedTypes

Here registry stands for whatever the base URI is for the registry, e.g. location.data.gov.uk.

The register hierarchy is reflected in the URL path hierarchy in the obvious way.

There are only three restrictions on naming of registers and entities:

  • The top level register system is reserved for operation of the registry service.
  • No register or managed entity can have a _ as the first character of the final segment.
  • Items (i.e. resources of type reg:RegisterItem) should always have a _ as the first character of the final segment.
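
The three naming restrictions can be checked mechanically. The sketch below is one possible reading of them; the function names and the example URIs are hypothetical.

```python
from urllib.parse import urlsplit


def classify(uri):
    """Classify a registry URI by its path alone, using the naming rules."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    if not segments:
        return "root"
    if segments[0] == "system":
        return "system"
    if segments[-1].startswith("_"):
        return "item"
    return "register-or-entity"


def is_allowed_name(segment, top_level=False):
    """Check a proposed final path segment for a new register or managed entity."""
    if segment.startswith("_"):
        return False          # a leading underscore is reserved for register items
    if top_level and segment == "system":
        return False          # top-level register reserved for the service
    return True


print(classify("http://registry/codes/runway/_slush"))
```

Note that the path alone cannot distinguish a register from a registered entity; that requires fetching the resource.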

These resources are all linked data resources: they respond to a GET request with an appropriate RDF description.

The register resources also fulfil the contract for Linked Data Platform (LDP) Collections (as defined by the first pass working draft http://www.w3.org/TR/2012/WD-ldp-20121025/) to support listing, paging and adding of register entries.

The register items and the managed resources also support modification (for appropriately authorised users) through PUT and PATCH operations.

For details of these operations see API.

Given the flexibility of the URI structure, a registry service can be regarded as a general-purpose platform for distributed publication of RDF resources, optimized for high-governance, modest-volume publication.

Containers and bulk publication

A register is simply a collection of entities (both in the sense of being a VoID dataset and of being an LDP collection).

For some applications it is useful to also declare the register as being an instance of other collection types such as a skos:Collection or skos:ConceptScheme. The ldp:membershipPredicate can be used to declare the appropriate relation between the register and the registered entity (skos:member, skos:hasTopConcept). In this way the registry service can be used to create and manage such collection types.

To further facilitate this usage the registry service also supports bulk registration of instances of such collections. For example it is possible to upload an entire SKOS Collection to a register, creating a sub-register corresponding to the Collection and entries within that sub-register for each Concept in the collection. This enables convenient publication of collections using the “slash URI” pattern, including ontologies. The set of collection types which are supported for bulk upload is defined by the registry service through the /system/BulkCollectionTypes register. The supported types must be listed in that register. It is service-dependent whether new types can be added to the bulk-collection-types register.
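
As a rough sketch of what a bulk upload produces in URI terms, the following plans the sub-register, entity and register-item URIs for an uploaded collection. The plan_bulk_upload helper is hypothetical, as is the choice to name each item after its entity's local name.

```python
def plan_bulk_upload(parent_register, collection_name, member_names):
    """Plan the URIs created when a collection is bulk-registered.

    Assumption: each register item reuses the local name of its entity,
    prefixed with "_" as required by the URI patterns.
    """
    sub_register = f"{parent_register}/{collection_name}"
    return {
        "sub_register": sub_register,
        "entities": [f"{sub_register}/{name}" for name in member_names],
        "items": [f"{sub_register}/_{name}" for name in member_names],
    }


plan = plan_bulk_upload("http://registry/codes", "deposits", ["slush", "snow"])
print(plan["entities"])
```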

Validation

The notion of validation arises in two distinct parts of the operation of the registry service.

Validation of submissions

When an entity is submitted to a register there are two stages to the process of validation.

Firstly a technical validation process is run at the point of submission. This checks that:

  • the submission is syntactically valid;
  • the entity has at least one value for each of the mandatory properties rdf:type and rdfs:label (in the case of rdfs:label there must be a value within an operating language of the register);
  • if the submission is a referenced entity which falls within the namespace of the registry (but outside that of the target register) then it must already exist (external reference entities are not required to resolve at time of submission);
  • if the register declares one or more SPARQL ASK validation queries (reg:validationQuery) then all of those queries, when applied to the submitted graph, must return true.

All customization of the technical validation is done through declaration of validation queries.
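
The mandatory-property check in the list above can be sketched as follows. The dict-based entity representation and the technical_errors helper are hypothetical; syntax checking, the existence check for internal references and the reg:validationQuery ASK queries are deliberately left out.

```python
def technical_errors(entity, operating_languages=("en",)):
    """Return problems with the mandatory properties of a submitted entity.

    `entity` maps property names to lists of values; rdfs:label values are
    (text, language-tag) pairs. This layout is an illustrative assumption.
    """
    errors = []
    if not entity.get("rdf:type"):
        errors.append("missing rdf:type")
    labels = entity.get("rdfs:label", [])
    if not labels:
        errors.append("missing rdfs:label")
    elif not any(lang in operating_languages for _, lang in labels):
        errors.append("no rdfs:label in an operating language")
    return errors


slush = {"rdf:type": ["skos:Concept"], "rdfs:label": [("Slush", "en")]}
print(technical_errors(slush))
```

An empty list means the entity passes this part of technical validation.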

If a submission passes technical validation a reg:RegisterItem record will be created (with status reg:statusProposed) and linked to the register.

At that point a register-specific approval process will be triggered. This may be a manual vetting process, including appeal options, or it may be automated or a mix of the two. For example, a further background technical validation process might be run before triggering a final human approval. The registry provides a means to declare the governance policies (reg:governancePolicy) but the mechanics of this approval process are outside the scope of the registry service itself.

When an entry has been approved its status will be updated to statusValid at which point it becomes a visible entry in the register.

Validation against a register

The other notion of validation supported by the registry service is the ability to verify that a given entity, or set of entities, is registered in a given register. This is supported by the entity and validate queries described in API.

Delegation

Note: This section is not yet discussed or approved.

In a Linked Data setting a key function of the registry is to provide effective management of a shared URI namespace. In both public sector and enterprise use of Linked Data there is a requirement for multiple organizations to be able to publish reference information into a common namespace. To avoid the registry becoming a scaling bottleneck, especially for updates, it must be possible for multiple organizations to serve their own parts of the shared namespace and not force all information to be physically stored in a centralised registry implementation.

To support this requirement the registry service provides three mechanisms for delegating parts of the registry namespace – namespace forwarding, registry federation and register delegation.

In each case the delegation is enabled by registering an entity whose type is a sub-class of reg:Delegated into a parent register. The reg:delegationTarget defines the service to which requests to the registered URI should be forwarded.

Namespace forwarding

This is enabled by registering an entity of type reg:NamespaceForward which indicates a delegation target URI to forward to, and optionally a status code to use for forwarding (reg:forwardingCode).

For example, suppose we have a register http://registry1/register.

If we register a new entry in the register at relative location ext:

<ext> a reg:NamespaceForward;
    reg:delegationTarget <http://extregistry/root/base> ;
    reg:forwardingCode 307 .

Then a request to http://registry1/register/ext/foo/bar will be forwarded to http://extregistry/root/base/foo/bar by returning an HTTP 307 response with Content-Location: http://extregistry/root/base/foo/bar.

The default forwarding code in the absence of an explicit setting is 307.
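
The URI rewriting in this example can be sketched as below. The forward helper is hypothetical; it only computes the status code and target URI and does not perform any HTTP handling.

```python
def forward(request_uri, entry_uri, delegation_target, forwarding_code=307):
    """Compute the forwarding response for a request under a delegated entry.

    Returns (status code, target URI), or None if the request does not fall
    at or below the registered entry.
    """
    if request_uri != entry_uri and not request_uri.startswith(entry_uri + "/"):
        return None
    remainder = request_uri[len(entry_uri):]
    return forwarding_code, delegation_target + remainder


print(forward("http://registry1/register/ext/foo/bar",
              "http://registry1/register/ext",
              "http://extregistry/root/base"))
```

The default of 307 mirrors the default forwarding code; a registration with an explicit reg:forwardingCode would pass that value instead.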

A registry service implementation is only required to support 30X status codes. It MAY choose to support a proxy configuration indicated by specifying a 200 response code. In this case the registry service itself passes the request to the delegation target and returns the response to the requester. If the service does support proxy mode it SHOULD respect the HTTP cache control headers in the response.

Note that there is no restriction on the behaviour of the delegation target. So entries at and below http://registry1/register/ext in our example will no longer necessarily support the registry API. Search requests to http://registry1/register (or above) will not include any content served from the delegation target.

Registry federation

This is enabled by registering an entity of type reg:FederatedRegister.

For requests to any resource within and below the federated register, this mode acts precisely the same as namespace forwarding, using a 307 code to perform the forwarding.

The key difference in this case is that the delegation target is assumed and required to support the full registry API. Search requests to parent registers of the federated register will trigger the search query to also be sent to the delegation target register and the results will be aggregated into the local search results.
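
The difference in search behaviour between the two modes can be sketched as follows. The record layout and the local_search and remote_search callables are hypothetical stand-ins for whatever search machinery an implementation provides.

```python
def aggregated_search(query, local_search, delegations, remote_search):
    """Combine local results with results from federated registers only.

    `delegations` lists the delegated entries below the searched register as
    dicts with "type" and "target" keys (an illustrative layout).
    """
    results = list(local_search(query))
    for entry in delegations:
        if entry["type"] == "reg:FederatedRegister":
            results.extend(remote_search(entry["target"], query))
        # reg:NamespaceForward targets are skipped: their content is not searched.
    return results


delegations = [
    {"type": "reg:NamespaceForward", "target": "http://extregistry/root/base"},
    {"type": "reg:FederatedRegister", "target": "http://otherregistry/codes"},
]
hits = aggregated_search(
    "snow",
    lambda q: ["local:" + q],
    delegations,
    lambda target, q: [target + "?" + q],
)
print(hits)
```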

Delegated register

The purpose of this delegation mode is to enable other organizations to maintain and serve the list of contents of a register.

This is enabled by registering an entity of type reg:DelegatedRegister with a reg:delegationTarget which gives the SPARQL endpoint for the service which will supply the item and membership information. This is a subtype of reg:Register and all normal register properties apply. In addition the delegated register specifies a partial triple pattern which can be used to enumerate the members of the register (reg:enumerationSubject, reg:enumerationPredicate, reg:enumerationObject).

For example, suppose we have a register http://location.data.gov.uk/id in which we register:

<bathingWaters> a reg:DelegatedRegister;
    rdfs:label "Bathing waters"@en;
    dct:description "Bathing Waters from the Environment Agency"@en;
    reg:owner <http://reference.data.gov.uk/2011-09-30/id/public-body/environment-agency> ;
    reg:manager <http://reference.data.gov.uk/2011-09-30/id/public-body/environment-agency> ;
    ldp:membershipPredicate  rdfs:member;
#
    reg:delegationTarget     <http://environment.data.gov.uk/sparql/bwq/query>;
    reg:enumerationObject    <http://environment.data.gov.uk/id/bathing-water/>;
    reg:enumerationPredicate <http://reference.data.gov.uk/def/reference/uriSet>;
    .

Then a GET request to http://location.data.gov.uk/id/bathingWaters will trigger a request to the specified SPARQL endpoint to discover all members of the register:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    CONSTRUCT
    {
        <http://location.data.gov.uk/id/bathingWaters> rdfs:member ?member .
    }
    WHERE
    {
        ?member <http://reference.data.gov.uk/def/reference/uriSet> <http://environment.data.gov.uk/id/bathing-water/> .
    }

The result of this query, along with the register specification itself will be returned as the register membership contents.
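
How such a query might be assembled from the register description is sketched below. The enumeration_query helper is hypothetical, and it assumes that the predicate is always given and that exactly one of the subject and object slots is left open to be filled by ?member.

```python
def enumeration_query(register_uri, membership_predicate, predicate,
                      subject=None, obj=None):
    """Build a CONSTRUCT query from a partial enumeration triple pattern.

    Assumption: exactly one of `subject` / `obj` is supplied; the missing
    slot becomes the ?member variable.
    """
    if (subject is None) == (obj is None):
        raise ValueError("exactly one of subject and obj must be given")
    s = f"<{subject}>" if subject else "?member"
    o = f"<{obj}>" if obj else "?member"
    return (
        f"CONSTRUCT {{ <{register_uri}> <{membership_predicate}> ?member . }}\n"
        f"WHERE {{ {s} <{predicate}> {o} . }}"
    )


print(enumeration_query(
    "http://location.data.gov.uk/id/bathingWaters",
    "http://www.w3.org/2000/01/rdf-schema#member",
    "http://reference.data.gov.uk/def/reference/uriSet",
    obj="http://environment.data.gov.uk/id/bathing-water/",
))
```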

It is possible for the delegation target to list register members which occur within the namespace of the delegating register. The register will respond to a GET request on such an entity by first obtaining a description of the entity from the delegation target using a SPARQL DESCRIBE query and then returning that description.

So in the above example if the SPARQL endpoint <http://environment.data.gov.uk/sparql/bwq/query> contains a triple:

  <http://location.data.gov.uk/id/bathingWaters/ukl1234> 
        <http://reference.data.gov.uk/def/reference/uriSet>
            <http://environment.data.gov.uk/id/bathing-water/> .

Then a GET request to <http://location.data.gov.uk/id/bathingWaters/ukl1234> will be responded to by the registry by issuing:

 DESCRIBE <http://location.data.gov.uk/id/bathingWaters/ukl1234>

to <http://environment.data.gov.uk/sparql/bwq/query> and returning the resulting RDF graph to the requester.

Note: only GET requests are supported by this delegation mode. The register contents are being maintained by the delegation target and cannot be modified by PUT, PATCH or POST requests to the registry.
