Conceptual architecture
The diagram below shows the high-level functional blocks needed for a registry service implementation. This is not a formal component architecture: there are many ways in which the registry service could be structured internally. The only formal component interfaces are the external interfaces defined in the Api specification, which in turn reference open standard specifications, particularly SPARQL 1.1. The example functional breakdown shown here is illustrative rather than prescriptive; it is meant to convey the functionality that must be delivered.
- Dispatcher
- Request processor
- Auth
- Audit trail
- Registry core
- RDF store
- Text index
- Federation manager
- Renderer
- Access UI
- Admin UI
All requests to the registry service are made over HTTP connections which are initially received by the dispatcher.
The dispatcher has the following responsibilities:
Forwarding. Parts of the registry URI space can be configured to forward to other remote services by means of an HTTP redirect (status code 30x) response. This is configured by means of NamespaceForward or FederatedRegister registrations. The dispatcher provides an internal interface by which the registry core can dynamically update the set of forwarding rules currently in effect.
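As an illustration of the forwarding responsibility, the Python sketch below keeps a table of path-prefix rules and answers a request path with a redirect status and `Location` value, using the longest matching prefix. It is a minimal sketch under stated assumptions: the `ForwardRule` and `ForwardingTable` names, the use of path prefixes (rather than full URIs) as rule keys and the default 303 status are hypothetical choices, not part of the registry specification. `replace_rules` stands in for the internal interface through which the registry core updates the rules.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ForwardRule:
    """One rule compiled from a NamespaceForward or FederatedRegister registration (hypothetical shape)."""
    prefix: str        # path prefix in the registry URI space, e.g. "/codes/external"
    target: str        # base URI of the remote service that owns this part of the namespace
    status: int = 303  # hypothetical default; any 30x status could be configured


class ForwardingTable:
    """Longest-prefix forwarding rules that the registry core can replace at runtime."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rules: list[ForwardRule] = []

    def replace_rules(self, rules: list[ForwardRule]) -> None:
        # Longest prefix first, so the most specific delegation wins.
        ordered = sorted(rules, key=lambda r: len(r.prefix.rstrip("/")), reverse=True)
        with self._lock:
            self._rules = ordered  # swap in a whole new list; never mutate in place

    def match(self, path: str) -> Optional[tuple[int, str]]:
        """Return (status, Location) if the path lies in a forwarded sub-tree, else None."""
        with self._lock:
            rules = self._rules  # snapshot of the current rule set
        for rule in rules:
            prefix = rule.prefix.rstrip("/")
            # Match whole path segments only, so "/codes" does not capture "/codesets".
            if path == prefix or path.startswith(prefix + "/"):
                return rule.status, rule.target.rstrip("/") + path[len(prefix):]
        return None
```

Replacing the whole rule list, rather than editing rules in place, means that a request being forwarded always sees one consistent rule set even while the registry core is publishing an update.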
Proxy. Optionally, a registry service may support proxy forwarding as well as redirection. The advantage of this is that the delegation target URIs do not appear in the HTTP response. The disadvantage is that it raises scaling challenges: the connection to the proxied service must be held open until a response (or timeout) is received, and such connections can be a significant limiting factor on some implementation platforms. A registry service which chooses to implement proxy support MAY choose to aggressively cache remote responses to manage this cost.
Request filtering. Requests which are not forwarded MAY be subjected to additional filtering rules including, but not limited to, rejection of blocked user agents (e.g. banned crawlers), rejection of blocked IP addresses, and request throttling to manage server load, especially in response to suspected denial of service attacks.
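The text leaves the throttling method open. The sketch below swaps in a per-client token bucket, a common rate-limiting technique, alongside simple IP and user-agent blocklists. The `RequestFilter` name, the capacity and refill values, and matching blocked agents by substring are all illustrative assumptions; status 403 is used for blocked clients and 429 (Too Many Requests) for throttled ones.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    capacity: float
    refill_per_second: float
    tokens: float
    last: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity, then spend one token.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


@dataclass
class RequestFilter:
    blocked_agents: set[str]
    blocked_ips: set[str]
    capacity: float = 10.0           # hypothetical burst size per client
    refill_per_second: float = 5.0   # hypothetical sustained rate per client
    clock: Callable[[], float] = time.monotonic
    _buckets: dict[str, TokenBucket] = field(default_factory=dict)

    def check(self, client_ip: str, user_agent: str) -> int | None:
        """Return an HTTP status to reject with, or None to let the request through."""
        if client_ip in self.blocked_ips:
            return 403
        if any(agent in user_agent for agent in self.blocked_agents):
            return 403
        now = self.clock()
        bucket = self._buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, self.capacity, now)
            self._buckets[client_ip] = bucket
        return None if bucket.allow(now) else 429
```

Injecting `clock` keeps the filter testable; a production dispatcher would more likely rely on the rate-limiting features of whatever web server or proxy it is built on.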
Dispatch. All requests that are neither forwarded nor filtered are then dispatched to the relevant internal service. An implementation may choose to partition and structure the internal services and dispatch processing in many ways, including internal HTTP routing to replicated or specialized internal servers. In that case the dispatcher acts as a reverse proxy for the internal service architecture.
Logging. All external requests should be logged using normal web server logging practices.
In the example functional partitioning shown in this conceptual architecture there are two dispatch targets: the request processor, which handles all registry-specific API requests, and the RDF store, to which raw SPARQL queries are routed.
The request processor takes all registry API requests passed by the dispatcher and determines if they are authorised to proceed. This involves:
- converting the request payload to a normalised internal RDF representation
- identifying the action requested and action target
- authenticating the requesting user
- testing if the user is authorised for the requested action on the target resource
- logging the request and authorisation outcome in an audit log trail
If the action is authorised it is passed to an appropriate internal registry interface supplied by the registry core.
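The steps above can be sketched as a small pipeline. This is a hypothetical Python outline, not the specified design: the `RegistryRequest` and `Outcome` types, the HTTP-method-to-action mapping and the injected `authenticate`, `authorise`, `audit` and `core` callables are all assumptions, and the payload is taken to be already normalised to the internal RDF form.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Credentials = Optional[tuple[str, str]]  # (user, password), e.g. from Basic Authentication


@dataclass
class RegistryRequest:
    method: str                # HTTP method, e.g. "POST"
    path: str                  # action target within the registry namespace
    credentials: Credentials
    payload: object = None     # request body, already normalised to internal RDF


@dataclass
class Outcome:
    status: int                # HTTP status returned to the client
    action: str
    user: Optional[str]


# Hypothetical mapping from HTTP methods to registry action names.
ACTIONS = {"GET": "read", "POST": "register", "PUT": "update", "PATCH": "update", "DELETE": "delete"}


def process(
    req: RegistryRequest,
    authenticate: Callable[[Credentials], Optional[str]],
    authorise: Callable[[Optional[str], str, str], bool],
    audit: Callable[[Optional[str], str, str, bool], None],
    core: Callable[[str, RegistryRequest], int],
) -> Outcome:
    action = ACTIONS.get(req.method, "unknown")   # identify the action; the target is req.path
    user = authenticate(req.credentials)          # None means anonymous or failed authentication
    allowed = authorise(user, action, req.path)
    audit(user, action, req.path, allowed)        # log request and authorisation outcome
    if not allowed:
        # 401 invites the client to authenticate; 403 refuses an authenticated user.
        return Outcome(401 if user is None else 403, action, user)
    return Outcome(core(action, req), action, user)  # hand over to the registry core
```

Passing the collaborators in as callables keeps the request processor independent of how the auth module, audit trail and registry core are actually deployed.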
The registry service is required to support a range of MIME types in which the RDF request payload can be supplied. The mandatory types are application/rdf+xml and text/turtle. Optionally, application/ld+json may be supported. The request processor is responsible for converting these request syntaxes into an internal RDF form for supply to the registry core.
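A first step in that conversion is choosing a parser from the request's `Content-Type`. The sketch below maps the mandatory and optional MIME types to parser format names; the names are intended to match those accepted by rdflib's `Graph.parse`, but that choice of library is an assumption, and the `payload_format` helper is hypothetical.

```python
from __future__ import annotations

# Mandatory request syntaxes, mapped to parser format names (assumed to suit rdflib).
MANDATORY_FORMATS = {"application/rdf+xml": "xml", "text/turtle": "turtle"}
# Optional syntax, only honoured if the service chooses to support it.
OPTIONAL_FORMATS = {"application/ld+json": "json-ld"}


def payload_format(content_type: str, support_jsonld: bool = False) -> str | None:
    """Map a Content-Type header to a parser format; None means reply 415 Unsupported Media Type."""
    media_type = content_type.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    if media_type in MANDATORY_FORMATS:
        return MANDATORY_FORMATS[media_type]
    if support_jsonld:
        return OPTIONAL_FORMATS.get(media_type)
    return None
```

The returned name would then be handed to the RDF parser, for example `Graph().parse(data=body, format=fmt)` if rdflib were the library in use.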
The auth module provides the authentication and authorisation services used by the request processor.
User authentication should be based on standard web service security practices, for example Basic Authentication over HTTPS. The credentials database for user authentication is a matter for the service implementation but SHOULD be separated from the registry RDF store.
The authorisation process should support a role-based access control mechanism whereby users are assigned to roles and roles are permitted a range of actions on each sub-tree of the registry namespace.
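A minimal sketch of such role-based access control, assuming registry paths as sub-tree identifiers and plain strings for roles and actions (the `RoleBasedAccess` class and the role names in the usage are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass, field


def _within(path: str, subtree: str) -> bool:
    """True if `path` is `subtree` itself or lies below it on a path-segment boundary."""
    root = subtree.rstrip("/")
    return path == subtree or path == root or path.startswith(root + "/")


@dataclass
class RoleBasedAccess:
    grants: dict[str, dict[str, set[str]]] = field(default_factory=dict)  # role -> sub-tree -> actions
    members: dict[str, set[str]] = field(default_factory=dict)            # user -> roles

    def grant(self, role: str, subtree: str, *actions: str) -> None:
        self.grants.setdefault(role, {}).setdefault(subtree, set()).update(actions)

    def assign(self, user: str, role: str) -> None:
        self.members.setdefault(user, set()).add(role)

    def is_authorised(self, user: str | None, action: str, path: str) -> bool:
        # Anonymous users hold no roles in this sketch.
        roles = self.members.get(user, set()) if user is not None else set()
        return any(
            action in actions
            for role in roles
            for subtree, actions in self.grants.get(role, {}).items()
            if _within(path, subtree)
        )
```

For example, after `acl.grant("codeManager", "/codes", "register", "update")` and `acl.assign("alice", "codeManager")`, Alice may register items anywhere under `/codes` but not under `/codesets`. A real service would probably also grant read access to anonymous users on public registers.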
In some registry service settings the roles, and the users assigned to those roles, may be publicly accessible. In that case the role and role-user bindings MAY be held in the registry data store and/or may be accessible as configuration registers under the /service reserved namespace.
The audit trail records all requests at the level of registry actions and the associated authorised user. This supplements, not replaces, the dispatcher’s web server logs. The audit trail provides for more convenient application level trace back which can help with both service maintenance and verification.
It is up to the service operator to define the level of audit logging required for a particular implementation. This may range from no such logging through to a complete journaled record sufficient to enable the entire registry state to be replicated in the event of a complete failure of both storage and storage backups.
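One simple way to hold an application-level audit trail is an append-only file with one JSON object per line. The sketch below is an assumption rather than a specified format: the field names are hypothetical, and the optional `payload` field illustrates the journaled end of the range, where the submitted RDF is kept so that accepted updates could be replayed.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Callable, Optional


def write_audit_record(
    write: Callable[[str], object],
    user: Optional[str],
    action: str,
    target: str,
    allowed: bool,
    payload: Optional[str] = None,
) -> dict:
    """Append one JSON line describing a registry action and its authorisation outcome."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "target": target,
        "allowed": allowed,
    }
    if payload is not None:
        # Journal level: keep the submitted RDF so accepted updates could be replayed in order.
        record["payload"] = payload
    write(json.dumps(record) + "\n")
    return record
```

In use, `write` would typically be the `write` method of a file opened in append mode, e.g. `open("audit.jsonl", "a", encoding="utf-8").write`.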
The registry core provides all of the “business logic” to implement the registry API actions as a set of transformations of the RDF representations held in the RDF store.
We describe its operation in terms of several sub-modules (update logic, validation, version management, search and view generation):
The bulk of the defined registry API provides a set of ways in which the registry state can be updated. Specifically:
- registration of new entities within a register
- creation of new sub-registers
- updating the description of a registered entity (whether the canonical description of a managed entity or the registered partial description of a referenced entity)
- updating the metadata describing a registered entity (register item)
- changing the status of a register item, a special case of the previous item
- deletion of registers and registered items (equivalent to changing the status to invalid)
The update logic provides implementations of each of these specific actions.
In all cases a set of action-specific validation steps is run before the action is permitted. The validation module provides common facilities reused by the separate action functions.
If the action is validated, the logic for each action involves updating one or more resources (a register, a register item or an entity reference). In each case there are API-specific rules to determine the new desired state, including merging of current and submitted state, default-override rules and automated properties. These are specified in the Api design.
The validation module provides checking of an update payload before permitting the operation to proceed. This includes:
- testing URI conformance, i.e. that the payload URIs match the target resource
- life cycle enforcement so that life cycle transitions are restricted to the transitions described in the life cycle state diagram and that immutability of properties is enforced for accepted items
- per-register enforcement of validation rules specified via the reg:validationQuery register property
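The first two checks can be sketched as follows. The transition table and the set of "accepted" statuses here are illustrative assumptions standing in for the normative life cycle state diagram, which this sketch does not reproduce; the property names in the usage are also hypothetical. The reg:validationQuery check is omitted because it needs a SPARQL engine.

```python
from __future__ import annotations

# ILLUSTRATIVE ONLY: a hypothetical transition table, not the normative life cycle diagram.
TRANSITIONS: dict[str, set[str]] = {
    "submitted": {"experimental", "stable", "invalid"},
    "experimental": {"stable", "deprecated", "invalid"},
    "stable": {"deprecated", "superseded"},
    "deprecated": {"superseded", "retired"},
    "superseded": {"retired"},
    "retired": set(),
    "invalid": set(),
}
# Assumed set of "accepted" statuses, for which property immutability is enforced.
ACCEPTED = {"experimental", "stable", "deprecated", "superseded", "retired"}


def validate_update(
    target_uri: str,
    payload_uri: str,
    current_status: str,
    new_status: str,
    changed_properties: set[str],
    immutable_properties: set[str],
) -> list[str]:
    """Return a list of validation errors; an empty list means the update may proceed."""
    errors: list[str] = []
    # URI conformance: the payload must describe the resource being updated.
    if payload_uri != target_uri:
        errors.append(f"payload URI {payload_uri} does not match target {target_uri}")
    # Life cycle enforcement: only transitions listed in the table are allowed.
    if new_status != current_status and new_status not in TRANSITIONS.get(current_status, set()):
        errors.append(f"transition {current_status} -> {new_status} not permitted")
    # Immutability: accepted items may not change protected properties.
    if current_status in ACCEPTED:
        frozen = changed_properties & immutable_properties
        if frozen:
            errors.append(f"immutable properties changed: {sorted(frozen)}")
    return errors
```

Collecting all errors, rather than stopping at the first, lets the API report every problem with a submission in one response.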
Any successful update to the register state results in the creation of a new version of a register, a register item or both. The versioning model describes how this version history of resources is represented in RDF. The version management module is responsible for implementing this versioning model, including management of the associated named graphs.
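The core idea of version management, that each update mints a new version held in its own named graph while earlier graphs stay untouched, can be sketched with in-memory structures. The `VersionStore` class and the `<uri>:<n>` version-naming scheme are hypothetical, and the triples that link versions to each other in the real versioning model are omitted.

```python
from __future__ import annotations

Triple = tuple[str, str, str]


class VersionStore:
    """Keeps every version of a resource in its own (simulated) named graph."""

    def __init__(self) -> None:
        self.graphs: dict[str, frozenset[Triple]] = {}  # graph name -> triples
        self.history: dict[str, list[str]] = {}         # resource URI -> graph names, oldest first

    def commit(self, resource: str, triples: set[Triple]) -> str:
        """Record a new version; earlier graphs are never modified."""
        versions = self.history.setdefault(resource, [])
        graph = f"{resource}:{len(versions) + 1}"  # hypothetical version-naming scheme
        self.graphs[graph] = frozenset(triples)
        versions.append(graph)
        return graph

    def current(self, resource: str) -> frozenset[Triple]:
        """Triples of the latest version of a resource."""
        return self.graphs[self.history[resource][-1]]
```

Against a real store, each `commit` would correspond to a SPARQL 1.1 Update such as `INSERT DATA { GRAPH <graph-name> { ... } }`.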
The registry API provides for retrieval of individual resources. It also supports search operations which traverse a register subtree testing for registration of a particular entity (entity retrieval), of a set of entities (validation) or of all entities matching a set of search criteria (search query).
The search module translates such implicit and explicit search requests into either a SPARQL query (routed to the RDF store) or a free text query (routed to the text index) or a mix of the two.
If the register tree contains one or more federated registers then the search interface must pass the search query on to those federated registry services via the federation manager. Similarly if the register tree contains one or more delegated registers then the search must be converted to a SPARQL query over the delegated SPARQL endpoint, again via the federation manager.
The cost of filtering the search to the target register hierarchy may be non-trivial. An implementation may use an interval coding scheme for register containment and is permitted to record such information as part of the RDF representation of registers.
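The text does not prescribe a particular interval coding scheme; one well-known option is the nested-set encoding sketched below. A depth-first walk gives every register a `(left, right)` pair so that containment becomes two integer comparisons instead of a recursive traversal. The `interval_code` and `contains` names are hypothetical.

```python
from __future__ import annotations


def interval_code(children: dict[str, list[str]], root: str) -> dict[str, tuple[int, int]]:
    """Assign nested-set (left, right) numbers by depth-first traversal of the register tree."""
    codes: dict[str, tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        left = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)  # recursion is fine for the shallow trees typical of registers
        codes[node] = (left, counter)
        counter += 1

    visit(root)
    return codes


def contains(codes: dict[str, tuple[int, int]], ancestor: str, descendant: str) -> bool:
    """True if `descendant` lies within (or is) the `ancestor` register."""
    a_left, a_right = codes[ancestor]
    d_left, d_right = codes[descendant]
    return a_left <= d_left and d_right <= a_right
```

If the two numbers were stored as properties of each register (property names would be implementation-defined), a subtree search could use a numeric `FILTER` in SPARQL. The trade-off is that adding a register requires renumbering, so the scheme suits register trees that change rarely.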
Both the individual retrieval and the search APIs support a range of different view modifiers as discussed in the Api (paging, non-member-properties, with-metadata and with-version views). The view generation module implements these modifiers: retrieving the correct version of each entity in the response, retrieving the associated register item if required and merging version/versioned-thing views.
The renderer takes the response RDF and renders it according to the Accept header of the client request.
In the simple case (application/rdf+xml, text/turtle and optionally application/ld+json) this rendering is a straightforward serialization of the response RDF into the requested RDF syntax.
For text/html the registry service is required to provide human-readable web pages which make it possible for a user to navigate the registry via a browser. The precise format and structure of these pages is at the discretion of the registry service implementation. However, implementations are likely to provide a means (style sheets, rendering templates) to allow a given instance to tune the look and feel of the service.
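Choosing between these renderings is standard HTTP content negotiation. The sketch below ranks the supported media types by the q-values of the client's Accept header, letting the most specific matching range decide each type's quality. The supported list and the fallback to text/html when no Accept header is sent are assumptions, and `choose_media_type` is a hypothetical helper.

```python
from __future__ import annotations

# Media types the renderer can produce; text/html first so it is the fallback when no
# Accept header is sent (an assumption, not a requirement of the text).
SUPPORTED = ("text/html", "text/turtle", "application/rdf+xml", "application/ld+json")


def _quality(accept: str, media: str) -> float:
    """Return the q-value that an Accept header assigns to one concrete media type."""
    type_ = media.partition("/")[0]
    best_q, best_specificity = 0.0, -1
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        range_type, _, range_sub = media_range.partition("/")
        if media_range == media:
            specificity = 2
        elif range_type == type_ and range_sub == "*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        # The most specific matching range decides the quality (RFC 7231, section 5.3.2).
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_media_type(accept: str | None, supported: tuple[str, ...] = SUPPORTED) -> str | None:
    """Pick the supported type with the highest q-value; None means respond 406 Not Acceptable."""
    if not accept:
        return supported[0]
    best, best_q = None, 0.0
    for media in supported:
        q = _quality(accept, media)
        if q > best_q:  # strict comparison: ties keep the earlier (preferred) type
            best, best_q = media, q
    return best
```

Parameters other than `q` (such as `charset`) are ignored in this sketch, which is usually adequate for choosing between RDF syntaxes and HTML.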
The logical base URI of a registry service may differ from the physical DNS domain at which it is hosted (e.g. to support a staging version of a registry service to allow evaluation and testing, or delegation of some part(s) of the registry namespace to another party).
HTML views of each registry resource shall enable navigation within the physical deployment domain; HTML links in such views target the physical deployment URL of the service.
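Generating such links amounts to rewriting URIs in the logical registry namespace onto the physical deployment base. A minimal sketch, assuming both bases are known as configuration values (`physical_link` is a hypothetical helper and the example.org hosts are placeholders):

```python
def physical_link(resource_uri: str, logical_base: str, physical_base: str) -> str:
    """Rewrite a URI in the logical registry namespace to the physical deployment URL."""
    logical = logical_base.rstrip("/")
    physical = physical_base.rstrip("/")
    # Match on a path-segment boundary so "http://reg.example" does not capture "http://reg.example2".
    if resource_uri == logical or resource_uri.startswith(logical + "/"):
        return physical + resource_uri[len(logical):]
    return resource_uri  # URIs outside the registry namespace are linked unchanged
```

For example, with a logical base of `http://registry.example.org/` and a staging deployment at `https://staging.example.net/reg`, a link to `http://registry.example.org/codes/red` becomes `https://staging.example.net/reg/codes/red`, while the RDF content itself keeps the logical URI.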
The registry state is maintained in an RDF store.
The versioning model requires multiple states of the same resource to be described in the same store using named graphs.
Direct access to the RDF store is provided via a SPARQL 1.1 compliant query endpoint.
A particular registry service implementation is free to architect this facility in different ways. The RDF store could be an embedded part of the registry service implementation, which could directly expose a SPARQL 1.1 query facility. Alternatively, the store could be run as a separate service, possibly replicated. If the store itself is separated then it may use the standard SPARQL 1.1 Update protocol for routing compiled updates to the store, as well as SPARQL 1.1 Query for all retrievals.
Note that the SPARQL endpoint exposed to registry users MAY be a replicated instance of the operational store for performance and/or security reasons.
The registry search specification requires support for free text search.
While this is a common feature of many RDF store implementations, it is not a formal part of the SPARQL specification and so is shown as a separate component of the conceptual architecture. A particular registry service may implement this as an integral part of the main RDF store, as a separate but embedded text search index, or as a separate distributed text search service.
The federation manager is responsible for handling requests to both federated and delegated registers.
The federation manager MAY provide caching of search requests, including a common framework for cache timeout, to enable a balance between adequate performance of federated queries and latency of update propagation. Registry service implementations may choose different trade-offs here.
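The time-to-live trade-off can be sketched as a small cache keyed by the federated register and query. The `TTLCache` class, the choice of key and the TTL value are assumptions; a longer TTL improves federated query performance at the cost of slower propagation of remote updates.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable


class TTLCache:
    """Cache federated search results for `ttl` seconds (the TTL is a deployment choice)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[Hashable, tuple[float, object]] = {}  # key -> (fetched_at, value)

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], object]) -> object:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no remote call
        value = fetch()    # e.g. forward the query to the federated registry service
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._entries.pop(key, None)
```

A key such as `(remote_register_uri, normalised_query)` lets one cache serve several federated registers; `invalidate` gives administrators a way to force a refresh.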
The federation manager MAY actively harvest resources from federated and delegated registers for local indexing, in particular local free-text indexing. A full framework for active change notification is out of scope for the current specification so similar time-to-live considerations apply here as for reactive caching.
The registry operation is specified in terms of a web service API so that a diversity of clients can access and update a given registry service.
In addition a registry service should provide a basic web user interface to allow registry users to at least:
- discover resources
- assess the status of a resource
This would typically comprise:
- a search page to support free-text search
- a retrieval and validation page which can find all entries for a given entity or set of entities
- rendering for individual entries which makes clear both the description of the entity and the current status (as recorded in the registry item metadata)
- a navigation facility to allow navigation from an entry to its containing register and from a register down to contained items and up to containing registers
- a navigation facility to follow links to previous versions of items.
All such access user interfaces SHOULD be implemented as separate services which internally call the same registry web service API as any other client implementations.
The process and workflow management for submission, review and approval of registry entries is outside the scope of the registry service. The registry service simply provides the technical API through which the results of the process can be recorded and accessed. The registry service MAY provide a web forms interface for management of register entries or may delegate the interface to custom client implementations which access the registry service via the specified API.
In addition to any register management facilities, a registry service should provide some facility for administration of the service itself, including:
- administration of access and authorisation (users, user credentials, user roles)
- administration of the service itself (backups, rendering templates, cache management etc)
The precise scope and nature of this administration interface is at the discretion of the service implementation.
As noted in the overview, this conceptual architecture is intended to show the functional components required. Implementers may choose a variety of different approaches to deliver the specified API. Here we point out some of the considerations that implementers may need to take into account.
The primary envisioned usage for a registry service is the registration of data sources, reference identifiers, code lists and so forth. In such applications the sheer number of items to be managed is likely to be modest.
The API design is suited to a simple implementation strategy based on an underlying RDF “triple” store, as illustrated in this conceptual architecture. Assuming O(5) versions of each registered item and O(40) triples per registered item version then registries up to around 5 million registered items can be supported on a single server instance running open source triple stores.
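The sizing claim above is back-of-envelope arithmetic, made explicit below: 5 versions × 40 triples gives 200 triples per registered item, so 5 million items come to about 10^9 triples. The named-graph count assumes one graph per item version, which is an assumption about the storage layout rather than something the text fixes.

```python
VERSIONS_PER_ITEM = 5       # the O(5) versions-per-item assumption from the text
TRIPLES_PER_VERSION = 40    # the O(40) triples-per-version assumption from the text
ITEMS = 5_000_000           # "around 5 million registered items"


def estimated_triples(items: int, versions: int = VERSIONS_PER_ITEM,
                      triples: int = TRIPLES_PER_VERSION) -> int:
    """Total triples held if every version of every item is stored."""
    return items * versions * triples


def estimated_graphs(items: int, versions: int = VERSIONS_PER_ITEM) -> int:
    """Named graphs needed if each item version is held in its own graph (an assumption)."""
    return items * versions


print(f"{estimated_triples(ITEMS):,} triples")       # 1,000,000,000 triples
print(f"{estimated_graphs(ITEMS):,} named graphs")   # 25,000,000 named graphs
```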
The registry service design provides for various forms of delegation so that substantially more entities may be accessible through the registry namespace than are physically stored in the registry service itself.
Note that the natural implementation of the registry API using a triple store uses named graphs within the same RDF dataset to hold the different views of each entity. The implementer should thus select a triple store implementation that can scale to large numbers of named graphs.
For higher volumes of registered items a number of alternative implementation strategies are possible:
- use of a clustered triple store implementation (open source or commercial)
- separated storage of the named graphs for prior versions, using the live triple store only for the latest version of each resource (may give order of magnitude scaling benefit depending on use of versioning)
- use a non-triple store solution, such as a horizontally scalable key-value store
A key part of the registry service design is the support for namespace forwarding and register federation (see delegation). The conceptual design supports this through use of the dispatcher. The dispatcher should be designed to support high throughput of forwarded requests and so should not perform live queries to the registry RDF store as part of normal forwarding. Instead, the registry core should compile any registrations of reg:NamespaceForward or reg:FederatedRegister into a set of proxy forward rules for the dispatcher to act on.
Implementers may reuse industry standard scalable proxy implementations as components within a dispatcher design so long as they are able to dynamically update the proxy configuration from the registry core logic.
The option to harvest and cache entries in federated registers for high performance search is afforded by the Federation Manager component within the conceptual architecture.
Access to registered items or complete registers normally requires execution of the registry core logic which in turn will perform a set of SPARQL queries to retrieve the entity descriptions from the registry RDF store.
For registry instances which are expected to handle a high volume of look-up requests, implementers should take caching and replication options into consideration.
The registry RDF store may be implemented as a replicated service to enable higher throughput of queries.
The formatted API response may be cached using standard HTTP cache implementations (typically as part of the dispatcher).
Implementers may also consider caching at the RDF store interface, for example by maintaining precomputed RDF views of each entity in a high performance cache.
Registration of new entities or updating the status of an item entails non-trivial logic in the registry core together with a number of SPARQL queries, which result in a SPARQL Update to the registry state.
For normal patterns of registry use, very high volumes of update requests are unlikely, so a normal three-tier architecture is likely to be sufficient. The business logic (registry core) may be horizontally replicated if necessary to handle update throughput. The database (registry RDF store) implementation should be chosen to handle the anticipated update rates.
In extreme cases a non-RDF storage solution may be considered.
A registry service MAY omit the SPARQL endpoint from the Search and Discovery API in the interests of scaling.
In the absence of a SPARQL endpoint, other implementations of the underlying storage solution, including high performance key-value stores, are possible.
Furthermore the SPARQL endpoint is a possible source of very high cost requests which may impact other users. As noted earlier an implementation which does provide a SPARQL endpoint MAY do so via a separate store replica which isolates registry operation from such access requests. This will entail some latency between successful change of registry state and visibility of the change through the SPARQL endpoint.
| Component | Proof of Concept | Deployment |
| --- | --- | --- |
| Dispatcher | Required, non-scalable | Scalable forwarding |
| Request processor | Required | Required |
| Auth | No | Required |
| Registry core | Partial, some operations may be stubbed out | Full |
| RDF store | Required | Required |
| Text index | Local index | Index external registers |
| Federation manager | Stub | Full |
| Renderer | Required | Required |
| Access UI | Demonstration level | Full |
| Admin UI | No | As appropriate |