Compare API
When creating or updating an item in the registry, it may be necessary to check whether an item with similar characteristics already exists in the same register, or elsewhere in the registry. For example, two items cannot share the same item URI in the registry (that is, the URI which locates the item in the registry rather than the entity URI, if they are different). Similarly, two items sharing a label might indicate that they are duplicates, but only the user can know for sure.
In order to make this type of check easy to perform, the registry provides a "compare" API, which accepts the same RDF payloads (TTL, XML, etc.) as the "register" API, and returns information about any similarities between the submitted items and the current contents of the registry. This functionality is integrated into the UI on each of the pages where you would normally be able to create or update registry items.
A similarity between new and existing items may be detected in the following ways.
- If they have the same item URI, they are in conflict and the new registration would fail.
- If they have the same value for one or more label properties (case insensitive) and they are both registers or both register items, then the new item may be a duplicate.
- If they have a similar value for one or more label properties and they are both registers or both register items, then the new item may be a duplicate.
Existing items which have an "Invalid" status are considered to be no longer part of the registry, and as such are excluded from the comparison results.
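The rules above can be sketched as a small classification function. This is only an illustration, not the registry's implementation: the `Item` class, the `classify` function and the `"stable"` status placeholder are hypothetical, and `difflib.SequenceMatcher` stands in for the Lucene fuzzy search that the registry actually uses, so its scores will not match the registry's.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Item:
    """Hypothetical, simplified view of a registry entry."""
    item_uri: str
    label: str
    is_register: bool = False  # True for registers, False for register items
    status: str = "stable"     # placeholder; only "Invalid" matters in this sketch


def classify(new: Item, existing: Item, threshold: float = 0.7) -> Optional[str]:
    """Return 'conflict', 'exact', 'close' or None for a pair of items."""
    # Invalid items are treated as no longer part of the registry.
    if existing.status.lower() == "invalid":
        return None
    # The same item URI means the new registration would fail.
    if new.item_uri == existing.item_uri:
        return "conflict"
    # Labels are only compared register-to-register or item-to-item.
    if new.is_register != existing.is_register:
        return None
    a, b = new.label.casefold(), existing.label.casefold()
    if a == b:
        return "exact"
    # Stand-in for the fuzzy text search (an assumption, not Lucene's scoring).
    if SequenceMatcher(None, a, b).ratio() >= threshold:
        return "close"
    return None
```

For instance, a new item labelled "green" compared against an existing item labelled "Green" under a different item URI is classified as `"exact"`, because the comparison is case insensitive.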
You must be logged in as a registry user to access this API. No other permissions are required.
The label properties whose values are compared by the API are the text-indexed properties configured on the store. You can configure these by setting the `basestore.textIndex` property in your `app.conf` file.
These properties also determine the behaviour of the search API.
The similarity of labels is determined by performing a "fuzzy" search on the text index (see the Lucene documentation for more information). You can configure the precision of this search by setting the `config.similarityParam` property in your `app.conf` file.
```
# The underlying RDF store
basestore = com.epimorphics.registry.store.impl.TDBStore
basestore.textIndex = rdfs:label,dct:title,foaf:name,skos:prefLabel,skos:altLabel,rdfs:comment
basestore.index = /var/opt/ldregistry/index

# The Registry store API wrapper, which uses the base RDF store and indexer
storeapi = com.epimorphics.registry.store.StoreBaseImpl
storeapi.store = $basestore

# Additional configuration parameters, typically to control UI behaviour
config = com.epimorphics.appbase.core.GenericConfig
config.similarityParam = 0.7

# The Registry configuration itself
registry = com.epimorphics.registry.core.Registry
registry.store = $storeapi
registry.configExtensions = $config
```
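To check which label properties and which similarity value a given configuration would apply, the two relevant settings can be read out of the file with a few lines of Python. This is a minimal sketch that only handles plain `key = value` lines and `#` comments. It is not the loader the registry itself uses, and `parse_conf` is a hypothetical helper.

```python
from typing import Dict


def parse_conf(text: str) -> Dict[str, str]:
    """Collect `key = value` pairs, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# A cut-down copy of the sample configuration shown above.
SAMPLE = """\
# The underlying RDF store
basestore.textIndex = rdfs:label,dct:title,foaf:name,skos:prefLabel,skos:altLabel,rdfs:comment
config.similarityParam = 0.7
"""

settings = parse_conf(SAMPLE)
label_properties = settings["basestore.textIndex"].split(",")
similarity = float(settings["config.similarityParam"])
```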
You can access the API by sending a POST request to the register in which you intend to register the prospective new items, with the query parameter `compare`.
The body of the request should contain the details of the new items in RDF format. This can be in simple form (containing only the entities to be added) or with registry metadata. Only the characteristics of the entities will be compared, not the metadata.
The compare API should accept any request body that you could send to the registration API, and vice versa.
Optionally, you can use the query parameter `compare-edit` in addition to `compare` to signal that the body contains updates to existing items. As a result, clashing URIs will be ignored, since they denote updates to existing entries.
Although the request targets a specific register, the payload will be compared to the entire registry.
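As an illustration, the request described above could be assembled with Python's standard library as follows. This is a sketch under assumptions: `build_compare_request` is a hypothetical helper, the order of the two query parameters is a guess, and the value of the `Authorization` header depends on how your registry authenticates users, so it is left to the caller.

```python
from typing import Optional
from urllib.request import Request


def build_compare_request(
    register_url: str,
    body: str,
    *,
    edit: bool = False,
    content_type: str = "text/turtle",
    accept: str = "text/turtle",
    authorization: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST request to the compare API."""
    # `compare-edit` is sent in addition to `compare` when the body holds updates.
    query = "compare&compare-edit" if edit else "compare"
    headers = {"Content-Type": content_type, "Accept": accept}
    if authorization is not None:
        headers["Authorization"] = authorization
    return Request(
        f"{register_url}?{query}",
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Setting `accept="text/html"` asks for the results panel instead of RDF.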
The response will be in an RDF format determined by the `Accept` header of the request. You can also request `text/html` to get the results panel that would normally be rendered in the UI.
The root of the response will be a node with the type `reg:CompareResult`.
This is a marker type which has no particular meaning outside of this API.
The root has `rdfs:member` relationships with the register items corresponding to the originally submitted items.
Even if the items were given in entity form (without metadata), they will be presented with metadata in the response.
The register items in the response may have `skos:exactMatch` or `skos:closeMatch` relationships with the existing register items that they resemble (if any).
- `skos:exactMatch` denotes that the register items have the same URI, or that their entities have the same label.
- `skos:closeMatch` denotes that their entities have similar labels.
The details of the matching register items and entities will be included in the response in the usual form.
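Once the response has been parsed into triples, the structure described above can be walked as in the sketch below. It assumes the triples are already available as `(subject, predicate, object)` tuples of full URI strings, for example from an RDF library of your choice. `summarise_matches` is a hypothetical helper, not part of the registry.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"
REG_COMPARE_RESULT = "http://purl.org/linked-data/registry#CompareResult"
SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"
SKOS_CLOSE = "http://www.w3.org/2004/02/skos/core#closeMatch"


def summarise_matches(triples: List[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Map each submitted register item to its exact and close matches."""
    # The root node is the one typed as reg:CompareResult.
    roots = {s for s, p, o in triples if p == RDF_TYPE and o == REG_COMPARE_RESULT}
    # Its rdfs:member values are the register items for the submitted entities.
    members = [o for s, p, o in triples if s in roots and p == RDFS_MEMBER]
    summary: Dict[str, Dict[str, List[str]]] = {}
    for member in members:
        summary[member] = {
            "exact": sorted(o for s, p, o in triples if s == member and p == SKOS_EXACT),
            "close": sorted(o for s, p, o in triples if s == member and p == SKOS_CLOSE),
        }
    return summary
```

A submitted item with empty `exact` and `close` lists has no detected similarities in the registry.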
For example, to compare a prospective new item for the "colour" register, you could send a POST request to:
localhost/registry/colour?compare
With the headers:
```
Accept: text/turtle
Content-Type: text/turtle
Authorization: # your authorisation here #
```
With the body:
```
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix reg: <http://purl.org/linked-data/registry#> .

<http://example.org/registry/colour/grn>
    a skos:Concept ;
    rdfs:label "green" ;
    reg:notation "grn" .
```
The response will then contain results in the form described above.
In order to access the API from the UI, you will need to have the latest changes from the registry-config-base project in your `/opt/ldregistry` directory.
The "Create Register", "Manual Entry" and "Register New or Changed Entries" action pages have a button labelled "Find Similar" next to the usual confirmation button. Clicking it submits the contents of the form to the compare API. The response is rendered as an alert panel at the bottom of the page, displaying exact and close matches in separate tables. Each table has the following columns:
Column | Meaning |
---|---|
New | The URI of an item that was submitted by the user. Only displayed when uploading multiple items. |
Suggested | An item currently in the registry which has similar characteristics to the new item. |
Register | The register where the suggested item was located. / denotes the root register. |
Types | The RDF types of the suggested item. |
Status | The status of the suggested item. |
On the "Register New or Changed Entries" page, you can choose which type of registration to perform. When submitting new entries (including those submitted in "batch" mode), clicking on the "Find Similar" button will have the same effect as on the single entry pages. If the option to add new and update existing entries is chosen, then clashing item URIs will be ignored, since they denote updates to existing entries. Similarly, the labels of entries will not be checked for similarity with their current state in the register.