DCAT Identifiers
[Still in progress]
From Use Case 5.11, "Modeling identifiers and making them actionable" [ID11]:
- effective use across platforms
- actionable independently from the platforms
- identifiers can be encoded as HTTP URIs, which seems to be the most effective way of making them actionable
- otherwise, the identifier type can help: a common identifier type registry would ensure interoperability.
- Distinguishing primary and secondary identifiers
- Dereferenceable identifiers [RDID]
- Identifier type [RIDT]
- Primary and alternative identifier [RIDALT]
DCAT should rely on HTTP URIs, which are an effective way of making identifiers actionable.
Primary and secondary identifiers can be specified following the DCAT-AP guidelines, which recommend:
- Assign a stable identifier to the dataset in the catalogue where the dataset is first published. This should be the primary identifier of the dataset. Include this identifier as the value of dct:identifier.
- In the case of duplicates, other locally minted identifiers or external identifiers, like Datacite, DOI, ELI etc., will be assigned to the dataset. As long as they are globally unique and stable, these identifiers should be included as values to adms:identifier.
- Harvesting systems should not delete or change the value of adms:identifier and only use it to compare harvested metadata to detect duplicates.
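The duplicate check described in the last recommendation can be sketched as follows. This is a minimal illustration, not part of the DCAT-AP guidelines: the `HarvestedDataset` class, the `find_duplicates` function and the `thirdcatalog.org` URIs are hypothetical, and records are assumed to have been reduced to plain strings beforehand. Identifiers are only read for comparison and never modified.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestedDataset:
    """Hypothetical, simplified view of a harvested dcat:Dataset."""
    uri: str                   # URI of the record in the harvesting catalogue
    primary_id: str            # value of dct:identifier
    secondary_ids: tuple = ()  # skos:notation values reached via adms:identifier


def find_duplicates(datasets):
    """Map each identifier shared by several records to the URIs of those records."""
    records_by_id = defaultdict(set)
    for ds in datasets:
        for identifier in (ds.primary_id, *ds.secondary_ids):
            records_by_id[identifier].add(ds.uri)
    # Keep only identifiers that occur in more than one record.
    return {
        identifier: sorted(uris)
        for identifier, uris in records_by_id.items()
        if len(uris) > 1
    }


if __name__ == "__main__":
    harvested = [
        HarvestedDataset("https://othercatalog.org/id", "https://example.org/id"),
        HarvestedDataset(
            "https://thirdcatalog.org/x",
            "https://example.org/id",
            ("info:doi/10.1109/5.771073",),
        ),
    ]
    print(find_duplicates(harvested))
```

Exact string comparison is used here on purpose; it only works if harvesters leave the identifier values untouched, which is what the recommendation asks for.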
When identifiers are not HTTP dereferenceable, a common identifier type can be specified for the sake of interoperability.
A stable primary identifier is set by using dct:identifier.
Example 1: An example of HTTP dereferenceable ID used in the catalogue where the dataset is first published
<https://example.org/id> a dcat:Dataset;
dct:identifier "https://example.org/id"^^xsd:anyURI
...
.
Harvesting systems should not delete or change the value of adms:identifier; as the following example shows, the same applies to the primary dct:identifier.
Example 2: An example in the catalogue that has harvested the dataset
<https://othercatalog.org/id> a dcat:Dataset;
# dct:identifier shouldn't be changed by harvesters
dct:identifier "https://example.org/id"^^xsd:anyURI
...
.
DCAT specifies secondary identifiers using adms:identifier.
Example 3:
<https://example.org/id> a dcat:Dataset;
...
# Secondary ID
adms:identifier <https://example.org/iddoi> .
<https://example.org/iddoi> rdf:type adms:Identifier ;
skos:notation "https://doi.org/10.1109/5.771073"^^xsd:anyURI ;
# according to https://www.w3.org/TR/skos-reference/#notations, more than one skos:notation can be set
skos:notation "info:doi/10.1109/5.771073"^^xsd:anyURI ;
# the authority/agency defining the identifier scheme, used if the agency has no URI
adms:schemaAgency "International DOI Foundation" ;
# the authority/agency defining the identifier scheme, used if the agency has a URI
dct:creator ex:InternationalDOIFoundation .
ex:InternationalDOIFoundation a dct:Agent ;
rdfs:label "International DOI Foundation" ;
foaf:homepage <https://www.doi.org/> .
DCAT uses adms:schemaAgency and dct:creator to represent the authority that defines the identifier scheme (e.g., the DOI Foundation in the example); adms:schemaAgency is used when the authority has no associated URI. DCAT does not represent the authority responsible for assigning and maintaining identifiers using that scheme (e.g., IEEE). Naming the registrant goes against the philosophy of DOI, where sub-spaces are abstracted from the organisation that registers them, with the advantage that DOIs do not change when the organisation changes or responsibility for a sub-space is handed over to someone else.
When the HTTP dereferenceable ID returns RDF/OWL, the use of owl:sameAs might be considered:
<https://example.org/id> a dcat:Dataset;
...
owl:sameAs <https://doi.org/10.1109/5.771073> .
If identifiers are not HTTP dereferenceable, common identifier types can be expressed as an RDF datatype or a custom OWL datatype; see 'ex:type' in the following example.
Example 4:
<https://example.org/id> a dcat:Dataset;
...
adms:identifier <https://example.org/sid> .
<https://example.org/sid> rdf:type adms:Identifier ;
# the actual id
skos:notation "PA 1-060-815"^^ex:type ;
# Human readable schema agency
adms:schemaAgency "US Copyright Office" ;
dcterms:issued "2001-09-12"^^xsd:date .
If a registered URI scheme is used (following RFC 3986), the identifier scheme is part of the URI, so indicating a separate identifier scheme in 'type' is redundant. For example, DOI is registered as a namespace in the 'info' URI scheme (see faq #11), so according to RFC 3986 it should be encoded as in the following example.
Example 5:
<https://example.org/id> a dcat:Dataset;
dct:identifier "info:doi/10.1109/5.771073"^^xsd:anyURI
...
.
or
<https://example.org/sid> rdf:type adms:Identifier ;
# the actual id
skos:notation "info:doi/10.1109/5.771073"^^xsd:anyURI ;
...
.
Otherwise, examples of common types for identifier schemes (arXiv, etc.) are defined in the DataCite schema and the FAIRsharing Registry.
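The two DOI encodings used in the examples above (the HTTP form and the 'info' URI form) can be derived from a bare DOI. The following is a minimal sketch: the function name `doi_to_uris` is hypothetical, the syntax check is deliberately loose, and percent-encoding via `urllib.parse.quote` is only an approximation of the encoding rules a DOI registration agency may prescribe.

```python
from urllib.parse import quote


def doi_to_uris(doi: str) -> dict:
    """Return the HTTP and 'info' URI encodings of a bare DOI such as '10.1109/5.771073'."""
    doi = doi.strip()
    # Loose sanity check (an assumption): a DOI starts with "10." and has a prefix/suffix slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a bare DOI: {doi!r}")
    # Percent-encode characters that are not allowed in a URI path; keep "/" as is.
    encoded = quote(doi, safe="/")
    return {
        "https": f"https://doi.org/{encoded}",
        "info": f"info:doi/{encoded}",
    }


if __name__ == "__main__":
    for form, uri in doi_to_uris("10.1109/5.771073").items():
        print(form, uri)
```

Either result can be used as an xsd:anyURI literal; only the HTTP form is dereferenceable.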
This proposal relies on dct:identifier and adms:identifier as the former is included in DCAT 1, and the latter is already included in different DCAT APs.
Part of this proposal is inspired by the recommendations in the DCAT-AP guidelines quoted above (a stable primary identifier in dct:identifier, secondary identifiers in adms:identifier, and harvesters leaving adms:identifier untouched).
Compatibility with the schema.org identifier property can be addressed by proposing suitable mappings.
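One possible shape for such a mapping is sketched below. It is an assumption rather than an agreed mapping: the function name `to_schema_org` and its argument layout are hypothetical. It relies only on the fact that schema.org's `identifier` property accepts text, URLs and `PropertyValue` objects carrying a `propertyID` and a `value`.

```python
def to_schema_org(dataset_uri, primary_id, secondary_ids=()):
    """Build a schema.org Dataset description (as a JSON-LD dict).

    primary_id    -- the dct:identifier value
    secondary_ids -- iterable of (scheme_name, notation) pairs taken from adms:Identifier nodes
    """
    identifiers = [primary_id]
    for scheme_name, notation in secondary_ids:
        identifiers.append(
            {"@type": "PropertyValue", "propertyID": scheme_name, "value": notation}
        )
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": dataset_uri,
        "identifier": identifiers,
    }


if __name__ == "__main__":
    import json

    doc = to_schema_org(
        "https://example.org/id",
        "https://example.org/id",
        [("doi", "10.1109/5.771073")],
    )
    print(json.dumps(doc, indent=2))
```

In this sketch the scheme name goes into `propertyID` as a plain string; whether it should instead be a URI from a shared registry is left open.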
A stable primary identifier is set by using dct:identifier.
Example 1: An example of HTTP dereferenceable ID used in the catalogue where the dataset is first published
<https://example.org/id> a dcat:Dataset;
dct:identifier "https://example.org/id"^^xsd:anyURI
...
.
If the dataset does not have an HTTP dereferenceable ID, it must have at least a dereferenceable proxy URI, as in the following example.
Example 2:
<https://example.org/proxyid> a dcat:Dataset;
dct:identifier "id"^^type
'type' can be any recognized RDF datatype IRI, or a custom OWL datatype defined to indicate the identifier scheme.
If a registered URI scheme is used (following RFC 3986), the identifier scheme is part of the URI, so indicating a separate identifier scheme in 'type' is redundant. For example, DOI is registered as a namespace in the 'info' URI scheme (see faq #11), so a DOI can be formally encoded as an RFC 3986 URI, as shown below.
Example 2.1:
<https://example.org/proxyid> a dcat:Dataset;
dct:identifier "info:doi/10.1109/5.771073"^^xsd:anyURI
Otherwise, common types for identifier schemes (arXiv, etc.) are defined in the DataCite schema and the FAIRsharing Registry.
Question 2: should we find a way to associate a resolution mechanism with the type, so that the ID can be resolved?
Example 3: An example in the catalogue that has harvested the dataset
<https://othercatalog.org/id> a dcat:Dataset;
# dct:identifier shouldn't be changed by harvesters
dct:identifier "https://example.org/id"^^xsd:anyURI
...
.
DCAT specifies secondary identifiers using adms:identifier.
Example 4:
<https://example.org/id> a dcat:Dataset;
...
adms:identifier <https://example.org/sid> .
<https://example.org/sid> rdf:type adms:Identifier ;
# the actual id
skos:notation "PA 1-060-815"^^ex:USCO ;
# Human readable schema agency
adms:schemaAgency "US Copyright Office" ;
# machine readable schema agency
dct:creator <https://www.copyright.gov/> ;
dcterms:issued "2001-09-12"^^xsd:date .
Guidelines for the most common identifier schemes should be suggested, to avoid needlessly different representations of the same IDs. Specific datatypes can be considered to foster harmonization.
For example, real RDF fragments represent DOIs quite differently; see the examples below.
Example 5: from DCAT-AP-IT guidelines
<http://dati.gov.it/resource/AltroIdentificativo/altroidentificativoDataset1>
a adms:Identifier ;
skos:notation "doi:10.1109/5.771073";
...
.
or
Example 6: from csiro-dap-examples.ttl
dap:doi-P366-2003SEPT
rdf:type adms:Identifier ;
dct:creator <https://researchdata.ands.org.au/> ;
skos:notation "10.4225/08/598dc08d07bb7" ;
adms:schemeAgency "International DOI Foundation" ;
.
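To compare notations such as the two above, a consumer would first have to reduce them to one form. The sketch below is a hypothetical helper (`normalize_doi`), not something defined by DCAT or ADMS: it strips the prefixes seen on this page plus the legacy `dx.doi.org` resolver, and lower-cases the result because DOI names are case-insensitive.

```python
# Prefixes under which a DOI commonly appears; the list is an assumption and not exhaustive.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "info:doi/",
    "doi:",
)


def normalize_doi(notation: str) -> str:
    """Reduce a DOI written in any of the known forms to a bare, lower-case DOI."""
    value = notation.strip()
    lowered = value.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    if not value.startswith("10."):
        raise ValueError(f"not a DOI: {notation!r}")
    return value.lower()


if __name__ == "__main__":
    for raw in ("doi:10.1109/5.771073", "10.4225/08/598dc08d07bb7"):
        print(raw, "->", normalize_doi(raw))
```

A shared datatype for DOIs would make this kind of guesswork unnecessary, which is the harmonization argument made above.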
Option 1: Using ADMS Identifiers
Example 7:
<https://example.org/id> a dcat:Dataset;
...
# Secondary ID
adms:identifier <https://example.org/iddoi> .
<https://example.org/iddoi> rdf:type adms:Identifier ;
skos:notation "https://doi.org/10.1109/5.771073"^^xsd:anyURI ;
# according to https://www.w3.org/TR/skos-reference/#notations, more than one skos:notation can be set
skos:notation "info:doi/10.1109/5.771073"^^xsd:anyURI ;
# the authority/agency defining the identifier scheme, used if the agency has no URI
adms:schemaAgency "International DOI Foundation" ;
# the authority/agency defining the identifier scheme, used if the agency has a URI
dct:creator ex:InternationalDOIFoundation .
ex:InternationalDOIFoundation a dct:Agent ;
rdfs:label "International DOI Foundation" ;
foaf:homepage <https://www.doi.org/> .
DCAT uses adms:schemaAgency and dct:creator to represent the authority that defines the identifier scheme (e.g., the DOI Foundation in the example); adms:schemaAgency is used when the authority has no associated URI. DCAT does not represent the authority responsible for assigning and maintaining identifiers using that scheme (e.g., IEEE). Naming the registrant goes against the philosophy of DOI, where sub-spaces are abstracted from the organisation that registers them, with the advantage that DOIs do not change when the organisation changes or responsibility for a sub-space is handed over to someone else.
When the HTTP dereferenceable ID returns RDF/OWL, the use of owl:sameAs might be considered:
<https://example.org/id> a dcat:Dataset;
...
owl:sameAs <https://doi.org/10.1109/5.771073> .
- Does it make sense to use owl:sameAs when the ID does not return OWL/RDF? I am not sure we should recommend it.
In ADMS this is expressed using the adms:Identifier class with the following properties:
- the identifier is represented as skos:notation, datatyped with the identifier scheme (including the version number if appropriate);
- the agency that manages the identifier is set using
  - dcterms:creator to link to a class that represents the agency
  - adms:schemaAgency to provide the name of the agency as a literal;
- the date on which the identifier was issued is represented with further properties such as dcterms:issued.
An important point to note is that properties of adms:Identifier are properties of the Identifier, not the resource that it identifies or the agency that issued it.
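The property list above can be read as a small record type. The following sketch is illustrative only: the `AdmsIdentifier` class and its field names are hypothetical, prefix declarations are omitted from the output, `dcterms:issued` is assumed to be an xsd:date, and no escaping of quotes inside literals is attempted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdmsIdentifier:
    """Hypothetical in-memory form of an adms:Identifier node."""
    uri: str                             # IRI of the identifier node itself
    notation: str                        # the actual identifier string
    datatype: str                        # prefixed datatype naming the scheme, e.g. "ex:type"
    schema_agency: Optional[str] = None  # agency name as a literal
    creator: Optional[str] = None        # IRI of the agency, if it has one
    issued: Optional[str] = None         # issue date, YYYY-MM-DD

    def to_turtle(self) -> str:
        """Serialize the node as a Turtle fragment (prefix declarations not included)."""
        lines = [f'skos:notation "{self.notation}"^^{self.datatype}']
        if self.schema_agency:
            lines.append(f'adms:schemaAgency "{self.schema_agency}"')
        if self.creator:
            lines.append(f"dcterms:creator <{self.creator}>")
        if self.issued:
            lines.append(f'dcterms:issued "{self.issued}"^^xsd:date')
        body = " ;\n    ".join(lines)
        return f"<{self.uri}> a adms:Identifier ;\n    {body} ."


if __name__ == "__main__":
    ident = AdmsIdentifier(
        uri="https://example.org/sid",
        notation="PA 1-060-815",
        datatype="ex:type",
        schema_agency="US Copyright Office",
        issued="2001-09-12",
    )
    print(ident.to_turtle())
```

All fields describe the identifier itself, in line with the note that adms:Identifier properties are not properties of the identified resource or of the issuing agency.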
Example from Phil Archer, Marios Meimaris, Agisilaos Papantoniou, Registered Organization Vocabulary, W3C Working Group Note 01 August 2013
<http://business.data.gov.uk/id/company/04285910>
a rov:RegisteredOrganization ;
... ....
adms:identifier <http://example.com/id/oc04285910> ;
org:registeredSite <http://example.com/id/rs04285910> .
# The actual registration
<http://example.com/id/li04285910> a adms:Identifier ;
# textual identifier
skos:notation "04285910"^^ex:idType ;
adms:schemaAgency "UK Companies House" ;
dcterms:issued "2001-09-12"^^xsd:date .
# A supplementary identifier (Open Corporates)
<http://example.com/id/oc04285910> a adms:Identifier ;
skos:notation "http://opencorporates.com/companies/gb/04285910"^^ex:OCid ;
dcterms:issued "2010-10-21T15:09:59Z"^^xsd:dateTime ;
dcterms:modified "2012-04-26T15:16:44Z"^^xsd:dateTime ;
dcterms:creator <http://opencorporates.com/companies/gb/07444723> .
Example from DCAT-AP-IT
<http://dati.gov.it/resource/Dataset/ContrattiSPC_agid>
a dcatapit:Dataset , dcat:Dataset ;
dct:identifier "agid:D.1" ;
# Secondary identifier
adms:identifier <http://dati.gov.it/resource/Identifier/ContrattiSPC_agid_altroID> ;
Example in DXWG GitHub space csiro-dap-examples.ttl
dap:atnf-P366-2003SEPT
rdf:type dcat:Dataset ;
...
dct:description "Parkes multibeam high-latitude pulsar survey" ;
dct:identifier "https://doi.org/10.4225/08/598dc08d07bb7"^^xsd:anyURI ;
dct:identifier "ivo://au.csiro.atnf/P366-2003SEPT"^^xsd:anyURI ;
dct:license <https://creativecommons.org/licenses/by/4.0/> ;
dct:modified "2017-07-30T08:55:55Z"^^xsd:dateTime ;
dct:relation [
dct:identifier "PH0090_0011.sf" ;
] ;
dct:relation [
dct:identifier "PH0090_0021.sf" ;
] ;
dct:relation [
dct:identifier "PH0090_0031.sf" ;
] ;
dct:rights [
rdf:type dct:RightsStatement ;
rdfs:comment "All Rights (including copyright) CSIRO 2017." ;
] ;
dct:temporal [
rdf:type dct:PeriodOfTime ;
rdf:type time:ProperInterval ;
time:hasBeginning [
rdf:type time:Instant ;
time:inXSDDate "2003-09-01"^^xsd:date ;
] ;
time:hasEnd [
rdf:type time:Instant ;
time:inXSDDate "2003-12-31"^^xsd:date ;
] ;
] ;
dct:title "Parkes observations for project P366 semester 2003SEPT" ;
dcat:contactPoint dap:MartaBurgay-vcard ;
dcat:keyword "pulsar" ;
dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:P366-2003SEPT> ;
dcat:theme <http://registry.it.csiro.au/def/keyword/anzsrc/astronomical-and-space-sciences-not-elsewhere-classified> ;
prov:wasGeneratedBy dap:P366 ;
.
dap:doi-P366-2003SEPT
rdf:type adms:Identifier ;
dct:creator <https://researchdata.ands.org.au/> ;
skos:notation "10.4225/08/598dc08d07bb7" ;
adms:schemeAgency "International DOI Foundation" ;
.
Example with https://schema.org/identifier
... To do ?!...